[jira] [Updated] (HDFS-2538) option to disable fsck dots
[ https://issues.apache.org/jira/browse/HDFS-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-2538: Release Note: fsck does not print out dots for progress reporting by default. To print out dots, specify the '-showprogress' option. Hadoop Flags: Incompatible change,Reviewed (was: Reviewed) Since this is an incompatible change, adding a release note. option to disable fsck dots Key: HDFS-2538 URL: https://issues.apache.org/jira/browse/HDFS-2538 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Allen Wittenauer Assignee: Mohammad Kamrul Islam Priority: Minor Labels: newbie Fix For: 3.0.0 Attachments: HDFS-2538-branch-0.20-security-204.patch, HDFS-2538-branch-0.20-security-204.patch, HDFS-2538-branch-1.0.patch, HDFS-2538.1.patch, HDFS-2538.2.patch, HDFS-2538.3.patch This patch turns the dots during fsck off by default and provides an option to turn them back on if you have a fetish for millions and millions of dots on your terminal. I haven't done any benchmarks, but I suspect fsck is now 300% faster to boot. -- This message was sent by Atlassian JIRA (v6.2#6252)
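Editorial note: the change boils down to an opt-in flag guarding the per-file dot printing; the old behavior returns with {{hdfs fsck / -showprogress}}. A minimal sketch of the idea, with hypothetical names (this is not the actual NamenodeFsck code):
{noformat}
// Minimal sketch, assuming a boolean wired from the -showprogress CLI flag;
// class and method names are illustrative, not the real NamenodeFsck internals.
class FsckProgress {
  private final boolean showProgress;        // true only when -showprogress given
  private final java.io.PrintWriter out;

  FsckProgress(boolean showProgress, java.io.PrintWriter out) {
    this.showProgress = showProgress;
    this.out = out;
  }

  /** Called once per file checked; silent by default. */
  void tick() {
    if (showProgress) {
      out.print('.');
    }
  }
}
{noformat}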
[jira] [Commented] (HDFS-5624) Add HDFS tests for ACLs in combination with viewfs.
[ https://issues.apache.org/jira/browse/HDFS-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064789#comment-14064789 ] Hudson commented on HDFS-5624: -- FAILURE: Integrated in Hadoop-Yarn-trunk #615 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/615/]) HDFS-5624. Add HDFS tests for ACLs in combination with viewfs. Contributed by Stephen Chu. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1611068) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFileSystemWithAcls.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFsWithAcls.java Add HDFS tests for ACLs in combination with viewfs. --- Key: HDFS-5624 URL: https://issues.apache.org/jira/browse/HDFS-5624 Project: Hadoop HDFS Issue Type: Test Components: hdfs-client, test Affects Versions: 3.0.0, 2.5.0 Reporter: Chris Nauroth Assignee: Stephen Chu Fix For: 3.0.0, 2.6.0 Attachments: HDFS-5624.001.patch, HDFS-5624.002.patch, HDFS-5624.003.patch Add tests verifying that in a federated deployment, a viewfs wrapped over multiple federated NameNodes will dispatch the ACL operations to the correct NameNode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6689) NFS doesn't return correct lookup access for directories
[ https://issues.apache.org/jira/browse/HDFS-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064788#comment-14064788 ] Hudson commented on HDFS-6689: -- FAILURE: Integrated in Hadoop-Yarn-trunk #615 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/615/]) HDFS-6689. NFS doesn't return correct lookup access for directories. Contributed by Brandon Li (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1611135) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/Nfs3Utils.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestNfs3Utils.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt NFS doesn't return correct lookup access for directories Key: HDFS-6689 URL: https://issues.apache.org/jira/browse/HDFS-6689 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Yesha Vora Assignee: Brandon Li Fix For: 2.6.0 Attachments: HDFS-6689.patch NFS does not allow another user to access a file with 644 permissions under a parent directory with 711 permissions. Steps to reproduce: 1. Create a directory /user/userX with 711 permissions 2. Upload a file at /user/userX/TestFile with 644 permissions as userX 3. Try to access TestFile as userY. HDFS will allow userY to read TestFile. {noformat} bash-4.1$ id uid=661(userY) gid=100(users) groups=100(users),13016(groupY) bash-4.1$ hdfs dfs -cat /user/userX/TestFile create a file with some content {noformat} NFS will not allow userY to read TestFile. {noformat} bash-4.1$ cat /tmp/tmp_mnt/user/userX/TestFile cat: /tmp/tmp_mnt/user/userX/TestFile: Permission denied {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
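Editorial note: the underlying NFSv3 detail is that for directories the POSIX execute bit grants traversal, which the ACCESS procedure must report as LOOKUP rather than EXECUTE. A hedged sketch of the intended mapping, with locally defined constants mirroring the RFC 1813 ACCESS3 bits (illustrative, not the actual Nfs3Utils code):
{noformat}
// Hedged sketch; constants mirror RFC 1813, the class itself is hypothetical.
public class LookupAccess {
  static final int ACCESS3_READ    = 0x0001;
  static final int ACCESS3_LOOKUP  = 0x0002;
  static final int ACCESS3_EXECUTE = 0x0020;

  /**
   * For a directory, the execute bit means "can traverse", so it must be
   * reported as LOOKUP; reporting only EXECUTE (as for a plain file) makes
   * the NFS client deny traversal of a 711 parent, as in the repro above.
   */
  static int accessBits(boolean isDirectory, boolean canRead, boolean canExecute) {
    int bits = 0;
    if (canRead) {
      bits |= ACCESS3_READ;
    }
    if (canExecute) {
      bits |= isDirectory ? ACCESS3_LOOKUP : ACCESS3_EXECUTE;
    }
    return bits;
  }
}
{noformat}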
[jira] [Commented] (HDFS-6690) Deduplicate xattr names in memory
[ https://issues.apache.org/jira/browse/HDFS-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064799#comment-14064799 ] Hudson commented on HDFS-6690: -- FAILURE: Integrated in Hadoop-Yarn-trunk #615 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/615/]) HDFS-6690. Deduplicate xattr names in memory. (wang) (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1611226) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/XAttrStorage.java Deduplicate xattr names in memory - Key: HDFS-6690 URL: https://issues.apache.org/jira/browse/HDFS-6690 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.0 Reporter: Andrew Wang Assignee: Andrew Wang Fix For: 2.6.0 Attachments: hdfs-6690.001.patch, hdfs-6690.002.patch, hdfs-6690.003.patch When the same string is used repeatedly for an xattr name, we could potentially save some NN memory by deduplicating the strings. -- This message was sent by Atlassian JIRA (v6.2#6252)
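Editorial note: the deduplication idea is essentially string interning scoped to xattr names. A hedged sketch of the technique (not the actual XAttrStorage change):
{noformat}
// Hedged sketch: intern each xattr name so repeated names share one String.
import java.util.concurrent.ConcurrentHashMap;

public class XAttrNameInterner {
  private static final ConcurrentHashMap<String, String> POOL =
      new ConcurrentHashMap<>();

  /** Returns a canonical instance, so N inodes carrying the same xattr name
   *  hold one String on the NameNode heap instead of N copies. */
  public static String intern(String name) {
    String existing = POOL.putIfAbsent(name, name);
    return existing != null ? existing : name;
  }
}
{noformat}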
[jira] [Commented] (HDFS-2538) option to disable fsck dots
[ https://issues.apache.org/jira/browse/HDFS-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064798#comment-14064798 ] Hudson commented on HDFS-2538: -- FAILURE: Integrated in Hadoop-Yarn-trunk #615 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/615/]) HDFS-2538. option to disable fsck dots. Contributed by Mohammad Kamrul Islam. (aw: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1611220) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSck.java option to disable fsck dots Key: HDFS-2538 URL: https://issues.apache.org/jira/browse/HDFS-2538 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Allen Wittenauer Assignee: Mohammad Kamrul Islam Priority: Minor Labels: newbie Fix For: 3.0.0 Attachments: HDFS-2538-branch-0.20-security-204.patch, HDFS-2538-branch-0.20-security-204.patch, HDFS-2538-branch-1.0.patch, HDFS-2538.1.patch, HDFS-2538.2.patch, HDFS-2538.3.patch This patch turns the dots during fsck off by default and provides an option to turn them back on if you have a fetish for millions and millions of dots on your terminal. I haven't done any benchmarks, but I suspect fsck is now 300% faster to boot. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6699) Secure Windows DFS read when client co-located on nodes with data (short-circuit reads)
Remus Rusanu created HDFS-6699: -- Summary: Secure Windows DFS read when client co-located on nodes with data (short-circuit reads) Key: HDFS-6699 URL: https://issues.apache.org/jira/browse/HDFS-6699 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client, performance, security Reporter: Remus Rusanu HDFS-347 introduced secure short-circuit HDFS reads based on Linux domain sockets. A similar capability can be introduced in a secure Windows environment using the [DuplicateHandle](http://msdn.microsoft.com/en-us/library/windows/desktop/ms724251(v=vs.85).aspx) Win32 API. When short-circuit is allowed, the datanode would open the block file, duplicate the handle into the hdfs client process, and return the handle value to that process. The hdfs client can then open a Java stream on this handle and read the file. This is a secure mechanism: the HDFS ACLs are validated by the namenode, and the process is given access to the file only in a controlled manner (e.g. read-only). The hdfs client process does not need OS-level access privileges to the block file. A complication arises from the requirement to duplicate the handle into the hdfs client process. Ordinary processes (as we desire the datanode to run) do not have the required privilege (SeDebugPrivilege). But with the introduction of an elevated service helper for the namenode Windows Secure Container Executor (YARN-2198), we have at our disposal an elevated executor that can do the job of duplicating the handle. The namenode would communicate with this process using the same mechanism as the nodemanager, i.e. LRPC. With my proposed implementation the sequence of actions is as follows: - the hdfs client requests Windows secure short-circuit of a block in the data transfer protocol. It passes the block, the token and its own process ID. - the datanode approves short-circuit. It opens the block file and obtains the handle. - the datanode invokes the elevated-privilege service to duplicate the handle into the hdfs client process. The datanode invokes the service LRPC interface over JNI (LRPC being the de-facto Windows standard for interoperating with a service). It passes the handle value, its own process ID and the hdfs client process ID. - The elevated service duplicates the handle from the datanode process into the hdfs client process. It returns the duplicate handle value to the datanode as an output value from the LRPC call - x2 for the CRC file - the datanode responds to the short-circuit data transfer protocol request with a message that contains the duplicate handle value (handles actually, x2 with CRC) - the hdfs client creates a Java stream that wraps the handles and reads the block from this stream (ditto for CRC) The datanode needs to exercise care not to duplicate the same handle to different clients (including the CRC handles), because a handle also abstracts the file position, and clients would inadvertently move each other's file pointer, with chaotic results. TBD: a mitigation for process ID reuse (the hdfs client can be terminated immediately after the block request and a new process could reuse the same ID). In theory an attacker could use this as a mechanism to obtain a handle to a block by killing the hdfs client at the right moment and spinning up new processes until it gets one with the desired ID. I'm not sure this is a realistic threat, because the attacker must already have the privilege to kill the hdfs client process, and with such privilege he could obtain the handle by other means (e.g. debug/inspect the hdfs client process). 
-- This message was sent by Atlassian JIRA (v6.2#6252)
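Editorial note: to summarize the proposed protocol in code form, here is a hedged Java sketch; every name below is hypothetical, since the real duplication would happen in the elevated Windows service reached over LRPC/JNI:
{noformat}
// Hedged sketch of the proposed flow; all names are hypothetical.
interface ElevatedHandleDuplicator {
  /** Ask the privileged helper to DuplicateHandle() from the datanode
   *  process into the client process; returns the handle value as seen
   *  by the client process. */
  long duplicateInto(long sourceHandle, int datanodePid, int clientPid);
}

class ShortCircuitHandleServer {
  private final ElevatedHandleDuplicator helper;

  ShortCircuitHandleServer(ElevatedHandleDuplicator helper) {
    this.helper = helper;
  }

  /** Open the block and CRC files read-only elsewhere, then hand duplicated
   *  (read-only) handles to the requesting client. Never reuse one handle
   *  for two clients: a Win32 handle carries a shared file position. */
  long[] grantShortCircuit(long blockFileHandle, long crcFileHandle,
                           int datanodePid, int clientPid) {
    long block = helper.duplicateInto(blockFileHandle, datanodePid, clientPid);
    long crc   = helper.duplicateInto(crcFileHandle, datanodePid, clientPid);
    return new long[] { block, crc };   // returned in the protocol response
  }
}
{noformat}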
[jira] [Updated] (HDFS-6699) Secure Windows DFS read when client co-located on nodes with data (short-circuit reads)
[ https://issues.apache.org/jira/browse/HDFS-6699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Remus Rusanu updated HDFS-6699: --- Description: HDFS-347 introduced secure short-circuit HDFS reads based on Linux domain sockets. A similar capability can be introduced in a secure Windows environment using the [DuplicateHandle](http://msdn.microsoft.com/en-us/library/windows/desktop/ms724251(v=vs.85).aspx) Win32 API. When short-circuit is allowed, the datanode would open the block file, duplicate the handle into the hdfs client process, and return the handle value to that process. The hdfs client can then open a Java stream on this handle and read the file. This is a secure mechanism: the HDFS ACLs are validated by the namenode, and the process is given access to the file only in a controlled manner (e.g. read-only). The hdfs client process does not need OS-level access privileges to the block file. A complication arises from the requirement to duplicate the handle into the hdfs client process. Ordinary processes (as we desire the datanode to run) do not have the required privilege (SeDebugPrivilege). But with the introduction of an elevated service helper for the nodemanager Windows Secure Container Executor (YARN-2198), we have at our disposal an elevated executor that can do the job of duplicating the handle. The datanode would communicate with this process using the same mechanism as the nodemanager, i.e. LRPC. With my proposed implementation the sequence of actions is as follows: - the hdfs client requests Windows secure short-circuit of a block in the data transfer protocol. It passes the block, the token and its own process ID. - the datanode approves short-circuit. It opens the block file and obtains the handle. - the datanode invokes the elevated-privilege service to duplicate the handle into the hdfs client process. The datanode invokes the service LRPC interface over JNI (LRPC being the de-facto Windows standard for interoperating with a service). It passes the handle value, its own process ID and the hdfs client process ID. - The elevated service duplicates the handle from the datanode process into the hdfs client process. It returns the duplicate handle value to the datanode as an output value from the LRPC call - x2 for the CRC file - the datanode responds to the short-circuit data transfer protocol request with a message that contains the duplicate handle value (handles actually, x2 with CRC) - the hdfs client creates a Java stream that wraps the handles and reads the block from this stream (ditto for CRC) The datanode needs to exercise care not to duplicate the same handle to different clients (including the CRC handles), because a handle also abstracts the file position, and clients would inadvertently move each other's file pointer, with chaotic results. TBD: a mitigation for process ID reuse (the hdfs client can be terminated immediately after the block request and a new process could reuse the same ID). In theory an attacker could use this as a mechanism to obtain a handle to a block by killing the hdfs client at the right moment and spinning up new processes until it gets one with the desired ID. I'm not sure this is a realistic threat, because the attacker must already have the privilege to kill the hdfs client process, and with such privilege he could obtain the handle by other means (e.g. debug/inspect the hdfs client process). was: HDFS-347 introduced secure short-circuit HDFS reads based on Linux domain sockets. 
A similar capability can be introduced in a secure Windows environment using the [DuplicateHandle](http://msdn.microsoft.com/en-us/library/windows/desktop/ms724251(v=vs.85).aspx) Win32 API. When short-circuit is allowed, the datanode would open the block file, duplicate the handle into the hdfs client process, and return the handle value to that process. The hdfs client can then open a Java stream on this handle and read the file. This is a secure mechanism: the HDFS ACLs are validated by the namenode, and the process is given access to the file only in a controlled manner (e.g. read-only). The hdfs client process does not need OS-level access privileges to the block file. A complication arises from the requirement to duplicate the handle into the hdfs client process. Ordinary processes (as we desire the datanode to run) do not have the required privilege (SeDebugPrivilege). But with the introduction of an elevated service helper for the namenode Windows Secure Container Executor (YARN-2198), we have at our disposal an elevated executor that can do the job of duplicating the handle. The namenode would communicate with this process using the same mechanism as the nodemanager, i.e. LRPC. With my proposed implementation the sequence of actions is as follows: - the hdfs client requests Windows secure short-circuit of a block in the data transfer protocol. It passes the block, the token and its
[jira] [Updated] (HDFS-6680) BlockPlacementPolicyDefault does not choose favored nodes correctly
[ https://issues.apache.org/jira/browse/HDFS-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6680: -- Attachment: h6680_20140716.patch h6680_20140716.patch: fixes another bug in the loop and updates a test. BlockPlacementPolicyDefault does not choose favored nodes correctly --- Key: HDFS-6680 URL: https://issues.apache.org/jira/browse/HDFS-6680 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h6680_20140714.patch, h6680_20140716.patch In one of the chooseTarget(..) methods, it tries all the favoredNodes via chooseLocalNode(..). It expects chooseLocalNode to return null if the local node is not a good target. Unfortunately, chooseLocalNode falls back to chooseLocalRack instead of returning null. -- This message was sent by Atlassian JIRA (v6.2#6252)
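Editorial note: a hedged sketch of the intended contract, with hypothetical types (not the actual BlockPlacementPolicyDefault code). The essence of the bug is that the "local node" probe silently degraded to a rack-local choice instead of returning null:
{noformat}
// Hedged sketch; types and method names are hypothetical.
class FavoredNodePlacement {
  interface Node { boolean isGoodTarget(); }

  /** Strictly local choice: the favored node itself or nothing. */
  Node chooseLocalNodeNoFallback(Node favored) {
    return favored.isGoodTarget() ? favored : null;   // no chooseLocalRack here
  }

  java.util.List<Node> chooseTargets(java.util.List<Node> favoredNodes, int needed) {
    java.util.List<Node> chosen = new java.util.ArrayList<>();
    for (Node favored : favoredNodes) {
      if (chosen.size() >= needed) break;
      Node target = chooseLocalNodeNoFallback(favored);
      if (target != null) {          // the bug: a rack-local fallback used to be
        chosen.add(target);          // silently accepted here as "favored"
      }
    }
    return chosen;
  }
}
{noformat}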
[jira] [Updated] (HDFS-6700) BlockPlacementPolicy should choose storage but not datanode for deletion
[ https://issues.apache.org/jira/browse/HDFS-6700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6700: -- Attachment: h6700_20140717.patch h6700_20140717.patch: changes to choose storages. BlockPlacementPolicy should choose storage but not datanode for deletion --- Key: HDFS-6700 URL: https://issues.apache.org/jira/browse/HDFS-6700 Project: Hadoop HDFS Issue Type: Improvement Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h6700_20140717.patch HDFS-2832 changed the datanode storage model from a single storage, which may correspond to multiple physical storage media, to a collection of storages, each corresponding to a physical storage medium. BlockPlacementPolicy.chooseReplicaToDelete still chooses replicas in terms of DatanodeDescriptor rather than DatanodeStorageInfo. -- This message was sent by Atlassian JIRA (v6.2#6252)
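Editorial note: to make the proposed change concrete, a hedged sketch with hypothetical types of the signature shift the summary describes (the real interface lives in BlockPlacementPolicy):
{noformat}
// Hedged sketch; all names here are illustrative stand-ins.
interface BlockPlacementPolicySketch {
  // before: which *node* loses a replica
  DatanodeDescriptorSketch chooseReplicaToDelete(
      java.util.Collection<DatanodeDescriptorSketch> nodes);

  // after: which *storage* (one physical medium) loses a replica
  DatanodeStorageInfoSketch chooseStorageToDelete(
      java.util.Collection<DatanodeStorageInfoSketch> storages);
}

class DatanodeDescriptorSketch { }

class DatanodeStorageInfoSketch {
  DatanodeDescriptorSketch node;   // a node now owns several storages
}
{noformat}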
[jira] [Updated] (HDFS-6700) BlockPlacementPolicy should choose storage but not datanode for deletion
[ https://issues.apache.org/jira/browse/HDFS-6700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6700: -- Status: Patch Available (was: Open) BlockPlacementPolicy should choose storage but not datanode for deletion --- Key: HDFS-6700 URL: https://issues.apache.org/jira/browse/HDFS-6700 Project: Hadoop HDFS Issue Type: Improvement Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h6700_20140717.patch HDFS-2832 changed the datanode storage model from a single storage, which may correspond to multiple physical storage media, to a collection of storages, each corresponding to a physical storage medium. BlockPlacementPolicy.chooseReplicaToDelete still chooses replicas in terms of DatanodeDescriptor rather than DatanodeStorageInfo. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6700) BlockPlacementPolicy should choose storage but not datanode for deletion
Tsz Wo Nicholas Sze created HDFS-6700: - Summary: BlockPlacementPolicy should choose storage but not datanode for deletion Key: HDFS-6700 URL: https://issues.apache.org/jira/browse/HDFS-6700 Project: Hadoop HDFS Issue Type: Improvement Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h6700_20140717.patch HDFS-2832 changed the datanode storage model from a single storage, which may correspond to multiple physical storage media, to a collection of storages, each corresponding to a physical storage medium. BlockPlacementPolicy.chooseReplicaToDelete still chooses replicas in terms of DatanodeDescriptor rather than DatanodeStorageInfo. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6700) BlockPlacementPolicy should choose storage but not datanode for deletion
[ https://issues.apache.org/jira/browse/HDFS-6700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6700: -- Component/s: namenode BlockPlacementPolicy should choose storage but not datanode for deletion --- Key: HDFS-6700 URL: https://issues.apache.org/jira/browse/HDFS-6700 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h6700_20140717.patch HDFS-2832 changed the datanode storage model from a single storage, which may correspond to multiple physical storage media, to a collection of storages, each corresponding to a physical storage medium. BlockPlacementPolicy.chooseReplicaToDelete still chooses replicas in terms of DatanodeDescriptor rather than DatanodeStorageInfo. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6671) Archival Storage: Consider block storage policy in replication
[ https://issues.apache.org/jira/browse/HDFS-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6671: -- Attachment: h6671_20140717.patch h6671_20140717.patch: renames the chooseTarget methods as Vinay suggested. Archival Storage: Consider block storage policy in replication -- Key: HDFS-6671 URL: https://issues.apache.org/jira/browse/HDFS-6671 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h6671_20140714.patch, h6671_20140715.patch, h6671_20140715b.patch, h6671_20140715c.patch, h6671_20140717.patch In order to satisfy the storage policy requirement, the replication monitor additionally reads storage policy information from the INodeFile when performing replication. As before, it only adds replicas if a block is under-replicated, and deletes replicas if a block is over-replicated. It will NOT move replicas around to satisfy the storage policy requirement. -- This message was sent by Atlassian JIRA (v6.2#6252)
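Editorial note: to illustrate the described behavior (replication fixes choose storage types from the policy but never relocate existing replicas), a hedged sketch with hypothetical names; the real logic lives in the replication monitor and block placement code:
{noformat}
// Hedged sketch; the policy model and names are illustrative assumptions.
enum StorageType { DISK, ARCHIVE }

class StoragePolicySketch {
  final java.util.List<StorageType> preferred;

  StoragePolicySketch(java.util.List<StorageType> preferred) {
    this.preferred = preferred;
  }

  /** Storage types still needed for new replicas, given what the existing
   *  replicas already occupy. Existing replicas are never moved. */
  java.util.List<StorageType> chooseMissing(short replication,
      java.util.List<StorageType> existing) {
    java.util.List<StorageType> missing = new java.util.ArrayList<>();
    for (int i = 0; i < replication; i++) {
      missing.add(i < preferred.size() ? preferred.get(i)
                                       : preferred.get(preferred.size() - 1));
    }
    for (StorageType t : existing) {
      missing.remove(t);   // already satisfied by an existing replica
    }
    return missing;
  }
}
{noformat}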
[jira] [Resolved] (HDFS-6671) Archival Storage: Consider block storage policy in replication
[ https://issues.apache.org/jira/browse/HDFS-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze resolved HDFS-6671. --- Resolution: Fixed Fix Version/s: Archival Storage (HDFS-6584) Hadoop Flags: Reviewed Thanks Vinay for reviewing the patches. I have committed this. Archival Storage: Consider block storage policy in replication -- Key: HDFS-6671 URL: https://issues.apache.org/jira/browse/HDFS-6671 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Fix For: Archival Storage (HDFS-6584) Attachments: h6671_20140714.patch, h6671_20140715.patch, h6671_20140715b.patch, h6671_20140715c.patch, h6671_20140717.patch In order to satisfy the storage policy requirement, the replication monitor additionally reads storage policy information from the INodeFile when performing replication. As before, it only adds replicas if a block is under-replicated, and deletes replicas if a block is over-replicated. It will NOT move replicas around to satisfy the storage policy requirement. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4154) BKJM: Two namenodes using bkjm can race to create the version znode
[ https://issues.apache.org/jira/browse/HDFS-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064893#comment-14064893 ] Rakesh R commented on HDFS-4154: Hi [~yians], Yeah, I got your point. In this case we would need a zk-lock mechanism to fully isolate the format(). Otherwise there are different scenarios in which it can hit exceptions. But I have a different thought after seeing the BOOTSTRAPSTANDBY option. IMHO it is not required to add extra logic to handle the concurrency between two FORMAT calls, if the user can ensure that they call FORMAT on only one NN server and the other server starts the NN with the BOOTSTRAPSTANDBY option. What's your opinion on this? BKJM: Two namenodes using bkjm can race to create the version znode -- Key: HDFS-4154 URL: https://issues.apache.org/jira/browse/HDFS-4154 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 3.0.0, 2.0.3-alpha Reporter: Ivan Kelly Assignee: Han Xiao Attachments: HDFS-4154.patch and one will get the following error. 2012-11-06 10:04:00,200 INFO hidden.bkjournal.org.apache.zookeeper.ClientCnxn: Session establishment complete on server 109-231-69-172.flexiscale.com/109.231.69.172:2181, sessionid = 0x13ad528fcfe0005, negotiated timeout = 4000 2012-11-06 10:04:00,710 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join java.lang.IllegalArgumentException: Unable to construct journal, bookkeeper://109.231.69.172:2181;109.231.69.173:2181;109.231.69.174:2181/hdfsjournal at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1251) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:226) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initSharedJournalsForRead(FSEditLog.java:206) at org.apache.hadoop.hdfs.server.namenode.FSImage.initEditLog(FSImage.java:657) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:590) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:259) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:544) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:423) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:385) at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:401) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:435) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:611) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:592) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1135) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1201) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1249) ... 14 more Caused by: java.io.IOException: Error initializing zk at org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.init(BookKeeperJournalManager.java:233) ... 
19 more Caused by: hidden.bkjournal.org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /hdfsjournal/version at hidden.bkjournal.org.apache.zookeeper.KeeperException.create(KeeperException.java:119) at hidden.bkjournal.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at hidden.bkjournal.org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:778) at org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.init(BookKeeperJournalManager.java:222) ... 19 more -- This message was sent by Atlassian JIRA (v6.2#6252)
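Editorial note: the race here is the classic non-atomic "check then create" on a shared znode. A hedged sketch of one standard remedy using only the stock ZooKeeper client API (the wiring and class name are hypothetical, not the BKJM patch itself): treat NodeExists as losing the race and verify the existing data instead of failing namenode startup.
{noformat}
// Hedged sketch; standard ZooKeeper API, hypothetical surrounding class.
import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;

class VersionZNode {
  static void createOrCheck(ZooKeeper zk, String path, byte[] expected)
      throws KeeperException, InterruptedException {
    try {
      zk.create(path, expected, ZooDefs.Ids.OPEN_ACL_UNSAFE,
          CreateMode.PERSISTENT);
    } catch (KeeperException.NodeExistsException e) {
      // The peer namenode created it first; accept it if compatible.
      byte[] actual = zk.getData(path, false, new Stat());
      if (!java.util.Arrays.equals(actual, expected)) {
        throw new IllegalStateException("version znode mismatch at " + path);
      }
    }
  }
}
{noformat}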
[jira] [Commented] (HDFS-5624) Add HDFS tests for ACLs in combination with viewfs.
[ https://issues.apache.org/jira/browse/HDFS-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064897#comment-14064897 ] Hudson commented on HDFS-5624: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1834 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1834/]) HDFS-5624. Add HDFS tests for ACLs in combination with viewfs. Contributed by Stephen Chu. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1611068) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFileSystemWithAcls.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFsWithAcls.java Add HDFS tests for ACLs in combination with viewfs. --- Key: HDFS-5624 URL: https://issues.apache.org/jira/browse/HDFS-5624 Project: Hadoop HDFS Issue Type: Test Components: hdfs-client, test Affects Versions: 3.0.0, 2.5.0 Reporter: Chris Nauroth Assignee: Stephen Chu Fix For: 3.0.0, 2.6.0 Attachments: HDFS-5624.001.patch, HDFS-5624.002.patch, HDFS-5624.003.patch Add tests verifying that in a federated deployment, a viewfs wrapped over multiple federated NameNodes will dispatch the ACL operations to the correct NameNode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-2538) option to disable fsck dots
[ https://issues.apache.org/jira/browse/HDFS-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064905#comment-14064905 ] Hudson commented on HDFS-2538: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1834 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1834/]) HDFS-2538. option to disable fsck dots. Contributed by Mohammad Kamrul Islam. (aw: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1611220) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSck.java option to disable fsck dots Key: HDFS-2538 URL: https://issues.apache.org/jira/browse/HDFS-2538 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Allen Wittenauer Assignee: Mohammad Kamrul Islam Priority: Minor Labels: newbie Fix For: 3.0.0 Attachments: HDFS-2538-branch-0.20-security-204.patch, HDFS-2538-branch-0.20-security-204.patch, HDFS-2538-branch-1.0.patch, HDFS-2538.1.patch, HDFS-2538.2.patch, HDFS-2538.3.patch This patch turns the dots during fsck off by default and provides an option to turn them back on if you have a fetish for millions and millions of dots on your terminal. I haven't done any benchmarks, but I suspect fsck is now 300% faster to boot. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6690) Deduplicate xattr names in memory
[ https://issues.apache.org/jira/browse/HDFS-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064906#comment-14064906 ] Hudson commented on HDFS-6690: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1834 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1834/]) HDFS-6690. Deduplicate xattr names in memory. (wang) (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1611226) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/XAttrStorage.java Deduplicate xattr names in memory - Key: HDFS-6690 URL: https://issues.apache.org/jira/browse/HDFS-6690 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.0 Reporter: Andrew Wang Assignee: Andrew Wang Fix For: 2.6.0 Attachments: hdfs-6690.001.patch, hdfs-6690.002.patch, hdfs-6690.003.patch When the same string is used repeatedly for an xattr name, we could potentially save some NN memory by deduplicating the strings. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6689) NFS doesn't return correct lookup access for directories
[ https://issues.apache.org/jira/browse/HDFS-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064896#comment-14064896 ] Hudson commented on HDFS-6689: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1834 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1834/]) HDFS-6689. NFS doesn't return correct lookup access for directories. Contributed by Brandon Li (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1611135) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/Nfs3Utils.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestNfs3Utils.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt NFS doesn't return correct lookup access for directories Key: HDFS-6689 URL: https://issues.apache.org/jira/browse/HDFS-6689 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Yesha Vora Assignee: Brandon Li Fix For: 2.6.0 Attachments: HDFS-6689.patch NFS does not allow another user to access a file with 644 permissions under a parent directory with 711 permissions. Steps to reproduce: 1. Create a directory /user/userX with 711 permissions 2. Upload a file at /user/userX/TestFile with 644 permissions as userX 3. Try to access TestFile as userY. HDFS will allow userY to read TestFile. {noformat} bash-4.1$ id uid=661(userY) gid=100(users) groups=100(users),13016(groupY) bash-4.1$ hdfs dfs -cat /user/userX/TestFile create a file with some content {noformat} NFS will not allow userY to read TestFile. {noformat} bash-4.1$ cat /tmp/tmp_mnt/user/userX/TestFile cat: /tmp/tmp_mnt/user/userX/TestFile: Permission denied {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4120) Add a new -skipSharedEditsCheck option for BootstrapStandby
[ https://issues.apache.org/jira/browse/HDFS-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064915#comment-14064915 ] Rakesh R commented on HDFS-4120: Hi [~vinayrpet], It looks like the test case failure is not related to this patch. Please have a look at the patch when you get some time. Thanks, Rakesh Add a new -skipSharedEditsCheck option for BootstrapStandby - Key: HDFS-4120 URL: https://issues.apache.org/jira/browse/HDFS-4120 Project: Hadoop HDFS Issue Type: Improvement Components: ha, namenode Affects Versions: 3.0.0, 2.0.2-alpha Reporter: Liang Xie Assignee: Liang Xie Priority: Minor Attachments: HDFS-4120.patch, HDFS-4120.patch, HDFS-4120.txt Per https://issues.apache.org/jira/browse/HDFS-3752?focusedCommentId=13449466&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13449466, let's introduce a new option; it should be very safe, but really useful for some corner cases, e.g. when the SNN loses its local storage we need to reset the SNN, but in current trunk it will always get a FATAL msg and can never succeed. Another workaround for this case is to full-sync the current directory from the ANN, but that costs more disk space and network bandwidth, IMHO. -- This message was sent by Atlassian JIRA (v6.2#6252)
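Editorial note on usage: assuming the option lands as named in the summary, recovering a standby whose local storage was lost would presumably reduce to running {{hdfs namenode -bootstrapStandby -skipSharedEditsCheck}} on the fresh node, skipping the shared-edits validation that currently aborts with the FATAL message described above. The exact invocation is an assumption based on the option name; the committed patch is authoritative for the syntax.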
[jira] [Commented] (HDFS-2538) option to disable fsck dots
[ https://issues.apache.org/jira/browse/HDFS-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064926#comment-14064926 ] Hudson commented on HDFS-2538: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1807 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1807/]) HDFS-2538. option to disable fsck dots. Contributed by Mohammad Kamrul Islam. (aw: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1611220) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSck.java option to disable fsck dots Key: HDFS-2538 URL: https://issues.apache.org/jira/browse/HDFS-2538 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Allen Wittenauer Assignee: Mohammad Kamrul Islam Priority: Minor Labels: newbie Fix For: 3.0.0 Attachments: HDFS-2538-branch-0.20-security-204.patch, HDFS-2538-branch-0.20-security-204.patch, HDFS-2538-branch-1.0.patch, HDFS-2538.1.patch, HDFS-2538.2.patch, HDFS-2538.3.patch This patch turns the dots during fsck off by default and provides an option to turn them back on if you have a fetish for millions and millions of dots on your terminal. I haven't done any benchmarks, but I suspect fsck is now 300% faster to boot. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6690) Deduplicate xattr names in memory
[ https://issues.apache.org/jira/browse/HDFS-6690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064927#comment-14064927 ] Hudson commented on HDFS-6690: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1807 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1807/]) HDFS-6690. Deduplicate xattr names in memory. (wang) (wang: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1611226) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/XAttrStorage.java Deduplicate xattr names in memory - Key: HDFS-6690 URL: https://issues.apache.org/jira/browse/HDFS-6690 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.0 Reporter: Andrew Wang Assignee: Andrew Wang Fix For: 2.6.0 Attachments: hdfs-6690.001.patch, hdfs-6690.002.patch, hdfs-6690.003.patch When the same string is used repeatedly for an xattr name, we could potentially save some NN memory by deduplicating the strings. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5624) Add HDFS tests for ACLs in combination with viewfs.
[ https://issues.apache.org/jira/browse/HDFS-5624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064918#comment-14064918 ] Hudson commented on HDFS-5624: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1807 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1807/]) HDFS-5624. Add HDFS tests for ACLs in combination with viewfs. Contributed by Stephen Chu. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1611068) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFileSystemWithAcls.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/viewfs/TestViewFsWithAcls.java Add HDFS tests for ACLs in combination with viewfs. --- Key: HDFS-5624 URL: https://issues.apache.org/jira/browse/HDFS-5624 Project: Hadoop HDFS Issue Type: Test Components: hdfs-client, test Affects Versions: 3.0.0, 2.5.0 Reporter: Chris Nauroth Assignee: Stephen Chu Fix For: 3.0.0, 2.6.0 Attachments: HDFS-5624.001.patch, HDFS-5624.002.patch, HDFS-5624.003.patch Add tests verifying that in a federated deployment, a viewfs wrapped over multiple federated NameNodes will dispatch the ACL operations to the correct NameNode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6689) NFS doesn't return correct lookup access for directories
[ https://issues.apache.org/jira/browse/HDFS-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064917#comment-14064917 ] Hudson commented on HDFS-6689: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1807 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1807/]) HDFS-6689. NFS doesn't return correct lookup access for directories. Contributed by Brandon Li (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1611135) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/Nfs3Utils.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestNfs3Utils.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt NFS doesn't return correct lookup access for directories Key: HDFS-6689 URL: https://issues.apache.org/jira/browse/HDFS-6689 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Yesha Vora Assignee: Brandon Li Fix For: 2.6.0 Attachments: HDFS-6689.patch NFS does not allow another user to access a file with 644 permissions under a parent directory with 711 permissions. Steps to reproduce: 1. Create a directory /user/userX with 711 permissions 2. Upload a file at /user/userX/TestFile with 644 permissions as userX 3. Try to access TestFile as userY. HDFS will allow userY to read TestFile. {noformat} bash-4.1$ id uid=661(userY) gid=100(users) groups=100(users),13016(groupY) bash-4.1$ hdfs dfs -cat /user/userX/TestFile create a file with some content {noformat} NFS will not allow userY to read TestFile. {noformat} bash-4.1$ cat /tmp/tmp_mnt/user/userX/TestFile cat: /tmp/tmp_mnt/user/userX/TestFile: Permission denied {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4154) BKJM: Two namenodes using bkjm can race to create the version znode
[ https://issues.apache.org/jira/browse/HDFS-4154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064939#comment-14064939 ] Hadoop QA commented on HDFS-4154: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12562773/HDFS-4154.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7371//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7371//console This message is automatically generated. BKJM: Two namenodes using bkjm can race to create the version znode -- Key: HDFS-4154 URL: https://issues.apache.org/jira/browse/HDFS-4154 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 3.0.0, 2.0.3-alpha Reporter: Ivan Kelly Assignee: Han Xiao Attachments: HDFS-4154.patch and one will get the following error. 
2012-11-06 10:04:00,200 INFO hidden.bkjournal.org.apache.zookeeper.ClientCnxn: Session establishment complete on server 109-231-69-172.flexiscale.com/109.231.69.172:2181, sessionid = 0x13ad528fcfe0005, negotiated timeout = 4000 2012-11-06 10:04:00,710 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join java.lang.IllegalArgumentException: Unable to construct journal, bookkeeper://109.231.69.172:2181;109.231.69.173:2181;109.231.69.174:2181/hdfsjournal at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1251) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:226) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.initSharedJournalsForRead(FSEditLog.java:206) at org.apache.hadoop.hdfs.server.namenode.FSImage.initEditLog(FSImage.java:657) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:590) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:259) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:544) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:423) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:385) at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:401) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:435) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:611) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:592) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1135) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1201) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1249) ... 14 more Caused by: java.io.IOException: Error initializing zk at org.apache.hadoop.contrib.bkjournal.BookKeeperJournalManager.init(BookKeeperJournalManager.java:233) ... 19 more Caused by:
[jira] [Commented] (HDFS-6680) BlockPlacementPolicyDefault does not choose favored nodes correctly
[ https://issues.apache.org/jira/browse/HDFS-6680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064949#comment-14064949 ] Hadoop QA commented on HDFS-6680: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656261/h6680_20140716.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7369//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7369//console This message is automatically generated. BlockPlacementPolicyDefault does not choose favored nodes correctly --- Key: HDFS-6680 URL: https://issues.apache.org/jira/browse/HDFS-6680 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h6680_20140714.patch, h6680_20140716.patch In one of the chooseTarget(..) methods, it tries all the favoredNodes via chooseLocalNode(..). It expects chooseLocalNode to return null if the local node is not a good target. Unfortunately, chooseLocalNode falls back to chooseLocalRack instead of returning null. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Work started] (HDFS-6509) distcp vs Data At Rest Encryption
[ https://issues.apache.org/jira/browse/HDFS-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-6509 started by Charles Lamb. distcp vs Data At Rest Encryption - Key: HDFS-6509 URL: https://issues.apache.org/jira/browse/HDFS-6509 Project: Hadoop HDFS Issue Type: Sub-task Components: security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Charles Lamb Assignee: Charles Lamb Attachments: HDFS-6509distcpandDataatRestEncryption-2.pdf, HDFS-6509distcpandDataatRestEncryption.pdf distcp needs to work with Data At Rest Encryption -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6700) BlockPlacementPolicy should choose storage but not datanode for deletion
[ https://issues.apache.org/jira/browse/HDFS-6700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064958#comment-14064958 ] Hadoop QA commented on HDFS-6700: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656265/h6700_20140717.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithNodeGroup org.apache.hadoop.TestGenericRefresh org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA org.apache.hadoop.hdfs.server.blockmanagement.TestReplicationPolicy {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7370//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7370//console This message is automatically generated. BlockPlacementPolicy should choose storage but not datanode for deletion --- Key: HDFS-6700 URL: https://issues.apache.org/jira/browse/HDFS-6700 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h6700_20140717.patch HDFS-2832 changed the datanode storage model from a single storage, which may correspond to multiple physical storage media, to a collection of storages, each corresponding to a physical storage medium. BlockPlacementPolicy.chooseReplicaToDelete still chooses replicas in terms of DatanodeDescriptor rather than DatanodeStorageInfo. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-4265) BKJM doesn't take advantage of speculative reads
[ https://issues.apache.org/jira/browse/HDFS-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-4265: --- Attachment: 004-HDFS-4265.patch BKJM doesn't take advantage of speculative reads Key: HDFS-4265 URL: https://issues.apache.org/jira/browse/HDFS-4265 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.2.0 Reporter: Ivan Kelly Assignee: Rakesh R Attachments: 001-HDFS-4265.patch, 002-HDFS-4265.patch, 003-HDFS-4265.patch, 004-HDFS-4265.patch BookKeeperEditLogInputStream reads one entry at a time, so it doesn't take advantage of the speculative read mechanism introduced by BOOKKEEPER-336. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4265) BKJM doesn't take advantage of speculative reads
[ https://issues.apache.org/jira/browse/HDFS-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064971#comment-14064971 ] Rakesh R commented on HDFS-4265: Thanks [~ikelly]. I have exposed the 'readEntryTimeoutSec' configuration in BKJM in order to allow setting it to a higher value in the BookKeeper client. By default it is configured as 5 seconds, which is the same as BookKeeper's 'readEntryTimeoutSec' default value. BKJM doesn't take advantage of speculative reads Key: HDFS-4265 URL: https://issues.apache.org/jira/browse/HDFS-4265 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.2.0 Reporter: Ivan Kelly Assignee: Rakesh R Attachments: 001-HDFS-4265.patch, 002-HDFS-4265.patch, 003-HDFS-4265.patch, 004-HDFS-4265.patch BookKeeperEditLogInputStream reads one entry at a time, so it doesn't take advantage of the speculative read mechanism introduced by BOOKKEEPER-336. -- This message was sent by Atlassian JIRA (v6.2#6252)
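Editorial note: as a sketch of what wiring such a setting into the client side could look like — assuming the BookKeeper 4.x-era ClientConfiguration API; the BKJM-side key name is whatever the patch defines and is not shown here:
{noformat}
// Hedged sketch; the ClientConfiguration calls are believed to match the
// BookKeeper client API of this era, but treat them as an assumption.
import org.apache.bookkeeper.conf.ClientConfiguration;

class BkjmClientConfig {
  static ClientConfiguration build(int readEntryTimeoutSec) {
    ClientConfiguration conf = new ClientConfiguration();
    // Per-entry read timeout; speculative reads (BOOKKEEPER-336) fire a
    // second request to another bookie well before this expires.
    conf.setReadEntryTimeout(readEntryTimeoutSec);
    return conf;
  }
}
{noformat}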
[jira] [Commented] (HDFS-4265) BKJM doesn't take advantage of speculative reads
[ https://issues.apache.org/jira/browse/HDFS-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064972#comment-14064972 ] Ivan Kelly commented on HDFS-4265: -- lgtm +1 BKJM doesn't take advantage of speculative reads Key: HDFS-4265 URL: https://issues.apache.org/jira/browse/HDFS-4265 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.2.0 Reporter: Ivan Kelly Assignee: Rakesh R Attachments: 001-HDFS-4265.patch, 002-HDFS-4265.patch, 003-HDFS-4265.patch, 004-HDFS-4265.patch BookKeeperEditLogInputStream reads one entry at a time, so it doesn't take advantage of the speculative read mechanism introduced by BOOKKEEPER-336. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4266) BKJM: Separate write and ack quorum
[ https://issues.apache.org/jira/browse/HDFS-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064990#comment-14064990 ] Rakesh R commented on HDFS-4266: Yes [~ikelly], you are right. I think setting BookKeeper's 'addEntryTimeoutSec' to a higher value will avoid this ensemble reformation and make the test case more reliable. What do you think? BKJM: Separate write and ack quorum --- Key: HDFS-4266 URL: https://issues.apache.org/jira/browse/HDFS-4266 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Reporter: Ivan Kelly Assignee: Rakesh R Attachments: 001-HDFS-4266.patch, 002-HDFS-4266.patch BOOKKEEPER-208 allows the ack and write quorums to be different sizes to allow writes to be unaffected by any bookie failure. BKJM should be able to take advantage of this. -- This message was sent by Atlassian JIRA (v6.2#6252)
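Editorial note: for context on what BOOKKEEPER-208 enables, a hedged sketch using the BookKeeper createLedger overload that takes separate write and ack quorum sizes (the surrounding class is illustrative, not BKJM code). With ensemble 3 / write quorum 3 / ack quorum 2, an add completes after two acks, so one slow or failed bookie no longer stalls writes:
{noformat}
// Hedged sketch; the createLedger overload is the standard BookKeeper API
// introduced with separate ack quorums, the wiring around it is hypothetical.
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerHandle;

class QuorumExample {
  static LedgerHandle open(BookKeeper bk, byte[] passwd) throws Exception {
    int ensembleSize = 3;     // bookies holding entries of this ledger
    int writeQuorumSize = 3;  // bookies each entry is written to
    int ackQuorumSize = 2;    // acks required before an add completes
    return bk.createLedger(ensembleSize, writeQuorumSize, ackQuorumSize,
        BookKeeper.DigestType.MAC, passwd);
  }
}
{noformat}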
[jira] [Resolved] (HDFS-6611) secure components have the wrong pid in their files
[ https://issues.apache.org/jira/browse/HDFS-6611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-6611. Resolution: Duplicate I've provided a workaround in HADOOP-9902. secure components have the wrong pid in their files --- Key: HDFS-6611 URL: https://issues.apache.org/jira/browse/HDFS-6611 Project: Hadoop HDFS Issue Type: Bug Components: security Reporter: Allen Wittenauer When we launch a secure process, we store the jsvc process id in both pid files. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-4265) BKJM doesn't take advantage of speculative reads
[ https://issues.apache.org/jira/browse/HDFS-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065000#comment-14065000 ] Hadoop QA commented on HDFS-4265: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656277/004-HDFS-4265.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs/src/contrib/bkjournal. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7372//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7372//console This message is automatically generated. BKJM doesn't take advantage of speculative reads Key: HDFS-4265 URL: https://issues.apache.org/jira/browse/HDFS-4265 Project: Hadoop HDFS Issue Type: Sub-task Components: ha Affects Versions: 2.2.0 Reporter: Ivan Kelly Assignee: Rakesh R Attachments: 001-HDFS-4265.patch, 002-HDFS-4265.patch, 003-HDFS-4265.patch, 004-HDFS-4265.patch BookKeeperEditLogInputStream reads one entry at a time, so it doesn't take advantage of the speculative read mechanism introduced by BOOKKEEPER-336. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-315) Allow simplified versioning for namenode and datanode metadata.
[ https://issues.apache.org/jira/browse/HDFS-315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-315. --- Resolution: Fixed I'm going to resolve this since FSImage and buddies have been versioned for a very long time now. :D Allow simplified versioning for namenode and datanode metadata. --- Key: HDFS-315 URL: https://issues.apache.org/jira/browse/HDFS-315 Project: Hadoop HDFS Issue Type: Improvement Environment: All Reporter: Milind Bhandarkar Assignee: Sameer Paranjpye Attachments: hadoop-224.patch Currently the namenode has two types of metadata: the FSImage and FSEdits. FSImage contains information about Inodes, and FSEdits contains a list of operations that were not saved to FSImage. The datanode currently does not have any metadata, but may some day. The file formats used for storing these metadata will evolve over time. It is important for the file-system to be backward compatible. That is, the metadata readers need to be able to identify which version of the file format we are using, and need to be able to read the information therein. As we add information to these metadata, the complexity of the reader increases dramatically. I propose a versioning scheme with a major and minor version number, where a different reader class is associated with each major number, and that class interprets the minor number internally. The readers essentially form a chain starting with the latest version. Each version-reader looks at the file and, if it does not recognize the version number, passes it to the next version reader in the chain by calling its parse method, returning the results of the parse method up the chain. (In the case of the namenode, the parse result is an array of Inodes.) This scheme has the advantage that every time a new major version is added, the new reader only needs to know about the reader for its immediately previous version, and every reader needs to know only which major version numbers it can read. The writer is not versioned this way, because metadata is always written in the most current version format. One more change that is needed for simplified versioning is that the struct-surping of dfs.Block needs to be removed. Block's contents will change in later versions, and older versions should still be able to readFields properly. This is more general than Block, of course; in general only basic datatypes should be used as Writables in DFS metadata. For edits, the reader should return an array of (opcode, ArrayWritable) pairs. This will also remove the limitation of two operands for every opcode, and will be more extensible. Even with this new versioning scheme, the last Reader in the reader-chain would recognize the current format, thus maintaining full backward compatibility. -- This message was sent by Atlassian JIRA (v6.2#6252)
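The chained-reader proposal is easiest to see in code. Below is a minimal, hypothetical sketch; VersionedReader and doParse are invented names for illustration, not actual Hadoop classes:
{code}
import java.io.DataInput;
import java.io.IOException;

// Each reader handles one major version and delegates anything it does not
// recognize to the reader for the immediately previous version.
abstract class VersionedReader {
  private final int majorVersion;
  private final VersionedReader previous; // next link in the chain, or null

  VersionedReader(int majorVersion, VersionedReader previous) {
    this.majorVersion = majorVersion;
    this.previous = previous;
  }

  Object parse(int fileMajorVersion, DataInput in) throws IOException {
    if (fileMajorVersion == majorVersion) {
      return doParse(in); // interprets the minor version internally
    }
    if (previous == null) {
      throw new IOException("Unsupported metadata version " + fileMajorVersion);
    }
    return previous.parse(fileMajorVersion, in); // pass down the chain
  }

  // Version-specific parsing; for the namenode this would produce Inodes.
  protected abstract Object doParse(DataInput in) throws IOException;
}
{code}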
[jira] [Commented] (HDFS-6114) Block Scan log rolling will never happen if blocks written continuously leading to huge size of dncp_block_verification.log.curr
[ https://issues.apache.org/jira/browse/HDFS-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065071#comment-14065071 ] Vinayakumar B commented on HDFS-6114: - Seems like there was a compilation error in this build even before applying the patch. Later builds don't have that problem, so I triggered Jenkins again. Block Scan log rolling will never happen if blocks written continuously leading to huge size of dncp_block_verification.log.curr Key: HDFS-6114 URL: https://issues.apache.org/jira/browse/HDFS-6114 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.3.0, 2.4.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Priority: Critical Attachments: HDFS-6114.patch, HDFS-6114.patch, HDFS-6114.patch, HDFS-6114.patch 1. {{BlockPoolSliceScanner#scan()}} will not return until all the blocks are scanned. 2. If blocks (several MB in size) are written to the datanode continuously, then one iteration of {{BlockPoolSliceScanner#scan()}} will be continuously scanning the blocks. 3. These blocks will be deleted after some time (enough to get the blocks scanned). 4. As block scanning is throttled, verification of all blocks will take a long time. 5. Rolling will never happen, so even though the total number of blocks in the datanode doesn't increase, the entries (which include stale entries for deleted blocks) in *dncp_block_verification.log.curr* continuously increase, leading to huge size. In one of our environments, it grew to more than 1 TB while the total number of blocks was only ~45k. -- This message was sent by Atlassian JIRA (v6.2#6252)
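The shape of the implied fix can be sketched as follows; this is a hypothetical illustration (ScanLoopSketch, rollVerificationLog, and the roll threshold are all invented), not the attached patch: roll the verification log periodically inside the scan loop rather than only between full passes.
{code}
import java.util.List;

class ScanLoopSketch {
  private static final long ENTRIES_PER_ROLL = 100000; // illustrative threshold
  private long entriesSinceRoll = 0;

  void scan(List<String> blocks) {
    for (String block : blocks) {
      scanBlock(block);
      // Roll inside the loop, so the log cannot grow without bound while
      // new blocks keep arriving and scan() never returns.
      if (++entriesSinceRoll >= ENTRIES_PER_ROLL) {
        rollVerificationLog();
        entriesSinceRoll = 0;
      }
    }
  }

  private void scanBlock(String block) { /* throttled verification */ }
  private void rollVerificationLog() { /* rotate dncp_block_verification.log.curr */ }
}
{code}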
[jira] [Resolved] (HDFS-188) DFS web UI should have a TAIL this block option
[ https://issues.apache.org/jira/browse/HDFS-188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-188. --- Resolution: Won't Fix I'm going to close this as Won't Fix. Tailing a file provides a reasonable way to see the bottom of the content. DFS web UI should have a TAIL this block option Key: HDFS-188 URL: https://issues.apache.org/jira/browse/HDFS-188 Project: Hadoop HDFS Issue Type: Bug Reporter: arkady borkovsky Assignee: Sameer Paranjpye It should be available either in addition to, or instead of, TAIL this file, as that covers it. In the case of 1-block files, the two are the same. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-318) Building Hadoop results in a lot of warnings
[ https://issues.apache.org/jira/browse/HDFS-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-318. --- Resolution: Incomplete I'm going to close this out as stale. A new jira should be re-opened for any new warnings. Building Hadoop results in a lot of warnings Key: HDFS-318 URL: https://issues.apache.org/jira/browse/HDFS-318 Project: Hadoop HDFS Issue Type: Improvement Reporter: eric baldeschwieler Attachments: example-warnings.patch, fs-unchecked.patch We are getting hundreds of warnings right now. Most of these are a result of our transition to 1.5 and deprecated uses of generics. We should still fix these, since producing lots of warnings: A) Leads to the perception that our code is of low quality B) Can mask warnings that come from real issues. --- I suggest we do two things 1) Submit a patch or set of patches to clean this up 2) Change our patch tester to validate that the number of warnings per build did not go up with this patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6699) Secure Windows DFS read when client co-located on nodes with data (short-circuit reads)
[ https://issues.apache.org/jira/browse/HDFS-6699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065106#comment-14065106 ] Chris Nauroth commented on HDFS-6699: - Thanks for filing this, Remus. At a high level, the approach of moving the privileged operations into a separate elevated service with a much smaller attack surface makes sense to me. I'll catch up on YARN-2198 to get more context on this new service. Secure Windows DFS read when client co-located on nodes with data (short-circuit reads) --- Key: HDFS-6699 URL: https://issues.apache.org/jira/browse/HDFS-6699 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client, performance, security Reporter: Remus Rusanu Labels: windows HDFS-347 introduced secure short-circuit HDFS reads based on Linux domain sockets. A similar capability can be introduced in a secure Windows environment using the [DuplicateHandle](http://msdn.microsoft.com/en-us/library/windows/desktop/ms724251(v=vs.85).aspx) Win32 API. When short-circuit is allowed, the datanode would open the block file, duplicate the handle into the hdfs client process, and return the handle value to that process. The hdfs client can then open a Java stream on this handle and read the file. This is a secure mechanism: the HDFS ACLs are validated by the namenode, and the process gets access to the file only in a controlled manner (e.g. read-only). The hdfs client process does not need to have OS-level access privilege to the block file. A complication arises from the requirement to duplicate the handle into the hdfs client process. Ordinary processes (and we want the datanode to run as one) do not have the required privilege (SeDebugPrivilege). But with the introduction of an elevated helper service for the nodemanager Windows Secure Container Executor (YARN-2198), we have at our disposal an elevated executor that can do the job of duplicating the handle. The datanode would communicate with this process using the same mechanism as the nodemanager, i.e. LRPC. With my proposed implementation the sequence of actions is as follows:
- the hdfs client requests Windows secure short-circuit of a block in the data transfer protocol. It passes the block, the token, and its own process ID.
- the datanode approves the short-circuit. It opens the block file and obtains the handle.
- the datanode invokes the elevated-privilege service to duplicate the handle into the hdfs client process. The datanode invokes the service's LRPC interface over JNI (LRPC being the Windows de-facto standard for interoperating with a service). It passes the handle value, its own process id, and the hdfs client process id.
- the elevated service duplicates the handle from the datanode process into the hdfs client process. It returns the duplicate handle value to the datanode as an output value of the LRPC call - x2 for the CRC file.
- the datanode responds to the short-circuit data transfer protocol request with a message that contains the duplicate handle value (handles actually, x2 from CRC).
- the hdfs client creates a Java stream that wraps the handles and reads the block from this stream (ditto for CRC).
The datanode needs to exercise care not to duplicate the same handle to different clients (including the CRC handles), because a handle also carries the file position, and clients would inadvertently move each other's file pointer, with chaotic results. TBD: a mitigation for process ID reuse (the hdfs client can be terminated immediately after the block request and a new process could reuse the same ID). In theory an attacker could use this as a mechanism to obtain a handle to a block by killing the hdfs client at the right moment and spinning up new processes until it gets one with the desired ID. I'm not sure this is a realistic threat, because the attacker must already have the privilege to kill the hdfs client process, and with such privilege he could obtain the handle by other means (e.g. debug/inspect the hdfs client process). -- This message was sent by Atlassian JIRA (v6.2#6252)
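To make the handle-duplication flow concrete, here is a hypothetical sketch; ElevatedBroker, duplicateHandle, and every parameter name are invented for illustration and are not the actual YARN-2198 LRPC interface:
{code}
// Stand-in for the elevated service's LRPC interface; none of these names are real.
interface ElevatedBroker {
  // Duplicates 'sourceHandle' (owned by process 'sourcePid') into process
  // 'targetPid' and returns the handle value as seen by the target process.
  long duplicateHandle(long sourceHandle, int sourcePid, int targetPid);
}

class ShortCircuitGrantSketch {
  private final ElevatedBroker broker;

  ShortCircuitGrantSketch(ElevatedBroker broker) {
    this.broker = broker;
  }

  // Returns {blockHandle, crcHandle} as handle values valid in the client process.
  long[] grant(long blockHandle, long crcHandle, int datanodePid, int clientPid) {
    long dupBlock = broker.duplicateHandle(blockHandle, datanodePid, clientPid);
    long dupCrc = broker.duplicateHandle(crcHandle, datanodePid, clientPid); // x2 for CRC
    return new long[] { dupBlock, dupCrc };
  }
}
{code}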
[jira] [Resolved] (HDFS-25) The FSDirectory should not have waitForReady
[ https://issues.apache.org/jira/browse/HDFS-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-25. -- Resolution: Incomplete I'm going to close this out as stale. I suspect this is no longer an issue. The FSDirectory should not have waitForReady Key: HDFS-25 URL: https://issues.apache.org/jira/browse/HDFS-25 Project: Hadoop HDFS Issue Type: Bug Reporter: Owen O'Malley Assignee: dhruba borthakur The Name Node should lock the entire namesystem while the image is being loaded rather than using waitForReady in each method in the FSDirectory. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-149) TestDFSUpgrade fails once in a while
[ https://issues.apache.org/jira/browse/HDFS-149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-149. --- Resolution: Fixed I'm going to close this out as stale. I suspect this is no longer an issue. TestDFSUpgrade fails once in a while Key: HDFS-149 URL: https://issues.apache.org/jira/browse/HDFS-149 Project: Hadoop HDFS Issue Type: Bug Reporter: Devaraj Das Attachments: TEST-org.apache.hadoop.dfs.TestDFSUpgrade.txt Once in a while, the TestDFSUpgrade fails. I am attaching the log file which points to something like there are no datanodes to write blocks to. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6695) Investigate using Java 7's nonblocking file I/O in BlockReaderLocal to implement read timeouts
[ https://issues.apache.org/jira/browse/HDFS-6695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065173#comment-14065173 ] Chris Nauroth commented on HDFS-6695: - I'm linking this to HADOOP-9590, which was created a while back as a sort of umbrella jira to track ideas for migration to improved JDK7 APIs. Investigate using Java 7's nonblocking file I/O in BlockReaderLocal to implement read timeouts -- Key: HDFS-6695 URL: https://issues.apache.org/jira/browse/HDFS-6695 Project: Hadoop HDFS Issue Type: Improvement Reporter: Colin Patrick McCabe In BlockReaderLocal, the read system call could block for a long time if the disk drive is having problems, or there is a huge amount of I/O contention. This might cause poor latency performance. In the remote block readers, we have implemented a read timeout, but we don't have one for the local block reader, since {{FileChannel#read}} doesn't support this. Once we move to JDK 7, we should investigate the {{java.nio.file}} nonblocking file I/O package to see if it could be used to implement read timeouts. -- This message was sent by Atlassian JIRA (v6.2#6252)
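The investigation the issue proposes can be sketched with the JDK7 asynchronous channel API. This is a minimal illustration of a read with a timeout, not the eventual BlockReaderLocal design; the path and timeout are placeholders:
{code}
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedLocalRead {
  public static int readWithTimeout(String path, ByteBuffer buf,
      long position, long timeoutMs) throws Exception {
    try (AsynchronousFileChannel ch =
        AsynchronousFileChannel.open(Paths.get(path), StandardOpenOption.READ)) {
      Future<Integer> pending = ch.read(buf, position); // returns immediately
      try {
        return pending.get(timeoutMs, TimeUnit.MILLISECONDS); // bytes read, or -1 at EOF
      } catch (TimeoutException e) {
        pending.cancel(true); // give up on a stuck disk instead of blocking forever
        throw e;
      }
    }
  }
}
{code}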
[jira] [Resolved] (HDFS-68) TestDFSUpgrade fails sporadically in nightly and patch builds
[ https://issues.apache.org/jira/browse/HDFS-68?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-68. -- Resolution: Incomplete Stale. Closing TestDFSUpgrade fails sporadically in nightly and patch builds - Key: HDFS-68 URL: https://issues.apache.org/jira/browse/HDFS-68 Project: Hadoop HDFS Issue Type: Bug Reporter: Jim Kellerman TestDFSUpgrade has failed in nightly build #179 and in patch builds 498, 494, 493, 491 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6478) RemoteException can't be retried properly for non-HA scenario
[ https://issues.apache.org/jira/browse/HDFS-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-6478: Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this to trunk and branch-2. Thanks for the contribution, [~mingma]! RemoteException can't be retried properly for non-HA scenario - Key: HDFS-6478 URL: https://issues.apache.org/jira/browse/HDFS-6478 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Fix For: 2.6.0 Attachments: HDFS-6478-2.patch, HDFS-6478-3.patch, HDFS-6478-4.patch, HDFS-6478.patch For the HA case, the call stack is DFSClient -> RetryInvocationHandler -> ClientNamenodeProtocolTranslatorPB -> ProtobufRpcEngine. ProtobufRpcEngine throws ServiceException and expects the caller to unwrap it; ClientNamenodeProtocolTranslatorPB is the component that takes care of that. {noformat}
at org.apache.hadoop.ipc.Client.call
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke
at com.sun.proxy.$Proxy26.getFileInfo
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo
at sun.reflect.GeneratedMethodAccessor24.invoke
at sun.reflect.DelegatingMethodAccessorImpl.invoke
at java.lang.reflect.Method.invoke
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke
at com.sun.proxy.$Proxy27.getFileInfo
at org.apache.hadoop.hdfs.DFSClient.getFileInfo
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus
{noformat} However, for the non-HA case, the call stack is DFSClient -> ClientNamenodeProtocolTranslatorPB -> RetryInvocationHandler -> ProtobufRpcEngine. RetryInvocationHandler gets the ServiceException and can't retry properly. {noformat}
at org.apache.hadoop.ipc.Client.call
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke
at com.sun.proxy.$Proxy9.getListing
at sun.reflect.NativeMethodAccessorImpl.invoke0
at sun.reflect.NativeMethodAccessorImpl.invoke
at sun.reflect.DelegatingMethodAccessorImpl.invoke
at java.lang.reflect.Method.invoke
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke
at com.sun.proxy.$Proxy9.getListing
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing
at org.apache.hadoop.hdfs.DFSClient.listPaths
{noformat} Perhaps we can fix it by having the NN wrap RetryInvocationHandler around ClientNamenodeProtocolTranslatorPB and the other PBs, instead of the current wrap order. -- This message was sent by Atlassian JIRA (v6.2#6252)
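A minimal sketch of the proposed wrap order, using Hadoop's RetryProxy so the retry handler sits outside the translator (which unwraps ServiceException into the real RemoteException first); the retry policy chosen here is illustrative:
{code}
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.hdfs.protocol.ClientProtocol;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;
import org.apache.hadoop.io.retry.RetryProxy;

public class RetryWrapOrder {
  // Wrap the retry handler around the translator, so by the time
  // RetryInvocationHandler inspects a failure it sees the unwrapped
  // RemoteException rather than ProtobufRpcEngine's ServiceException.
  public static ClientProtocol wrap(ClientProtocol translator) {
    RetryPolicy policy = RetryPolicies.retryUpToMaximumCountWithFixedSleep(
        5, 1, TimeUnit.SECONDS); // illustrative policy
    return (ClientProtocol) RetryProxy.create(ClientProtocol.class, translator, policy);
  }
}
{code}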
[jira] [Commented] (HDFS-6699) Secure Windows DFS read when client co-located on nodes with data (short-circuit reads)
[ https://issues.apache.org/jira/browse/HDFS-6699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065208#comment-14065208 ] Arpit Agarwal commented on HDFS-6699: - Hi Remus, an alternative approach you may have already considered is named file mapping objects via {{CreateFileMapping}}. By scoping the security descriptor the DataNode can theoretically restrict access to just the client user. The advantage is that NameNode and any other processes don't need to act as brokers. Secure Windows DFS read when client co-located on nodes with data (short-circuit reads) --- Key: HDFS-6699 URL: https://issues.apache.org/jira/browse/HDFS-6699 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client, performance, security Reporter: Remus Rusanu Labels: windows (Issue description quoted in full above.) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-242) Improve debuggability
[ https://issues.apache.org/jira/browse/HDFS-242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-242. --- Resolution: Incomplete I'm going to close this as stale. Many more metrics are now pumped up through the RPC channels. It would probably be better if each missing metric was handled as a separate JIRA, given the amount of work required to add some of these requests. Improve debuggability - Key: HDFS-242 URL: https://issues.apache.org/jira/browse/HDFS-242 Project: Hadoop HDFS Issue Type: New Feature Reporter: Senthil Subramanian Is there a plan to provide an option to enable metrics (both system and application level) collection on the JobTracker and NameNode from TaskTrackers and DataNodes respectively? Visualizing this data can help us understand what is happening on the various nodes in the cluster in terms of CPU, memory, disk utilization, network I/O, etc., and possibly spot issues when cluster performance degrades. Any thoughts on this are welcome. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6478) RemoteException can't be retried properly for non-HA scenario
[ https://issues.apache.org/jira/browse/HDFS-6478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065212#comment-14065212 ] Hudson commented on HDFS-6478: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5899 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5899/]) HDFS-6478. RemoteException can't be retried properly for non-HA scenario. Contributed by Ming Ma. (jing9: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1611410) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/NameNodeProxies.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/NamenodeProtocolTranslatorPB.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileCreation.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestIsMethodSupported.java RemoteException can't be retried properly for non-HA scenario - Key: HDFS-6478 URL: https://issues.apache.org/jira/browse/HDFS-6478 Project: Hadoop HDFS Issue Type: Bug Reporter: Ming Ma Assignee: Ming Ma Fix For: 2.6.0 Attachments: HDFS-6478-2.patch, HDFS-6478-3.patch, HDFS-6478-4.patch, HDFS-6478.patch (Issue description and stack traces quoted in full above.) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-261) Side classes should be moved to separate files
[ https://issues.apache.org/jira/browse/HDFS-261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-261. --- Resolution: Incomplete Given the number of changes in source code layout, the switch to maven, etc, I'm going to close this out as stale. Side classes should be moved to separate files -- Key: HDFS-261 URL: https://issues.apache.org/jira/browse/HDFS-261 Project: Hadoop HDFS Issue Type: Improvement Reporter: Owen O'Malley Assignee: Owen O'Malley Attachments: HADOOP-1826-BlockCrcUpgrade.ingnorespace.patch, HADOOP-1826-BlockCrcUpgrade.patch The following classes are side classes that aren't in files with the same name. This caused problems last night because ant compiled things in the wrong order and couldn't find one of the relevant classes. I think it would make the code easier to read and understand if you could always find a given class in the expected place. {code}
find src/java -name '*.java' | xargs grep '^class' \
  | sed -e 's|\([a-zA-Z0-9/]*/\)\([^/.]*\)[.]java:class \([^ ]*\).*|\1 \2 \3|' \
  | awk '{if ($2 != $3) print $1$2".java",$3}'

src/java/org/apache/hadoop/mapred/BasicTypeSorterBase.java MRSortResultIterator
src/java/org/apache/hadoop/dfs/BlockCommand.java DatanodeCommand
src/java/org/apache/hadoop/dfs/Storage.java StorageInfo
src/java/org/apache/hadoop/dfs/BlockCrcUpgrade.java BlockCrcInfo
src/java/org/apache/hadoop/dfs/BlockCrcUpgrade.java DNBlockUpgradeInfo
src/java/org/apache/hadoop/dfs/BlockCrcUpgrade.java BlockCrcUpgradeUtils
src/java/org/apache/hadoop/dfs/BlockCrcUpgrade.java BlockCrcUpgradeObjectDatanode
src/java/org/apache/hadoop/dfs/BlockCrcUpgrade.java BlockCrcUpgradeObjectNamenode
src/java/org/apache/hadoop/dfs/INode.java INodeDirectory
src/java/org/apache/hadoop/dfs/INode.java INodeFile
{code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-180) namenode -format does not reset version
[ https://issues.apache.org/jira/browse/HDFS-180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-180. --- Resolution: Fixed Fairly confident this has been fixed. namenode -format does not reset version --- Key: HDFS-180 URL: https://issues.apache.org/jira/browse/HDFS-180 Project: Hadoop HDFS Issue Type: Bug Reporter: Owen O'Malley If I have a name node with layout version -8 and do bin/hadoop namenode -format it does not reset the version to -7. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-295) Update the how to configure HDFS documentation to include new features
[ https://issues.apache.org/jira/browse/HDFS-295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-295. --- Resolution: Fixed ... and in 7 years, no one has been able to configure this stuff. ;) Update the how to configure HDFS documentation to include new features Key: HDFS-295 URL: https://issues.apache.org/jira/browse/HDFS-295 Project: Hadoop HDFS Issue Type: Improvement Reporter: dhruba borthakur There have been lots of cases where HDFS administrators have enquired about the setup and configuration of HDFS. A recent question asked about configuring Datanodes to understand the rack they belong to. There is a wiki page that is out-of-date: http://wiki.apache.org/lucene-hadoop/HowToConfigure Some new things that come to mind are: 1. rack location for Datanodes. 2. Trash configuration 3. SecondaryNamenode config parameters -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-321) All FileSystem implementations should return Paths fully-qualified with scheme and host
[ https://issues.apache.org/jira/browse/HDFS-321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065236#comment-14065236 ] Allen Wittenauer commented on HDFS-321: --- Does that work still need to happen? All FileSystem implementations should return Paths fully-qualified with scheme and host --- Key: HDFS-321 URL: https://issues.apache.org/jira/browse/HDFS-321 Project: Hadoop HDFS Issue Type: Improvement Reporter: Tom White This change should include a change to FileSystem's javadoc, changes to all FileSystem implementations to make sure they conform, and some unit tests. -- This message was sent by Atlassian JIRA (v6.2#6252)
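A small illustration of what fully-qualified means here, using FileSystem#makeQualified; the printed URI is only an example of the expected form:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class QualifiedPaths {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path p = new Path("/user/alice/data"); // scheme-less path
    // Qualified with the filesystem's scheme and authority,
    // e.g. hdfs://namenode:8020/user/alice/data
    System.out.println(fs.makeQualified(p));
  }
}
{code}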
[jira] [Commented] (HDFS-6699) Secure Windows DFS read when client co-located on nodes with data (short-circuit reads)
[ https://issues.apache.org/jira/browse/HDFS-6699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065255#comment-14065255 ] Chris Nauroth commented on HDFS-6699: - Thanks for pointing that out, Arpit. From my quick reading on named file mappings, it looks viable to me. Certainly the deployment model is simpler if there isn't a dependency on a separate service. Remus, what are your thoughts? Secure Windows DFS read when client co-located on nodes with data (short-circuit reads) --- Key: HDFS-6699 URL: https://issues.apache.org/jira/browse/HDFS-6699 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client, performance, security Reporter: Remus Rusanu Labels: windows (Issue description quoted in full above.) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6699) Secure Windows DFS read when client co-located on nodes with data (short-circuit reads)
[ https://issues.apache.org/jira/browse/HDFS-6699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065269#comment-14065269 ] Remus Rusanu commented on HDFS-6699: [~arpitagarwal] I didn't consider mmap and friends, frankly. Though I think one would still need a 'broker', because the DN and the hdfs client process would not share the session, so the named mapped file would have to be Global\, which requires the `SeCreateGlobalPrivilege` privilege. If the elevated service 'broker' is still required, the choice comes down to file handle (and InputStream) semantics vs. memory (and ByteBuffer) semantics. I'm not saying 'no', but I'm not yet convinced it is a better idea. Secure Windows DFS read when client co-located on nodes with data (short-circuit reads) --- Key: HDFS-6699 URL: https://issues.apache.org/jira/browse/HDFS-6699 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client, performance, security Reporter: Remus Rusanu Labels: windows (Issue description quoted in full above.) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6699) Secure Windows DFS read when client co-located on nodes with data (short-circuit reads)
[ https://issues.apache.org/jira/browse/HDFS-6699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065283#comment-14065283 ] Remus Rusanu commented on HDFS-6699: Also, if the hdfs client ever impersonates (it runs arbitrary user code, so it is free to do so), securing the named mapped file is more complex (e.g. the hdfs client would need to send the current token SID as part of the block request, and the DN would create a security descriptor granting access to that SID). To be clear, I absolutely agree that eliminating the privileged 'broker' requirement is a worthy goal; I'm just not convinced we're there yet. Secure Windows DFS read when client co-located on nodes with data (short-circuit reads) --- Key: HDFS-6699 URL: https://issues.apache.org/jira/browse/HDFS-6699 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client, performance, security Reporter: Remus Rusanu Labels: windows (Issue description quoted in full above.) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-390) Hadoop daemons should support generic command-line options by implementing the Tool interface
[ https://issues.apache.org/jira/browse/HDFS-390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-390. --- Resolution: Fixed This was done a long time ago. Hadoop daemons should support generic command-line options by implementing the Tool interface - Key: HDFS-390 URL: https://issues.apache.org/jira/browse/HDFS-390 Project: Hadoop HDFS Issue Type: Improvement Reporter: Arun C Murthy Assignee: Arun C Murthy Priority: Critical Hadoop daemons (NN/DN/JT/TT) should support generic command-line options (i.e. -nn / -jt / -conf / -D) by implementing the Tool interface. This is particularly useful for cases where the masters (NN/JT) are to be configured dynamically, e.g. via HoD. (I suspect we will need to tweak some of the hadoop scripts too.) -- This message was sent by Atlassian JIRA (v6.2#6252)
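For reference, the Tool pattern the issue asks for looks like this; MyDaemon is a placeholder name, not an actual Hadoop daemon class:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Implementing Tool lets ToolRunner parse the generic options
// (-conf, -D key=value, etc.) before run() is invoked.
public class MyDaemon extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    Configuration conf = getConf(); // already populated from -conf/-D options
    System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new MyDaemon(), args));
  }
}
{code}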
[jira] [Commented] (HDFS-6695) Investigate using Java 7's nonblocking file I/O in BlockReaderLocal to implement read timeouts
[ https://issues.apache.org/jira/browse/HDFS-6695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065288#comment-14065288 ] Colin Patrick McCabe commented on HDFS-6695: bq. I did not look through AsynchronousfileChannel impl, but per the above link saying: This class also defines read and write methods that initiate asynchronous operations, returning a Future to represent the pending result of the operation. The Future may be used to check if the operation has completed, wait for its completion, and retrieve the result. , seems does the same idea like my pseudo code... Yes, we could definitely set up a thread pool ourselves, but we don't want to do that because it would be slow. Passing large amounts of data between threads can mean moving it between CPU cores, which really kills performance. It also introduces latency from the context switches. I was hoping that Java7 had support for the kernel's native AIO interface... I guess we'll have to see. We might be able to do something interesting by having a thread pool where certain threads were locked to certain physical CPU cores. Of course, that requires JNI to pull off, at the moment, and the overheads of that might wipe out any gain... Investigate using Java 7's nonblocking file I/O in BlockReaderLocal to implement read timeouts -- Key: HDFS-6695 URL: https://issues.apache.org/jira/browse/HDFS-6695 Project: Hadoop HDFS Issue Type: Improvement Reporter: Colin Patrick McCabe (Issue description quoted in full above.) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-354) Data node process consumes 180% cpu
[ https://issues.apache.org/jira/browse/HDFS-354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-354. --- Resolution: Fixed I'm going to close this out as stale. There has been a lot of reworking of the DN process, so it isn't clear if this is still an issue. I'm going to go with no. Data node process consumes 180% cpu Key: HDFS-354 URL: https://issues.apache.org/jira/browse/HDFS-354 Project: Hadoop HDFS Issue Type: Improvement Reporter: Runping Qi Assignee: Chris Douglas I did a test on DFS read throughput and found that the data node process consumes up to 180% cpu when it is under heavy load. Here are the details: The cluster has 380+ machines, each with 3GB mem, 4 cpus and 4 disks. I copied a 10GB file to dfs from one machine with a data node running there. Based on the dfs block placement policy, that machine has one replica for each block of the file. Then I ran 4 of the following commands in parallel: {noformat} hadoop dfs -cat thefile > /dev/null {noformat} Since all the blocks have a local replica, all the read requests went to the local data node. I observed that: The data node process's cpu usage was around 180% for most of the time. The clients' cpu usage was moderate (as it should be). All four disks were working concurrently with comparable read throughput. The total read throughput maxed out at 90MB/sec, about 60% of the expected total aggregated max read throughput of 4 disks (160MB/sec). Thus disks were not a bottleneck in this case. The data node's cpu usage seems unreasonably high. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6698) try to optimize DFSInputStream.getFileLength()
[ https://issues.apache.org/jira/browse/HDFS-6698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065300#comment-14065300 ] stack commented on HDFS-6698: - Makes sense [~xieliang007]. You see this as a prob in prod? {code}
public synchronized long getFileLength() {
  return locatedBlocks == null ? 0 :
      locatedBlocks.getFileLength() + lastBlockBeingWrittenLength;
}
{code} The last block length does not change post construction of the FSDIS. Maybe it will when 'tail' starts to work, but for now it looks fixed after open. Block locations may change during the life of the stream, but the length of an other-than-last block should not change (it would be a problem if it did -- we could check each time the located blocks change?). Could we not have the length be a final data member rather than do a calculation inside a synchronized block each time? Or, maybe easier, change getFileLength to do something like the following: {code}
public long getFileLength() {
  if (!this.locatedBlocks.isUnderConstruction()
      && this.locatedBlocks.isLastBlockComplete()) {
    return cachedFileLength;
  }
  cachedFileLength = calculateFileLength();
  return cachedFileLength;
}

private synchronized long calculateFileLength() {
  return locatedBlocks == null ? 0 :
      locatedBlocks.getFileLength() + lastBlockBeingWrittenLength;
}
{code} try to optimize DFSInputStream.getFileLength() -- Key: HDFS-6698 URL: https://issues.apache.org/jira/browse/HDFS-6698 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie HBase prefers to invoke read() when serving scan requests, and pread() when serving get requests, because pread() holds almost no locks. Let's imagine there's a read() running. Because the definition is: {code}
public synchronized int read
{code} no other read() request can run concurrently; this is known, but pread() also cannot run... because: {code}
public int read(long position, byte[] buffer, int offset, int length) throws IOException {
  // sanity checks
  dfsClient.checkOpen();
  if (closed) {
    throw new IOException("Stream closed");
  }
  failures = 0;
  long filelen = getFileLength();
{code} getFileLength() also needs the lock. So we need to figure out a lock-free impl for getFileLength() before the HBase multi-stream feature is done. [~saint@gmail.com] -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-270) DFS Upgrade should process dfs.data.dirs in parallel
[ https://issues.apache.org/jira/browse/HDFS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065304#comment-14065304 ] Allen Wittenauer commented on HDFS-270: --- Is this effectively close-able now? DFS Upgrade should process dfs.data.dirs in parallel Key: HDFS-270 URL: https://issues.apache.org/jira/browse/HDFS-270 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 0.20.2 Reporter: Stu Hood Assignee: Hairong Kuang I just upgraded from 0.14.2 to 0.15.0, and things went very smoothly, if a little slowly. The main reason the upgrade took so long was the block upgrades on the datanodes. Each of our datanodes has 3 drives listed for the dfs.data.dir parameter. From looking at the logs, it is fairly clear that the upgrade procedure does not attempt to upgrade all listed dfs.data.dir's in parallel. I think even if all of your dfs.data.dir's are on the same physical device, there would still be an advantage to performing the upgrade process in parallel. The less downtime, the better: especially if it is potentially 20 minutes versus 60 minutes. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-170) TestCrcCorruption sometimes fails
[ https://issues.apache.org/jira/browse/HDFS-170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-170. --- Resolution: Fixed I'm going to close this as stale. TestCrcCorruption sometimes fails - Key: HDFS-170 URL: https://issues.apache.org/jira/browse/HDFS-170 Project: Hadoop HDFS Issue Type: Bug Reporter: Doug Cutting Attachments: TEST-org.apache.hadoop.dfs.TestCrcCorruption.txt In current trunk, I'm seeing TestCrcCorruption fail about one in four times. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-150) Replication should be decoupled from heartbeat
[ https://issues.apache.org/jira/browse/HDFS-150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-150. --- Resolution: Fixed Closing as fixed then! Replication should be decoupled from heartbeat -- Key: HDFS-150 URL: https://issues.apache.org/jira/browse/HDFS-150 Project: Hadoop HDFS Issue Type: Bug Environment: Hadoop 80 node cluster Reporter: Srikanth Kakani I did a simple experiment of shooting down one node in the cluster and measuring the time taken to replicate the under-replicated blocks. ~3 blocks were under replicated == ~400 / node; this should take 200 minutes to replicate completely given a 1-minute heartbeat interval. My findings: it took around 220 minutes, which is reasonable. Bug: Replication is coupled with the heartbeat. The heartbeat interval is based on how much a namenode can handle; replication should be based on how much a datanode can handle. Given the default heartbeat interval of 20 seconds, we computed that datanodes can handle 2 replications in that interval, based on which the Namenode gives 2 blocks per heartbeat to replicate. What we propose is to keep the 20-second/2-blocks ratio constant, so a datanode coming in with a 1-minute heartbeat interval should be given 6 blocks to replicate per heartbeat. In this case, instead of taking 200 minutes, it should take 200/3 minutes, i.e. about 1 hour, to replicate the entire node. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-270) DFS Upgrade should process dfs.data.dirs in parallel
[ https://issues.apache.org/jira/browse/HDFS-270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-270. Resolution: Won't Fix I agree with resolving this. I'm resolving this as won't fix, but others can feel free to reopen if anyone thinks something is still missing. If we look at rolling upgrades, probably the closest analogous thing is finalization of the upgrade, when we delete the trash block files that users deleted before the upgrade was finalized. I just reviewed the code for this, and we do the delete in a separate thread per volume. DFS Upgrade should process dfs.data.dirs in parallel Key: HDFS-270 URL: https://issues.apache.org/jira/browse/HDFS-270 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 0.20.2 Reporter: Stu Hood Assignee: Hairong Kuang (Issue description quoted in full above.) -- This message was sent by Atlassian JIRA (v6.2#6252)
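A minimal sketch of the per-volume parallelism described in the resolution comment (one worker thread per dfs.data.dir entry); PerVolumeWork and VolumeTask are invented names for illustration:
{code}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PerVolumeWork {
  public interface VolumeTask {
    void process(String volume);
  }

  // Runs one task per configured volume in parallel and waits for all of
  // them, the same shape as the per-volume delete thread described above.
  public static void forEachVolume(List<String> volumes, final VolumeTask task)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(volumes.size());
    for (final String volume : volumes) {
      pool.execute(new Runnable() {
        @Override
        public void run() {
          task.process(volume);
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(Long.MAX_VALUE, TimeUnit.SECONDS);
  }
}
{code}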
[jira] [Resolved] (HDFS-281) Explore usage of the sendfile api via java.nio.channels.FileChannel.transfer{To|From} for i/o in datanodes
[ https://issues.apache.org/jira/browse/HDFS-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-281. --- Resolution: Won't Fix I'm going to close this as a duplicate of HDFS-2246. While the path chosen was different, the end result was essentially the same. Explore usage of the sendfile api via java.nio.channels.FileChannel.transfer{To|From} for i/o in datanodes -- Key: HDFS-281 URL: https://issues.apache.org/jira/browse/HDFS-281 Project: Hadoop HDFS Issue Type: Improvement Reporter: Arun C Murthy We could potentially gain a lot of performance by using the *sendfile* system call: $ man sendfile {noformat} DESCRIPTION This call copies data between one file descriptor and another. Either or both of these file descriptors may refer to a socket (but see below). in_fd should be a file descriptor opened for reading and out_fd should be a descriptor opened for writing. offset is a pointer to a variable holding the input file pointer position from which sendfile() will start reading data. When sendfile() returns, this variable will be set to the offset of the byte following the last byte that was read. count is the number of bytes to copy between file descriptors. Because this copying is done within the kernel, sendfile() does not need to spend time transferring data to and from user space. {noformat} The nio package offers this via the java.nio.channels.FileChannel.transfer{To|From} apis: http://java.sun.com/j2se/1.5.0/docs/api/java/nio/channels/FileChannel.html#transferFrom(java.nio.channels.ReadableByteChannel,%20long,%20long) http://java.sun.com/j2se/1.5.0/docs/api/java/nio/channels/FileChannel.html#transferTo(long,%20long,%20java.nio.channels.WritableByteChannel) From the javadocs: {noformat} This method is potentially much more efficient than a simple loop that reads from this channel and writes to the target channel. Many operating systems can transfer bytes directly from the filesystem cache to the target channel without actually copying them. {noformat} Hence, this could be well worth exploring for doing I/O at the datanodes... -- This message was sent by Atlassian JIRA (v6.2#6252)
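A minimal sketch of the transferTo idea from the description, streaming a file to a socket so the kernel can copy the bytes directly; the path, host, and port are placeholders:
{code}
import java.io.FileInputStream;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

public class SendfileSketch {
  // Lets the kernel copy bytes from the file to the socket (sendfile)
  // instead of bouncing them through user-space buffers.
  public static void send(String path, String host, int port) throws Exception {
    try (FileInputStream in = new FileInputStream(path);
         SocketChannel sock = SocketChannel.open(new InetSocketAddress(host, port))) {
      FileChannel file = in.getChannel();
      long pos = 0, size = file.size();
      while (pos < size) {
        pos += file.transferTo(pos, size - pos, sock); // may transfer fewer bytes than asked
      }
    }
  }
}
{code}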
[jira] [Commented] (HDFS-6616) bestNode shouldn't always return the first DataNode
[ https://issues.apache.org/jira/browse/HDFS-6616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065368#comment-14065368 ] Tsz Wo Nicholas Sze commented on HDFS-6616: --- Patch looks good. Some comments: - Let's set ExcludeDatanodesParam.NAME to excludedatanodes. - We should also change WebHdfsFileSystem to use the exclude datanode feature so that it will retry with different datanodes. - Please take a look at the test failures. I think they should not be related, but why did they fail? bestNode shouldn't always return the first DataNode --- Key: HDFS-6616 URL: https://issues.apache.org/jira/browse/HDFS-6616 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: zhaoyunjiong Assignee: zhaoyunjiong Priority: Minor Attachments: HDFS-6616.1.patch, HDFS-6616.patch When we are doing distcp between clusters, the job failed: 2014-06-30 20:56:28,430 INFO org.apache.hadoop.tools.DistCp: FAIL part-r-00101.avro : java.net.NoRouteToHostException: No route to host at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491) at java.security.AccessController.doPrivileged(Native Method) at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139) at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379) at org.apache.hadoop.hdfs.HftpFileSystem.open(HftpFileSystem.java:322) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427) at org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:419) at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:547) at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:314) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:365) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) at org.apache.hadoop.mapred.Child.main(Child.java:249) The root cause is that one of the DataNodes can't be accessed from outside, but inside the cluster it's healthy. In NamenodeWebHdfsMethods.java:bestNode, it always returns the first DataNode, so even after the distcp retries, it still failed. -- This message was sent by Atlassian JIRA (v6.2#6252)
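A rough sketch of the behavior under review, assuming an exclude set supplied by the retrying client (illustrative only, not the committed patch; the generic type stands in for the real DatanodeInfo):
{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class BestNodeSketch {
  // Pick a random datanode that is not excluded, instead of always nodes[0].
  static <T> T bestNode(T[] nodes, Set<T> excludes) throws IOException {
    List<T> candidates = new ArrayList<T>();
    for (T dn : nodes) {
      if (!excludes.contains(dn)) {
        candidates.add(dn);
      }
    }
    if (candidates.isEmpty()) {
      throw new IOException("No usable datanodes for this block");
    }
    return candidates.get(new Random().nextInt(candidates.size()));
  }
}
{code}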
[jira] [Commented] (HDFS-6616) bestNode shouldn't always return the first DataNode
[ https://issues.apache.org/jira/browse/HDFS-6616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065382#comment-14065382 ] Jing Zhao commented on HDFS-6616: - bq. We should also change WebHdfsFileSystem to use the exclude datanode feature so that it will retry with different datanodes. Yeah, I have the same comment. Also, excludedatanodes should only be used by the internal retry logic in WebHdfsFileSystem rather than exposed as an external API. bestNode shouldn't always return the first DataNode --- Key: HDFS-6616 URL: https://issues.apache.org/jira/browse/HDFS-6616 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: zhaoyunjiong Assignee: zhaoyunjiong Priority: Minor Attachments: HDFS-6616.1.patch, HDFS-6616.patch When we are doing distcp between clusters, the job failed: 2014-06-30 20:56:28,430 INFO org.apache.hadoop.tools.DistCp: FAIL part-r-00101.avro : java.net.NoRouteToHostException: No route to host at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491) at java.security.AccessController.doPrivileged(Native Method) at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139) at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379) at org.apache.hadoop.hdfs.HftpFileSystem.open(HftpFileSystem.java:322) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427) at org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:419) at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:547) at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:314) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:365) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) at org.apache.hadoop.mapred.Child.main(Child.java:249) The root cause is that one of the DataNodes can't be accessed from outside, but inside the cluster it's healthy. In NamenodeWebHdfsMethods.java:bestNode, it always returns the first DataNode, so even after the distcp retries, it still failed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-310) Validate configuration parameters
[ https://issues.apache.org/jira/browse/HDFS-310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065384#comment-14065384 ] Allen Wittenauer commented on HDFS-310: --- Ping! Since this discussion, there have been some changes in how well Hadoop processes configuration entries. I don't think this is close-able, but I do think it should be re-opened for discussion. I've got a simple question: would it go a long way to have each config entry labeled with a simple type and whether null is a valid option? I'm thinking specifically of a day when we have Hadoop configuration coming from something like LDAP. Validate configuration parameters - Key: HDFS-310 URL: https://issues.apache.org/jira/browse/HDFS-310 Project: Hadoop HDFS Issue Type: Improvement Reporter: Robert Chansler Configuration parameters should be fully validated before name nodes or data nodes begin service. Required parameters must be present. Required and optional parameters must have values of proper type and range. Undefined parameters must not be present. (I was recently observing some confusion whose root cause was a mis-spelled parameter.) -- This message was sent by Atlassian JIRA (v6.2#6252)
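As an illustration of the typed-schema idea floated in the comment (all names here are hypothetical; this is not an existing Hadoop API), validation could look roughly like this: every known key declares a type and whether it is required, and unknown keys are rejected, which would have caught the mis-spelled parameter in the report.
{code}
import java.util.HashMap;
import java.util.Map;

public class ConfigSchema {
  enum Type { INT, LONG, BOOLEAN, STRING }

  private final Map<String, Type> types = new HashMap<String, Type>();
  private final Map<String, Boolean> required = new HashMap<String, Boolean>();

  void declare(String key, Type type, boolean isRequired) {
    types.put(key, type);
    required.put(key, isRequired);
  }

  void validate(Map<String, String> conf) {
    for (String key : conf.keySet()) {
      if (!types.containsKey(key)) {
        throw new IllegalArgumentException("Undefined parameter: " + key);
      }
      checkType(key, conf.get(key), types.get(key));
    }
    for (Map.Entry<String, Boolean> e : required.entrySet()) {
      if (e.getValue() && !conf.containsKey(e.getKey())) {
        throw new IllegalArgumentException("Missing required parameter: " + e.getKey());
      }
    }
  }

  private void checkType(String key, String value, Type type) {
    switch (type) {
      case INT:  Integer.parseInt(value); break;   // throws on a bad value
      case LONG: Long.parseLong(value); break;
      case BOOLEAN:
        if (!"true".equals(value) && !"false".equals(value)) {
          throw new IllegalArgumentException(key + " must be true or false");
        }
        break;
      default: break;                              // STRING: anything goes
    }
  }
}
{code}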
[jira] [Commented] (HDFS-2256) we should add a wait for non-safe mode and call dfsadmin -report in start-dfs
[ https://issues.apache.org/jira/browse/HDFS-2256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065385#comment-14065385 ] Allen Wittenauer commented on HDFS-2256: Is start-dfs.sh the right place for this though? we should add a wait for non-safe mode and call dfsadmin -report in start-dfs - Key: HDFS-2256 URL: https://issues.apache.org/jira/browse/HDFS-2256 Project: Hadoop HDFS Issue Type: Improvement Components: scripts Reporter: Owen O'Malley Assignee: Owen O'Malley I think we should add a call to wait for safe mode exit and print the dfs report to show upgrades that are in progress. -- This message was sent by Atlassian JIRA (v6.2#6252)
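For what it's worth, the shell already offers 'hdfs dfsadmin -safemode wait'; programmatically, the wait proposed here could look roughly like the sketch below (exact classes and enums vary across Hadoop versions, so treat this as an assumption-laden illustration):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction;

public class WaitForSafeModeExit {
  public static void main(String[] args) throws Exception {
    DistributedFileSystem dfs =
        (DistributedFileSystem) FileSystem.get(new Configuration());
    // SAFEMODE_GET only queries the current state; it does not change it.
    while (dfs.setSafeMode(SafeModeAction.SAFEMODE_GET)) {
      Thread.sleep(5000);   // poll until the namenode leaves safe mode
    }
    System.out.println("Safe mode is OFF; cluster is writable.");
    // a dfsadmin -report equivalent could be printed here
  }
}
{code}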
[jira] [Commented] (HDFS-6114) Block Scan log rolling will never happen if blocks written continuously leading to huge size of dncp_block_verification.log.curr
[ https://issues.apache.org/jira/browse/HDFS-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065388#comment-14065388 ] Hadoop QA commented on HDFS-6114: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656200/HDFS-6114.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7373//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7373//console This message is automatically generated. Block Scan log rolling will never happen if blocks written continuously leading to huge size of dncp_block_verification.log.curr Key: HDFS-6114 URL: https://issues.apache.org/jira/browse/HDFS-6114 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.3.0, 2.4.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Priority: Critical Attachments: HDFS-6114.patch, HDFS-6114.patch, HDFS-6114.patch, HDFS-6114.patch 1. {{BlockPoolSliceScanner#scan()}} will not return until all the blocks are scanned. 2. If blocks (several MB in size) are written to the datanode continuously, then one iteration of {{BlockPoolSliceScanner#scan()}} will be continuously scanning the blocks. 3. These blocks will be deleted after some time (enough for them to get scanned). 4. As block scanning is throttled, verification of all blocks takes a long time. 5. Rolling will never happen, so even though the total number of blocks in the datanode doesn't increase, entries (which include stale entries for deleted blocks) in *dncp_block_verification.log.curr* grow continuously, leading to a huge file. In one of our environments, it grew to more than 1 TB while the total number of blocks was only ~45k. -- This message was sent by Atlassian JIRA (v6.2#6252)
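The fix under test amounts to rolling the verification log on a size or time bound rather than only at the end of a full scan pass. A simplified sketch of that kind of check (not the actual BlockPoolSliceScanner code; thresholds are invented for illustration):
{code}
import java.io.File;

public class RollingVerificationLog {
  private static final long MAX_LOG_BYTES = 64L * 1024 * 1024;      // example cap
  private static final long MAX_LOG_AGE_MS = 24L * 60 * 60 * 1000;  // example cap

  private final File currLog;
  private long lastRollTime = System.currentTimeMillis();

  RollingVerificationLog(File currLog) {
    this.currLog = currLog;
  }

  // Called on every append, so rolling no longer depends on scan() finishing.
  void maybeRoll() {
    long now = System.currentTimeMillis();
    if (currLog.length() > MAX_LOG_BYTES || now - lastRollTime > MAX_LOG_AGE_MS) {
      rollLog();
      lastRollTime = now;
    }
  }

  private void rollLog() {
    // rename dncp_block_verification.log.curr to .prev and start a new file;
    // stale entries for deleted blocks are dropped with the old file
  }
}
{code}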
[jira] [Updated] (HDFS-6700) BlockPlacementPolicy should choose storage but not datanode for deletion
[ https://issues.apache.org/jira/browse/HDFS-6700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6700: -- Attachment: h6700_20140717b.patch h6700_20140717b.patch: fixes TestReplicationPolicy and TestReplicationPolicyWithNodeGroup. The other test failures seem unrelated. BlockPlacementPolicy should choose storage but not datanode for deletion --- Key: HDFS-6700 URL: https://issues.apache.org/jira/browse/HDFS-6700 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h6700_20140717.patch, h6700_20140717b.patch HDFS-2832 changed the datanode storage model from a single storage, which may correspond to multiple physical storage media, to a collection of storages, with each storage corresponding to a single physical storage medium. BlockPlacementPolicy.chooseReplicaToDelete still chooses replicas in terms of DatanodeDescriptor rather than DatanodeStorageInfo. -- This message was sent by Atlassian JIRA (v6.2#6252)
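Conceptually the change looks something like the following sketch, where a stand-in StorageInfo type exposes per-storage free space (the real patch works with DatanodeStorageInfo; this is illustrative only, and the least-remaining-space heuristic is one plausible policy, not necessarily the committed one):
{code}
import java.util.Collection;

public class ChooseStorageSketch {
  interface StorageInfo {            // stand-in for DatanodeStorageInfo
    long getRemaining();             // free space on this physical storage
  }

  // Compare individual storages rather than whole datanodes: delete the
  // replica sitting on the storage with the least remaining space.
  static StorageInfo chooseReplicaToDelete(Collection<StorageInfo> replicas) {
    StorageInfo least = null;
    for (StorageInfo s : replicas) {
      if (least == null || s.getRemaining() < least.getRemaining()) {
        least = s;
      }
    }
    return least;
  }
}
{code}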
[jira] [Updated] (HDFS-394) NameNode shoud give the DataNodes a parameter that specifies the backoff time for initial block reports
[ https://issues.apache.org/jira/browse/HDFS-394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-394: -- Summary: NameNode shoud give the DataNodes a parameter that specifies the backoff time for initial block reports (was: NameNnode shoud give the Datanodes a parameter that specifies the backoff time for initial block reports) NameNode shoud give the DataNodes a parameter that specifies the backoff time for initial block reports --- Key: HDFS-394 URL: https://issues.apache.org/jira/browse/HDFS-394 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Sanjay Radia HADOOP-2326 adds a random backoff for the initial block reports. This, however, is a static config parameter. It would be useful to have the NN tell the DN how much to back off (rather than using a single configuration parameter for the backoff). This would allow the system to adjust automatically to cluster size - smaller clusters will start up faster than larger clusters. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6693) TestDFSAdminWithHA fails on windows
[ https://issues.apache.org/jira/browse/HDFS-6693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-6693: Resolution: Fixed Fix Version/s: 2.6.0 Status: Resolved (was: Patch Available) Thanks, everyone, for the review. I have committed this to trunk and branch-2. TestDFSAdminWithHA fails on windows --- Key: HDFS-6693 URL: https://issues.apache.org/jira/browse/HDFS-6693 Project: Hadoop HDFS Issue Type: Bug Components: test, tools Reporter: Vinayakumar B Assignee: Vinayakumar B Fix For: 2.6.0 Attachments: HDFS-6693.patch TestDFSAdminWithHA fails on Windows for multiple reasons: 1. Assertions fail because the expected output uses only '\n', whereas on Windows the line separator is '\r\n'. 2. The MiniDFSCluster is not shut down after each test. -- This message was sent by Atlassian JIRA (v6.2#6252)
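The two fixes reduce to a platform-independent expected string and an unconditional teardown. A minimal sketch along those lines (the command-runner helper is hypothetical, not part of the actual test):
{code}
import static org.junit.Assert.assertEquals;

import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.After;
import org.junit.Test;

public class TestOutputExample {
  private MiniDFSCluster cluster;   // started inside each test

  @After
  public void tearDown() {
    if (cluster != null) {
      cluster.shutdown();           // runs even when an assertion fails
      cluster = null;
    }
  }

  @Test
  public void testMessage() {
    String newline = System.lineSeparator();  // "\n" on Unix, "\r\n" on Windows
    String expected = "Safe mode is ON" + newline;
    assertEquals(expected, runDfsAdminCommand());
  }

  // Hypothetical stand-in for capturing the dfsadmin command's output.
  private String runDfsAdminCommand() {
    return "Safe mode is ON" + System.lineSeparator();
  }
}
{code}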
[jira] [Updated] (HDFS-163) HDFS shell commands not as expected
[ https://issues.apache.org/jira/browse/HDFS-163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-163: -- Labels: newbie (was: ) HDFS shell commands not as expected --- Key: HDFS-163 URL: https://issues.apache.org/jira/browse/HDFS-163 Project: Hadoop HDFS Issue Type: Bug Reporter: Robert Chansler Assignee: Mahadev konar Labels: newbie Y! 1667824 shell commands (a) -help * the usage line is wrong as it suggests that multiple commands (options) can be put into one command line. it is not clear which options can be combined with which. * it does not explain what a path is -- it is supposed to be a URI -- what is the syntax? * it does not explain how to specify the file system URI -- it should suggest where to look to get this info. (b) -du, -ls -- other commands * breaks if a file name contains a ':' -- even if this is not a file named in the command, but one in a directory the command reached (c) -get -getmerge does not work -- the arguments are taken, but the output file is not created (d) -setrep -- lies about having set the replication factor (for local it should say it can do nothing) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-163) HDFS shell commands not as expected
[ https://issues.apache.org/jira/browse/HDFS-163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065404#comment-14065404 ] Allen Wittenauer commented on HDFS-163: --- I suspect most of these have been fixed, but someone should verify. HDFS shell commands not as expected --- Key: HDFS-163 URL: https://issues.apache.org/jira/browse/HDFS-163 Project: Hadoop HDFS Issue Type: Bug Reporter: Robert Chansler Assignee: Mahadev konar Labels: newbie Y! 1667824 shell commands (a) -help * the usage line is wrong as it suggests that multiple commands (options) can be put into one command line. it is not clear which options can be combined with which. * it does not explain what a path is -- it is supposed to be a URI -- what is the syntax? * it does not explain how to specify the file system URI -- it should suggest where to look to get this info. (b) -du, -ls -- other commands * breaks if a file name contains a ':' -- even if this is not a file named in the command, but one in a directory the command reached (c) -get -getmerge does not work -- the arguments are taken, but the output file is not created (d) -setrep -- lies about having set the replication factor (for local it should say it can do nothing) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6693) TestDFSAdminWithHA fails on windows
[ https://issues.apache.org/jira/browse/HDFS-6693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065406#comment-14065406 ] Hudson commented on HDFS-6693: -- FAILURE: Integrated in Hadoop-trunk-Commit #5903 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5903/]) HDFS-6693. TestDFSAdminWithHA fails on windows ( Contributed by Vinayakumar B ) (vinayakumarb: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1611441) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSAdminWithHA.java TestDFSAdminWithHA fails on windows --- Key: HDFS-6693 URL: https://issues.apache.org/jira/browse/HDFS-6693 Project: Hadoop HDFS Issue Type: Bug Components: test, tools Reporter: Vinayakumar B Assignee: Vinayakumar B Fix For: 2.6.0 Attachments: HDFS-6693.patch TestDFSAdminWithHA fails on Windows for multiple reasons: 1. Assertions fail because the expected output uses only '\n', whereas on Windows the line separator is '\r\n'. 2. The MiniDFSCluster is not shut down after each test. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-383) Modify datanode configs to specify minimum JVM heapsize
[ https://issues.apache.org/jira/browse/HDFS-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-383. --- Resolution: Fixed This was fixed with the introduction of HADOOP_DATANODE_OPTS and related env vars. See also HADOOP-9902. Modify datanode configs to specify minimum JVM heapsize --- Key: HDFS-383 URL: https://issues.apache.org/jira/browse/HDFS-383 Project: Hadoop HDFS Issue Type: Improvement Reporter: Robert Chansler Y! 1524346 Currently the Hadoop DataNodes are running with the option -Xmx1000m. They should also (or instead) be running with the option -Xms1000m (if 1000m is correct; it seems high?) This turns out to be a sticky request. The place where Hadoop DFS is getting the definition of how to define that 1000m is the hadoop-env file. Read the code from bin/hadoop, which is used to start all hadoop processes: ) JAVA_HEAP_MAX=-Xmx1000m ) ) # check envvars which might override default args ) if [ "$HADOOP_HEAPSIZE" != "" ]; then ) #echo run with heapsize $HADOOP_HEAPSIZE ) JAVA_HEAP_MAX="-Xmx""$HADOOP_HEAPSIZE""m" ) #echo $JAVA_HEAP_MAX ) fi And here's the entry from hadoop-env.sh: ) # The maximum amount of heap to use, in MB. Default is 1000. ) export HADOOP_HEAPSIZE=1000 The problem is that I believe we want to specify -Xms for datanodes ONLY. But the same script is used to start datanodes, tasktrackers, etc. This isn't trivially a matter of distributing different config files; the options provided are coded into the bin/hadoop executable. So this is an enhancement request. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-157) dfs client -ls/-lsr outofmemory when one directory contained 2 million files.
[ https://issues.apache.org/jira/browse/HDFS-157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065417#comment-14065417 ] Allen Wittenauer commented on HDFS-157: --- Is this still an issue? One fix is to use HADOOP_CLIENT_OPTS to increase the heap, but it is obviously desirable to have ls do something smarter. I'm just not sure if it is possible. dfs client -ls/-lsr outofmemory when one directory contained 2 million files. - Key: HDFS-157 URL: https://issues.apache.org/jira/browse/HDFS-157 Project: Hadoop HDFS Issue Type: Bug Reporter: Koji Noguchi Priority: Minor Heapsize was set to 1G. It'll be nice if the dfs client doesn't require that much memory when listing the directory. Exception in thread "IPC Client connection to namenode/11.11.11.111:" java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.regex.Pattern.compile(Pattern.java:846) at java.lang.String.replace(String.java:2208) at org.apache.hadoop.fs.Path.normalizePath(Path.java:147) at org.apache.hadoop.fs.Path.initialize(Path.java:137) at org.apache.hadoop.fs.Path.<init>(Path.java:126) at org.apache.hadoop.dfs.DFSFileInfo.readFields(DFSFileInfo.java:141) at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:230) at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:166) at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:214) at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:61) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:273) Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.Arrays.copyOfRange(Arrays.java:3209) at java.lang.String.<init>(String.java:216) at java.lang.StringBuffer.toString(StringBuffer.java:585) at java.net.URI.toString(URI.java:1907) at java.net.URI.<init>(URI.java:732) at org.apache.hadoop.fs.Path.initialize(Path.java:137) at org.apache.hadoop.fs.Path.<init>(Path.java:126) at org.apache.hadoop.fs.Path.makeQualified(Path.java:296) at org.apache.hadoop.dfs.DfsPath.<init>(DfsPath.java:35) at org.apache.hadoop.dfs.DistributedFileSystem.listPaths(DistributedFileSystem.java:181) at org.apache.hadoop.fs.FsShell.ls(FsShell.java:405) at org.apache.hadoop.fs.FsShell.ls(FsShell.java:423) at org.apache.hadoop.fs.FsShell.ls(FsShell.java:423) at org.apache.hadoop.fs.FsShell.ls(FsShell.java:423) at org.apache.hadoop.fs.FsShell.ls(FsShell.java:399) at org.apache.hadoop.fs.FsShell.doall(FsShell.java:1054) at org.apache.hadoop.fs.FsShell.run(FsShell.java:1244) at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:187) at org.apache.hadoop.fs.FsShell.main(FsShell.java:1333) -- This message was sent by Atlassian JIRA (v6.2#6252)
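One client-side mitigation, assuming a Hadoop version that has the iterative listing API, is to stream directory entries a batch at a time instead of materializing millions of FileStatus objects at once, which is what drove the heap usage above:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class StreamingLs {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    RemoteIterator<FileStatus> it = fs.listStatusIterator(new Path(args[0]));
    while (it.hasNext()) {
      FileStatus st = it.next();   // entries fetched in batches from the NN
      System.out.println(st.getPath());
    }
  }
}
{code}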
[jira] [Resolved] (HDFS-356) Short -metasave option
[ https://issues.apache.org/jira/browse/HDFS-356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-356. --- Resolution: Fixed I'm going to close out given that a lot/most/all of this information is available via metrics2. Short -metasave option --- Key: HDFS-356 URL: https://issues.apache.org/jira/browse/HDFS-356 Project: Hadoop HDFS Issue Type: Improvement Reporter: Koji Noguchi Priority: Minor In Hadoop-2606 bug, dfsadmin -metasave was quite useful in digging into namenode's state. However, the log it created was over 100MBytes for each call. It would be nice to have a lighter version of -metasave that only prints out the total counts. (blocks waiting for replication, being replicated, waiting for deletion, ...) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-32) testFilePermissions in TestDFSShell should shut down the mini dfs cluster when there is an error
[ https://issues.apache.org/jira/browse/HDFS-32?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-32. -- Resolution: Fixed Likely a stale issue and fixed long ago. testFilePermissions in TestDFSShell should shut down the mini dfs cluster when there is an error Key: HDFS-32 URL: https://issues.apache.org/jira/browse/HDFS-32 Project: Hadoop HDFS Issue Type: Bug Reporter: Hairong Kuang The unit test testFilePermissions only shuts down the cluster when there is no exception thrown. Should do so in a finally statement so the cluster gets shut down when there is an exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6422) getfattr in CLI doesn't throw exception or return non-0 return code when xattr doesn't exist
[ https://issues.apache.org/jira/browse/HDFS-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-6422: --- Attachment: HDFS-6422.006.patch Fixed org.apache.hadoop.hdfs.web.resources.TestParam.testXAttrNameParam failure (the other failures were spurious). Also cleaned up one minor unnecessary whitespace change that was left in .005. Ready for review. getfattr in CLI doesn't throw exception or return non-0 return code when xattr doesn't exist Key: HDFS-6422 URL: https://issues.apache.org/jira/browse/HDFS-6422 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Blocker Attachments: HDFS-6422.005.patch, HDFS-6422.006.patch, HDFS-6422.1.patch, HDFS-6422.2.patch, HDFS-6422.3.patch, HDFS-6474.4.patch If you do hdfs dfs -getfattr -n user.blah /foo and user.blah doesn't exist, the command prints # file: /foo and a 0 return code. It should print an exception and return a non-0 return code instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-122) Namenode should let Datanode decide how to delete blocks.
[ https://issues.apache.org/jira/browse/HDFS-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065458#comment-14065458 ] Allen Wittenauer commented on HDFS-122: --- I suspect this is a bit stale and can probably be closed. Anyone? Namenode should let Datanode decide how to delete blocks. - Key: HDFS-122 URL: https://issues.apache.org/jira/browse/HDFS-122 Project: Hadoop HDFS Issue Type: Bug Reporter: Raghu Angadi See HADOOP-2576 and HADOOP-774 for more discussion. Namenode throttles the number of blocks it asks Datanode to delete. It does this because it knows that Datanode deletes these blocks in the same thread that heartbeats, and it does not want that thread to block for long. Managing this means more memory and more code at the Namenode. I think the namenode should just ask the Datanode to delete the blocks, and the Datanode can decide how it deletes them. It would be the datanode's responsibility to properly delete the blocks however it sees fit. For example, it could delete them in a separate thread and not let heartbeats be affected by this. -- This message was sent by Atlassian JIRA (v6.2#6252)
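The proposal boils down to a producer/consumer queue inside the datanode, so the heartbeat thread only enqueues work. A minimal sketch of that pattern (illustrative; not the actual datanode implementation):
{code}
import java.io.File;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncBlockDeleter implements Runnable {
  private final BlockingQueue<File> pending = new LinkedBlockingQueue<File>();

  // Called from the heartbeat path: returns immediately, never touches disk.
  void scheduleDelete(File blockFile) {
    pending.offer(blockFile);
  }

  // Runs in a dedicated worker thread started via new Thread(this).start().
  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      try {
        File f = pending.take();  // blocks only this worker thread
        if (!f.delete()) {
          // log and retry/inspect; deletion failures must not be silent
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }
}
{code}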
[jira] [Commented] (HDFS-139) Datanode block deletions can get starved.
[ https://issues.apache.org/jira/browse/HDFS-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065455#comment-14065455 ] Allen Wittenauer commented on HDFS-139: --- I suspect we can close this as fixed, but someone should verify. Datanode block deletions can get starved. - Key: HDFS-139 URL: https://issues.apache.org/jira/browse/HDFS-139 Project: Hadoop HDFS Issue Type: Bug Reporter: Raghu Angadi See the relevant [comment|https://issues.apache.org/jira/browse/HADOOP-2576?focusedCommentId=12561866#action_12561866] in HADOOP-2576. Namenode asks a datanode to delete blocks only if datanode is not asked to transfer a block. Namenode should send both the blocks to transfer and to delete. The problem gets worse if configured heartbeat is longer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-54) datanode not failing when read-only filesystem
[ https://issues.apache.org/jira/browse/HDFS-54?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-54. -- Resolution: Fixed Fairly confident this has been fixed. datanode not failing when read-only filesystem -- Key: HDFS-54 URL: https://issues.apache.org/jira/browse/HDFS-54 Project: Hadoop HDFS Issue Type: Bug Reporter: Koji Noguchi Priority: Minor (This is not directly related to dfs -put hanging, but thought it should get fixed.) Datanode is catching IOException but not shutting itself down. 2008-02-02 00:10:24,237 ERROR org.apache.hadoop.dfs.DataNode: DataXceiver: java.io.IOException: Read-only file system at java.io.UnixFileSystem.createFileExclusively(Native Method) at java.io.File.createNewFile(File.java:883) at org.apache.hadoop.dfs.FSDataset$FSVolume.createTmpFile(FSDataset.java:329) at org.apache.hadoop.dfs.FSDataset.createTmpFile(FSDataset.java:606) at org.apache.hadoop.dfs.FSDataset.writeToBlock(FSDataset.java:582) at org.apache.hadoop.dfs.DataNode$BlockReceiver.init(DataNode.java:1257) at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:901) at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:804) at java.lang.Thread.run(Thread.java:619) 2008-02-02 00:16:29,996 INFO org.apache.hadoop.dfs.DataNode: Received block blk_-7723120264171092160 from /11.11.11.11 2008-02-02 00:41:40,409 INFO org.apache.hadoop.dfs.DataNode: Received block blk_1939877544554342517 from /22.22.22.22 2008-02-02 00:46:53,925 INFO org.apache.hadoop.dfs.DataNode: Received block blk_-4102605170938551016 from /33.33.33.33 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-247) A tool to plot the locations of the blocks of a directory
[ https://issues.apache.org/jira/browse/HDFS-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065505#comment-14065505 ] Allen Wittenauer commented on HDFS-247: --- It would be neat to be able to do this from OIV. A tool to plot the locations of the blocks of a directory - Key: HDFS-247 URL: https://issues.apache.org/jira/browse/HDFS-247 Project: Hadoop HDFS Issue Type: New Feature Reporter: Owen O'Malley Labels: newbie It would be very useful to have a command that we could give an HDFS directory to, that would use fsck to find the block locations of the data files in that directory and group them by host and display the distribution graphically. We did this by hand and it was very useful for finding a skewed distribution that was causing performance problems. The tool should also be able to group by rack id and generate a similar plot. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-247) A tool to plot the locations of the blocks of a directory
[ https://issues.apache.org/jira/browse/HDFS-247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-247: -- Labels: newbie (was: ) A tool to plot the locations of the blocks of a directory - Key: HDFS-247 URL: https://issues.apache.org/jira/browse/HDFS-247 Project: Hadoop HDFS Issue Type: New Feature Reporter: Owen O'Malley Labels: newbie It would be very useful to have a command that we could give an HDFS directory to, that would use fsck to find the block locations of the data files in that directory and group them by host and display the distribution graphically. We did this by hand and it was very useful for finding a skewed distribution that was causing performance problems. The tool should also be able to group by rack id and generate a similar plot. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6699) Secure Windows DFS read when client co-located on nodes with data (short-circuit reads)
[ https://issues.apache.org/jira/browse/HDFS-6699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065536#comment-14065536 ] Arpit Agarwal commented on HDFS-6699: - Locking down the named mapping should be easy and we can write PoC client/server code to verify that (I'll volunteer to help). The builtin service accounts like Network Service, Local Service already get SeCreateGlobalPrivilege by default. I am suggesting we grant SeCreateGlobalPrivilege to the DataNode service user, e.g. hdfs. Secure Windows DFS read when client co-located on nodes with data (short-circuit reads) --- Key: HDFS-6699 URL: https://issues.apache.org/jira/browse/HDFS-6699 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, hdfs-client, performance, security Reporter: Remus Rusanu Labels: windows HDFS-347 introduced secure short-circuit HDFS reads based on Linux domain sockets. Similar capability can be introduced in a secure Windows environment using the [DuplicateHandle](http://msdn.microsoft.com/en-us/library/windows/desktop/ms724251(v=vs.85).aspx) Win32 API. When short-circuit is allowed, the datanode would open the block file, duplicate the handle into the hdfs client process, and return the handle value to that process. The hdfs client can then open a Java stream on this handle and read the file. This is a secure mechanism: the HDFS ACLs are validated by the namenode, and the process gets access to the file only in a controlled manner (e.g. read-only). The hdfs client process does not need to have OS level access privilege to the block file. A complication arises from the requirement to duplicate the handle in the hdfs client process. Ordinary processes (as we desire the datanode to run) do not have the required privilege (SeDebugPrivilege). But with the introduction of an elevated service helper for the nodemanager's Windows Secure Container Executor (YARN-2198), we have at our disposal an elevated executor that can do the job of duplicating the handle. The datanode would communicate with this process using the same mechanism as the nodemanager, i.e. LRPC. With my proposed implementation the sequence of actions is as follows: - the hdfs client requests Windows secure short-circuit of a block in the data transfer protocol. It passes the block, the token and its own process ID. - the datanode approves short-circuit. It opens the block file and obtains the handle. - the datanode invokes the elevated privilege service to duplicate the handle into the hdfs client process. The datanode invokes the service LRPC interface over JNI (LRPC being the Windows de-facto standard for interoperating with a service). It passes the handle value, its own process id and the hdfs client process id. - The elevated service duplicates the handle from the datanode process into the hdfs client process. It returns the duplicate handle value to the datanode as an output value from the LRPC call - x2 for the CRC file - the datanode responds to the short-circuit data transfer protocol request with a message that contains the duplicate handle value (handles actually, x2 with the CRC) - the hdfs client creates a Java stream that wraps the handles and reads the block from this stream (ditto for CRC) The datanode needs to exercise care not to duplicate the same handle to different clients (including the CRC handles), because a handle also carries the file position, and clients would inadvertently move each other's file pointers, with chaotic results.
TBD: a mitigation for process ID reuse (the hdfs client can be terminated immediately after the block request and a new process could reuse the same ID). In theory an attacker could use this as a mechanism to obtain a handle to a block by killing the hdfs client at the right moment and spinning up new processes until it gets one with the desired ID. I'm not sure this is a realistic threat, because the attacker must already have the privilege to kill the hdfs client process, and with such privilege he could obtain the handle by other means (e.g. debug/inspect the hdfs client process). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-24) FSDataOutputStream should flush last partial CRC chunk
[ https://issues.apache.org/jira/browse/HDFS-24?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-24. -- Resolution: Fixed I'd be greatly surprised if this wasn't fixed by now. FSDataOutputStream should flush last partial CRC chunk -- Key: HDFS-24 URL: https://issues.apache.org/jira/browse/HDFS-24 Project: Hadoop HDFS Issue Type: Bug Reporter: dhruba borthakur The FSDataOutputStream.flush() API is supposed to flush all data to the underlying stream. However, for LocalFileSystem, the flush API does not flush the last partial CRC chunk. One solution is described in HADOOP-2657: We should change FSOutputStream to implement Seekable, and have the default implementation of seek throw an IOException, then use this in CheckSumFileSystem to rewind and overwrite the checksum. Then folks will only fail if they attempt to write more data after they've flushed on a ChecksumFileSystem that doesn't support seek. I don't think we will have any filesystems that both extend CheckSumFileSystem and can't support seek. Only LocalFileSystem currently extends CheckSumFileSystem, and it does support seek. So flush() shouldn't ever fail for existing FileSystem's, but seek() will fail for most output streams (probably all except local). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-2364) metaSave API is using Printwriter, It will eat all the IOExceptions.
[ https://issues.apache.org/jira/browse/HDFS-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065576#comment-14065576 ] Allen Wittenauer commented on HDFS-2364: Ping! metaSave API is using Printwriter, It will eat all the IOExceptions. Key: HDFS-2364 URL: https://issues.apache.org/jira/browse/HDFS-2364 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Uma Maheswara Rao G Assignee: Uma Maheswara Rao G It is important to note that PrintStream and PrintWriter do not throw IOExceptions! IOException is a non-Runtime exception, which means that your code must catch them or declare it can throw them. The creators of Java realized that System.out and System.err would be very heavily used, and did not want to force inclusion of exception handling every time you wanted to write System.out.println(4). Therefore, PrintStream and PrintWriter catch their own exceptions and set an error flag. If you are using one of these classes for real output in your program (not merely using System.out.println()) you should call checkError() to see if an error has occurred. Because of this behavior, PrintStream and PrintWriter are not well suited for use other than System.out and System.err! Ref: http://www.cs.usfca.edu/~parrt/doc/java/JavaIO-notes.pdf -- This message was sent by Atlassian JIRA (v6.2#6252)
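The pitfall described in the issue is easy to demonstrate: PrintWriter swallows IOExceptions, so the only way for a caller to observe a failed write is to poll checkError(). A small sketch (not the metaSave code itself):
{code}
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;

public class CheckedMetaSave {
  static void dump(Writer underlying, Iterable<String> lines) throws IOException {
    PrintWriter out = new PrintWriter(underlying);
    for (String line : lines) {
      out.println(line);          // never throws, even on a failed disk
    }
    out.flush();
    if (out.checkError()) {       // the only way to observe the failure
      throw new IOException("metaSave output failed");
    }
  }
}
{code}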
[jira] [Resolved] (HDFS-363) list of dead nodes with time information
[ https://issues.apache.org/jira/browse/HDFS-363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-363. --- Resolution: Duplicate Closing as a dupe, then. list of dead nodes with time information Key: HDFS-363 URL: https://issues.apache.org/jira/browse/HDFS-363 Project: Hadoop HDFS Issue Type: Improvement Reporter: Christian Kunz One of our namenodes ended up with 5% of dead datanodes. It occurred to us that it would be good to know when individual datanodes became dead. If the list of dead datanodes on the GUI of the namenode included a sortable column of the 'going-dead' times then it would be easier to see whether there is potential danger of having lost some blocks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-43) Ignoring IOExceptions on close
[ https://issues.apache.org/jira/browse/HDFS-43?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-43. -- Resolution: Not a Problem Closing this as not a problem. Ignoring IOExceptions on close -- Key: HDFS-43 URL: https://issues.apache.org/jira/browse/HDFS-43 Project: Hadoop HDFS Issue Type: Bug Reporter: Owen O'Malley Assignee: dhruba borthakur Priority: Critical Attachments: closeStream.patch Currently in HDFS there are a lot of calls to IOUtils.closeStream that are from finally blocks. I'm worried that this can lead to data corruption in the file system. Take the first instance in DataNode.copyBlock: it writes the block and then calls closeStream on the output stream. If there is an error at the end of the file that is detected in the close, it will be *completely* ignored. Note that logging the error is not enough; the error should be thrown so that the client knows the failure happened. {code} try { file1.write(...); file2.write(...); } finally { IOUtils.closeStream(file1); IOUtils.closeStream(file2); } {code} is *bad*. It must be rewritten as: {code} try { file1.write(...); file2.write(...); file1.close(); file2.close(); } catch (IOException ie) { IOUtils.closeStream(file1); IOUtils.closeStream(file2); throw ie; } {code} I also think that IOUtils.closeStream should be renamed IOUtils.cleanupFailedStream or something to make it clear it can only be used after the write operation has failed and is being cleaned up. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-137) DataNode should clean up temporary files when writeBlock fails.
[ https://issues.apache.org/jira/browse/HDFS-137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-137. --- Resolution: Fixed This has likely been fixed by now. DataNode should clean up temporary files when writeBlock fails. --- Key: HDFS-137 URL: https://issues.apache.org/jira/browse/HDFS-137 Project: Hadoop HDFS Issue Type: Bug Reporter: Raghu Angadi Once a datanode starts receiving a block and fails to complete receiving it, it leaves the temporary block files in the temp directory. Because of this, the same block cannot be written to this node for the next hour. The DataNode should really delete these files and allow the next attempt. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-374) HDFS needs to support a very large number of open files.
[ https://issues.apache.org/jira/browse/HDFS-374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-374. --- Resolution: Fixed I'm going to resolve this as stale. There is a good chance this issue might still exist but isn't nearly the concern it once was. If so, please open a new jira. HDFS needs to support a very large number of open files. Key: HDFS-374 URL: https://issues.apache.org/jira/browse/HDFS-374 Project: Hadoop HDFS Issue Type: Improvement Reporter: Jim Kellerman Currently, DFSClient maintains one socket per open file. For most map/reduce operations, this is not a problem because there just aren't many open files. However, HBase has a very different usage model in which a single region server could have thousands (10**3 but less than 10**4) of open files. This can cause both datanodes and region servers to run out of file handles. What I would like to see is one connection for each (DFSClient, datanode) pair. This would reduce the number of connections to hundreds or tens of sockets. The intent is not to process requests totally asynchronously (overlapping block reads and forcing the client to reassemble a whole message out of a bunch of fragments), but rather to queue requests from the client to the datanode and process them serially, differing from the current implementation in that rather than using an exclusive socket for each file, only one socket is in use between the client and a particular datanode. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-306) FSNamesystem.shutdown() should be a part of FSNamesystem.close()
[ https://issues.apache.org/jira/browse/HDFS-306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-306. --- Resolution: Fixed I'm sure this is fixed by now. FSNamesystem.shutdown() should be a part of FSNamesystem.close() - Key: HDFS-306 URL: https://issues.apache.org/jira/browse/HDFS-306 Project: Hadoop HDFS Issue Type: Improvement Reporter: Konstantin Shvachko FSNamesystem should have either close() or shutdown(), but not both. Traditionally we used to have FSNamesystem.close(). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6268) Better sorting in NetworkTopology#pseudoSortByDistance when no local node is found
[ https://issues.apache.org/jira/browse/HDFS-6268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065630#comment-14065630 ] Ashwin Shankar commented on HDFS-6268: -- Hi [~andrew.wang], After applying your patch in our cluster, we see that all read requests for a block still go to the same rack-local replica when there is no node-local replica. This resulted in some containers getting stuck in the LOCALIZING phase and eventually failing. Looking at the patch, I see you are seeding the RNG with the block id, which gives the same pseudo-random order for a block. Hence the same rack-local replica gets bombarded for a block (when there is no node-local replica). Do you see any problem if we don't use a seed and randomize the rack-local nodes for a block? Better sorting in NetworkTopology#pseudoSortByDistance when no local node is found -- Key: HDFS-6268 URL: https://issues.apache.org/jira/browse/HDFS-6268 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Andrew Wang Assignee: Andrew Wang Priority: Minor Fix For: 3.0.0 Attachments: hdfs-6268-1.patch, hdfs-6268-2.patch, hdfs-6268-3.patch, hdfs-6268-4.patch, hdfs-6268-5.patch, hdfs-6268-branch-2.001.patch In NetworkTopology#pseudoSortByDistance, if no local node is found, it will always place the first rack-local node in the list in front. This became an issue when a dataset was loaded from a single datanode. This datanode ended up being the first replica for all the blocks in the dataset. When running an Impala query, the non-local reads when reading past a block boundary were all hitting this node, meaning massive load skew. -- This message was sent by Atlassian JIRA (v6.2#6252)
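What Ashwin is suggesting corresponds to shuffling the equally-distant (rack-local) replicas with an unseeded RNG, so repeated reads of the same block spread across replicas instead of hammering one node. An illustrative sketch (the real method operates on arrays of datanodes; a generic List is used here for brevity):
{code}
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SpreadReplicas {
  private static final Random RAND = new Random(); // no block-id seed

  // Shuffle only the sub-range of nodes at equal "distance" from the reader.
  static <T> void randomizeRackLocal(List<T> nodes, int from, int to) {
    Collections.shuffle(nodes.subList(from, to), RAND);
  }
}
{code}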
[jira] [Resolved] (HDFS-337) Name collision for AccessControlException.
[ https://issues.apache.org/jira/browse/HDFS-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-337. --- Resolution: Won't Fix Closing as won't fix, assuming we haven't already. Name collision for AccessControlException. -- Key: HDFS-337 URL: https://issues.apache.org/jira/browse/HDFS-337 Project: Hadoop HDFS Issue Type: Improvement Reporter: Konstantin Shvachko There is a name collision in org.apache.hadoop.fs.permission.AccessControlException and java.security.AccessControlException. Since java.security.AccessControlException is not an IOException we cannot throw it directly as we do with FileNotFoundException. Therefore, the only choice is to rename the hadoop AccessControlException to e.g., PermissionException (or AccessDeniedException). To provide compatibility we can inherit PermissionException from AccessControlException, and deprecate the latter. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-296) Serial streaming performance should be Math.min(ideal client performance, ideal serial hdfs performance)
[ https://issues.apache.org/jira/browse/HDFS-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-296. --- Resolution: Incomplete I'm going to close this as stale given how many changes have happened to both streaming and HDFS since this was filed. Serial streaming performance should be Math.min(ideal client performance, ideal serial hdfs performance) Key: HDFS-296 URL: https://issues.apache.org/jira/browse/HDFS-296 Project: Hadoop HDFS Issue Type: Improvement Environment: Mac OS X 10.5.2, Java 6 Reporter: Sam Pullara I looked at all the code long and hard and this was my analysis (could be wrong, I'm not an expert on this codebase): Current Serial HDFS performance = Average Datanode Performance Average Datanode Performance = Average Disk Performance (even if you have more than one) We should have: Ideal Serial HDFS Performance = Sum of Ideal Datanode Performance Ideal Datanode Performance = Sum of disk performance When you read a single file serially from HDFS there are a number of limitations that come into play: 1) Blocks on multiple datanodes will be load balanced between them - averaging the performance of the datanodes 2) Blocks on multiple disks in a single datanode are load balanced between them - averaging the performance of the disks I think that all this could be fixed if we actually prefetched fully read blocks on the client until the client can no longer keep up with the data or there is another bottleneck like network bandwidth. This seems like a reasonably common use case though not the typical MapReduce case. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-6632) Remove incompatibility introduced by HDFS-5321
[ https://issues.apache.org/jira/browse/HDFS-6632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang reassigned HDFS-6632: --- Assignee: Yongjun Zhang Remove incompatibility introduced by HDFS-5321 -- Key: HDFS-6632 URL: https://issues.apache.org/jira/browse/HDFS-6632 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.1 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Filing this JIRA per [~atm] and [~wheat9]'s discussion in HDFS-5321 (Thanks both of you), to remove the incompatibility introduced by HDFS-5321. The idea is to put dfs.http.port and dfs.https.port configurations back to branch-2. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-338) When a block is severely under replicated at creation time, a request for block replication should be scheduled immediately
[ https://issues.apache.org/jira/browse/HDFS-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer resolved HDFS-338. --- Resolution: Fixed Closing this as stale. When a block is severely under replicated at creation time, a request for block replication should be scheduled immediately --- Key: HDFS-338 URL: https://issues.apache.org/jira/browse/HDFS-338 Project: Hadoop HDFS Issue Type: Improvement Reporter: Runping Qi While writing a block to data nodes, if the dfs client detects a bad data node in the write pipeline, it will construct a new pipeline excluding the detected bad data node. This implies that when the client finishes writing the block, the number of replicas for the block may be lower than the intended replication factor. If the ratio of the number of replicas to the intended replication factor is lower than a certain threshold (say 0.68), then the client should send a request to the name node to replicate that block immediately. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-358) DataBlockScanner (via periodic verification) could be improved to check for corrupt block length
[ https://issues.apache.org/jira/browse/HDFS-358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065666#comment-14065666 ] Allen Wittenauer commented on HDFS-358: --- Is this still an issue? I seem to recall having a discussion recently where this problem was still exhibited. DataBlockScanner (via periodic verification) could be improved to check for corrupt block length Key: HDFS-358 URL: https://issues.apache.org/jira/browse/HDFS-358 Project: Hadoop HDFS Issue Type: Improvement Environment: All Reporter: Lohit Vijayarenu DataBlockScanner should also check for truncated blocks and report them as corrupt blocks to the NN -- This message was sent by Atlassian JIRA (v6.2#6252)
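The check being asked for is essentially a length comparison during the periodic scan. A sketch, assuming the scanner knows the block length the namenode expects (illustrative only):
{code}
import java.io.File;
import java.io.IOException;

public class BlockLengthCheck {
  static void verifyLength(File blockFile, long expectedLen) throws IOException {
    long onDisk = blockFile.length();
    if (onDisk < expectedLen) {
      // flag to the namenode as a corrupt (truncated) replica
      throw new IOException("Block truncated: expected " + expectedLen
          + " bytes, found " + onDisk);
    }
  }
}
{code}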
[jira] [Commented] (HDFS-6700) BlockPlacementPolicy should choose storage but not datanode for deletion
[ https://issues.apache.org/jira/browse/HDFS-6700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14065673#comment-14065673 ] Hadoop QA commented on HDFS-6700: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12656321/h6700_20140717b.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7374//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7374//console This message is automatically generated. BlockPlacementPolicy should choose storage but not datanode for deletion --- Key: HDFS-6700 URL: https://issues.apache.org/jira/browse/HDFS-6700 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h6700_20140717.patch, h6700_20140717b.patch HDFS-2832 changed the datanode storage model from a single storage, which may correspond to multiple physical storage media, to a collection of storages, with each storage corresponding to a single physical storage medium. BlockPlacementPolicy.chooseReplicaToDelete still chooses replicas in terms of DatanodeDescriptor rather than DatanodeStorageInfo. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-373) Name node should notify administrator when struggling with replication
[ https://issues.apache.org/jira/browse/HDFS-373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-373: -- Labels: newbie (was: ) Name node should notify administrator when struggling with replication - Key: HDFS-373 URL: https://issues.apache.org/jira/browse/HDFS-373 Project: Hadoop HDFS Issue Type: Improvement Reporter: Robert Chansler Assignee: Mac Yang Labels: newbie Name node performance suffers if either the replication queue is too big, or the available space at data nodes is too small. In either case, the administrator should be notified. If the situation is really desperate, the name node perhaps should enter safe mode. -- This message was sent by Atlassian JIRA (v6.2#6252)