[jira] [Commented] (HDFS-7256) Encryption Key created in Java Key Store after Namenode start unavailable for EZ Creation
[ https://issues.apache.org/jira/browse/HDFS-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174804#comment-14174804 ] Yi Liu commented on HDFS-7256:
--
Thanks [~xyao].
*For your question 1:* Please don't specify {{hadoop.security.crypto.jce.provider}}; that is a JCE provider used by the JCE crypto codec, not a key provider URI. Instead, configure this in hdfs-site.xml:
{code}
<property>
  <name>dfs.encryption.key.provider.uri</name>
  <value>kms://http@localhost:16000/kms</value>
</property>
{code}
and this in kms-site.xml:
{code}
<property>
  <name>hadoop.kms.key.provider.uri</name>
  <value>jceks://file@/home/hadoop/kms.keystore</value>
</property>
{code}
When you use the hadoop key shell, please specify
{code}
-provider kms://http@localhost:16000/kms
{code}
If you don't want to specify {{-provider}} every time, configure this in core-site.xml:
{code}
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://http@localhost:16000/kms</value>
</property>
{code}
*For your question 2:* Do you see the warning in the KMS log? If so, it is only a warning and doesn't affect functionality. When Kerberos is *not* enabled, the first request sent to KMS carries no user name; it fails and triggers authentication again with the user name, which then succeeds. There was previously a bug (HADOOP-11151) about making the first request carry a user name in non-secured mode; let me check whether it is fixed in the latest trunk, and if not, I can fix it.

Encryption Key created in Java Key Store after Namenode start unavailable for EZ Creation
--
Key: HDFS-7256 URL: https://issues.apache.org/jira/browse/HDFS-7256 Project: Hadoop HDFS Issue Type: Bug Components: encryption, security Affects Versions: 2.6.0 Reporter: Xiaoyu Yao
Hit a RemoteException: Key ezkey1 doesn't exist. error when creating an EZ with a key created after the NN starts. A brief check of the code found that the KeyProvider is loaded by the FSN only at NN start.
My workaround is to restart the NN, which triggers a reload of the KeyProvider. Is this expected?
Repro Steps:
1. Create a new key after the NN and KMS start:
{code}
hadoop/bin/hadoop key create ezkey1 -size 256 -provider jceks://file/home/hadoop/kms.keystore
{code}
2. List keys:
{code}
hadoop@SaturnVm:~/deploy$ hadoop/bin/hadoop key list -provider jceks://file/home/hadoop/kms.keystore -metadata
Listing keys for KeyProvider: jceks://file/home/hadoop/kms.keystore
ezkey1 : cipher: AES/CTR/NoPadding, length: 256, description: null, created: Thu Oct 16 18:51:30 EDT 2014, version: 1, attributes: null
key2 : cipher: AES/CTR/NoPadding, length: 128, description: null, created: Tue Oct 14 19:44:09 EDT 2014, version: 1, attributes: null
key1 : cipher: AES/CTR/NoPadding, length: 128, description: null, created: Tue Oct 14 17:52:36 EDT 2014, version: 1, attributes: null
{code}
3. Create an encryption zone:
{code}
hadoop/bin/hdfs dfs -mkdir /Ez1
hadoop@SaturnVm:~/deploy$ hadoop/bin/hdfs crypto -createZone -keyName ezkey1 -path /Ez1
RemoteException: Key ezkey1 doesn't exist.
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
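As an aside, the {{jceks://file...}} provider in the repro above is backed by an ordinary Java JCEKS keystore. A minimal, self-contained sketch of what such a keystore holds, using plain JDK APIs only; this is not the Hadoop KeyProvider implementation, and the alias, key size, and password here are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.security.KeyStore;
import java.util.Collections;
import java.util.List;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class JceksDemo {
    // Creates an in-memory JCEKS keystore, stores one AES key under "ezkey1",
    // round-trips it through bytes, and returns the aliases it contains.
    public static List<String> createAndList(char[] password) throws Exception {
        KeyStore ks = KeyStore.getInstance("JCEKS");
        ks.load(null, password);                 // initialize an empty keystore

        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);                            // like `hadoop key create -size 128`
        SecretKey key = kg.generateKey();
        ks.setEntry("ezkey1",
                new KeyStore.SecretKeyEntry(key),
                new KeyStore.PasswordProtection(password));

        // Serialize and reload, as a file-backed provider would with the keystore file.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ks.store(out, password);
        KeyStore reloaded = KeyStore.getInstance("JCEKS");
        reloaded.load(new ByteArrayInputStream(out.toByteArray()), password);

        return Collections.list(reloaded.aliases());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(createAndList("demo-password".toCharArray()));
    }
}
```

Note that the JCEKS type (unlike plain JKS) can hold SecretKeyEntry objects, which is why the jceks scheme is used for symmetric encryption-zone keys.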
[jira] [Commented] (HDFS-7256) Encryption Key created in Java Key Store after Namenode start unavailable for EZ Creation
[ https://issues.apache.org/jira/browse/HDFS-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174813#comment-14174813 ] Yi Liu commented on HDFS-7256:
--
[~xyao], ideally HDFS encryption is recommended for use in a secured environment (Kerberos enabled), and in that case this warning does not appear. Furthermore:
{quote}
The client runs with user 'hadoop'. The proxyuser and delegation token (use default) are set up in kms-site.xml.
<!-- proxyuser configuration for user named: hadoop -->
<property>
  <name>hadoop.kms.proxyuser.hadoop.users</name>
  <value>*</value>
</property>
{quote}
Your use case is not the proxyuser case; the reason is as I explained in the comment above.
[jira] [Commented] (HDFS-7256) Encryption Key created in Java Key Store after Namenode start unavailable for EZ Creation
[ https://issues.apache.org/jira/browse/HDFS-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174818#comment-14174818 ] Yi Liu commented on HDFS-7256:
--
{quote}
Can you point me to the link for the fs-encryption/KMS user doc if there is a different one
{quote}
HDFS encryption is not included in 2.5.1 or earlier, so there is no online document yet. I meant you could build the user doc yourself using:
{code}
mvn clean site; mvn site:stage -DstagingDirectory=/tmp/hadoop-site
{code}
[jira] [Commented] (HDFS-7256) Encryption Key created in Java Key Store after Namenode start unavailable for EZ Creation
[ https://issues.apache.org/jira/browse/HDFS-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174859#comment-14174859 ] Yi Liu commented on HDFS-7256:
--
About the warning: I found that the KMS client (and the HttpFS client) always tries {{KerberosAuthenticator}} first, then falls back if the server is not security enabled. So we can ignore that warning; it appears only in non-secured mode.
[jira] [Commented] (HDFS-7240) Object store in HDFS
[ https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174888#comment-14174888 ] Fengdong Yu commented on HDFS-7240:
---
Please look here for a brief description: http://www.hortonworks.com/blog/ozone-object-store-hdfs/

Object store in HDFS
Key: HDFS-7240 URL: https://issues.apache.org/jira/browse/HDFS-7240 Project: Hadoop HDFS Issue Type: New Feature Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey
This jira proposes to add object store capabilities to HDFS. As part of the federation work (HDFS-1052) we separated block storage into a generic storage layer. Using the Block Pool abstraction, new kinds of namespaces can be built on top of the storage layer, i.e. the datanodes. In this jira I will explore building an object store using the datanode storage, but independent of namespace metadata. I will soon update with a detailed design document.
[jira] [Commented] (HDFS-7256) Encryption Key created in Java Key Store after Namenode start unavailable for EZ Creation
[ https://issues.apache.org/jira/browse/HDFS-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174896#comment-14174896 ] Xiaoyu Yao commented on HDFS-7256:
--
Thanks [~hitliuyi] again for the clarification. Three more follow-up questions:
1. KMS and the Hadoop key shell allow creating keys of length 256, but HDFS seems to have a hard limit of AES/CTR with 128-bit keys only. Is this expected?
{code}
hadoop@hadoopdev:~/deploy$ hadoop/bin/hadoop key list -metadata
Listing keys for KeyProvider: KMSClientProvider[http://localhost:16000/kms/v1/]
key2 : cipher: AES/CTR/NoPadding, length: 256, description: null, created: Thu Oct 16 22:42:20 PDT 2014, version: 1, attributes: [key.acl.name=key2]
key1 : cipher: AES/CTR/NoPadding, length: 128, description: null, created: Thu Oct 16 14:28:53 PDT 2014, version: 1, attributes: null
hadoop@hadoopdev:~/deploy$ hadoop/bin/hdfs crypto -createZone -path /ez2 -keyName key2
RemoteException: java.util.concurrent.ExecutionException: java.io.IOException: java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: java.security.InvalidKeyException: Illegal key size
{code}
2. Thanks for pointing me to 'hadoop.security.key.provider.path'. That's exactly what I was looking for. However, I did not find it earlier because it is hard-coded in KeyProviderFactory.java, unlike other security configuration keys, which live in CommonConfigurationKeysPublic.java. If this key is intended for public use, I would suggest putting it in CommonConfigurationKeysPublic.java and also including it in the hadoop key shell help message.
3. The document mentions that copying a file between EZs with different EZ keys, or copying a file from an EZ to a non-EZ directory, is not allowed. But my test shows it works completely fine. Is this explicitly blocked or just not recommended?
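On question 1, an {{InvalidKeyException: Illegal key size}} for AES-256 is usually a limit imposed by the JVM's JCE jurisdiction policy files rather than by HDFS itself. A quick, self-contained check of the runtime's AES limit (plain JDK API; the class name is made up for illustration):

```java
import javax.crypto.Cipher;

public class KeySizeCheck {
    public static void main(String[] args) throws Exception {
        // If this prints a value below 256, 256-bit AES operations fail with
        // "Illegal key size" until the unlimited-strength JCE policy files
        // are installed for this JVM.
        int max = Cipher.getMaxAllowedKeyLength("AES");
        System.out.println("Max allowed AES key length: " + max);
    }
}
```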
[jira] [Commented] (HDFS-7242) Code improvement for FSN#checkUnreadableBySuperuser
[ https://issues.apache.org/jira/browse/HDFS-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174963#comment-14174963 ] Vinayakumar B commented on HDFS-7242:
-
+1, patch looks good. Good catch [~hitliuyi]. Will commit the patch soon.

Code improvement for FSN#checkUnreadableBySuperuser
---
Key: HDFS-7242 URL: https://issues.apache.org/jira/browse/HDFS-7242 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.6.0 Reporter: Yi Liu Assignee: Yi Liu Priority: Minor Attachments: HDFS-7242.001.patch
_checkUnreadableBySuperuser_ checks whether the super user can access a specific path. The code logic is not efficient: it runs the iteration check for every user, while we actually only need to check for the _super user_, saving a few CPU cycles.
{code}
private void checkUnreadableBySuperuser(FSPermissionChecker pc,
    INode inode, int snapshotId) throws IOException {
  for (XAttr xattr : dir.getXAttrs(inode, snapshotId)) {
    if (XAttrHelper.getPrefixName(xattr).
        equals(SECURITY_XATTR_UNREADABLE_BY_SUPERUSER)) {
      if (pc.isSuperUser()) {
        throw new AccessControlException("Access is denied for " +
            pc.getUser() + " since the superuser is not allowed to " +
            "perform this operation.");
      }
    }
  }
}
{code}
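The refinement the patch is after can be sketched in isolation: hoist the loop-invariant superuser test out of the xattr scan. This is an illustrative, self-contained version under stated assumptions (the class and method names are made up, {{SecurityException}} stands in for AccessControlException, and plain strings stand in for XAttr objects):

```java
import java.util.List;

public class SuperuserCheck {
    // Stand-in for the xattr prefix name that marks a path unreadable by the superuser.
    static final String UNREADABLE_BY_SUPERUSER = "security.hdfs.unreadable.by.superuser";

    // The refinement: test the cheap, loop-invariant condition (is the caller
    // the superuser?) once, before scanning the xattr list at all.
    static void check(boolean isSuperUser, String user, List<String> xattrPrefixNames) {
        if (!isSuperUser) {
            return; // non-superusers can never trip this check, so skip the scan
        }
        for (String name : xattrPrefixNames) {
            if (UNREADABLE_BY_SUPERUSER.equals(name)) {
                throw new SecurityException("Access is denied for " + user
                        + " since the superuser is not allowed to perform this operation.");
            }
        }
    }

    public static void main(String[] args) {
        check(false, "alice", List.of(UNREADABLE_BY_SUPERUSER)); // no scan, no throw
        try {
            check(true, "hdfs", List.of(UNREADABLE_BY_SUPERUSER));
        } catch (SecurityException e) {
            System.out.println(e.getMessage());
        }
    }
}
```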
[jira] [Commented] (HDFS-7251) Hadoop fs -put documentation issue
[ https://issues.apache.org/jira/browse/HDFS-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174967#comment-14174967 ] liu chang commented on HDFS-7251:
-
Eh... a directory is also a file on *nix systems.

Hadoop fs -put documentation issue
--
Key: HDFS-7251 URL: https://issues.apache.org/jira/browse/HDFS-7251 Project: Hadoop HDFS Issue Type: Task Components: nfs Reporter: Sai Srikanth Priority: Minor
In the Hadoop fs -put documentation, most versions state that the source should be a file. https://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-common/FileSystemShell.html#put
Usage: hdfs dfs -put <localsrc> ... <dst>
Copy single src, or multiple srcs, from the local file system to the destination file system. Also reads input from stdin and writes to the destination file system.
{code}
hdfs dfs -put localfile /user/hadoop/hadoopfile
hdfs dfs -put localfile1 localfile2 /user/hadoop/hadoopdir
hdfs dfs -put localfile hdfs://nn.example.com/hadoop/hadoopfile
hdfs dfs -put - hdfs://nn.example.com/hadoop/hadoopfile (reads the input from stdin)
{code}
I have tested with a directory as the source and it worked fine. I think the documentation needs to be updated.
[jira] [Commented] (HDFS-6995) Block should be placed in the client's 'rack-local' node if 'client-local' node is not available
[ https://issues.apache.org/jira/browse/HDFS-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174986#comment-14174986 ] Hudson commented on HDFS-6995:
--
FAILURE: Integrated in Hadoop-trunk-Commit #6278 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6278/]) HDFS-6995. Block should be placed in the client's 'rack-local' node if 'client-local' node is not available (vinayakumarb) (vinayakumarb: rev cba1f9e3896c0526fa748cd1bb13470d5fae584a)
* hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDefaultBlockPlacementPolicy.java

Block should be placed in the client's 'rack-local' node if 'client-local' node is not available
Key: HDFS-6995 URL: https://issues.apache.org/jira/browse/HDFS-6995 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Fix For: 2.6.0 Attachments: HDFS-6995-001.patch, HDFS-6995-002.patch, HDFS-6995-003.patch, HDFS-6995-004.patch, HDFS-6995-005.patch, HDFS-6995-006.patch, HDFS-6995-007.patch
The HDFS cluster is rack aware. The client is on a different node than any datanode, but the same rack contains one or more datanodes. In this case, first preference should be given to selecting a 'rack-local' node. Currently, since no node in clusterMap corresponds to the client's location, the block placement policy chooses a *random* node as the local node and proceeds with further placements.
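The placement preference described above can be sketched outside HDFS with a toy model. This is a hedged illustration, not the BlockPlacementPolicy code: node locations are encoded as "/rack/host" strings, and the class and method names are made up:

```java
import java.util.List;

public class RackLocalChoice {
    // Hypothetical node naming scheme: "/rackA/host1" (rack path + hostname).
    static String rackOf(String location) {
        return location.substring(0, location.lastIndexOf('/'));
    }

    // Prefer a datanode on the client's own host; otherwise fall back to a
    // datanode in the client's rack, rather than picking a random node.
    static String chooseTarget(String clientLocation, List<String> datanodes) {
        for (String dn : datanodes) {
            if (dn.equals(clientLocation)) {
                return dn;                      // node-local
            }
        }
        String clientRack = rackOf(clientLocation);
        for (String dn : datanodes) {
            if (rackOf(dn).equals(clientRack)) {
                return dn;                      // rack-local
            }
        }
        return datanodes.isEmpty() ? null : datanodes.get(0); // off-rack fallback
    }

    public static void main(String[] args) {
        // Client on rackA but not a datanode itself: the rack-local dn2 wins.
        System.out.println(chooseTarget("/rackA/client", List.of("/rackB/dn1", "/rackA/dn2")));
    }
}
```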
[jira] [Commented] (HDFS-6995) Block should be placed in the client's 'rack-local' node if 'client-local' node is not available
[ https://issues.apache.org/jira/browse/HDFS-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174990#comment-14174990 ] Uma Maheswara Rao G commented on HDFS-6995:
---
I am on a business trip to the US. Expect delayed responses from me during this period.
[jira] [Commented] (HDFS-7242) Code improvement for FSN#checkUnreadableBySuperuser
[ https://issues.apache.org/jira/browse/HDFS-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174995#comment-14174995 ] Hudson commented on HDFS-7242:
--
FAILURE: Integrated in Hadoop-trunk-Commit #6279 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6279/]) HDFS-7242. Code improvement for FSN#checkUnreadableBySuperuser. (Contributed by Yi Liu) (vinayakumarb: rev 1c3ff0b7c892b9d70737c375fb6f4a6fc6dd6d81)
* hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Commented] (HDFS-7242) Code improvement for FSN#checkUnreadableBySuperuser
[ https://issues.apache.org/jira/browse/HDFS-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174996#comment-14174996 ] Vinayakumar B commented on HDFS-7242:
-
Committed to trunk and branch-2. Thanks [~hitliuyi] for the patch and [~clamb] for the review.
[jira] [Updated] (HDFS-7242) Code improvement for FSN#checkUnreadableBySuperuser
[ https://issues.apache.org/jira/browse/HDFS-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7242:
Resolution: Fixed
Fix Version/s: 2.7.0
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)
[jira] [Commented] (HDFS-7252) small refinement to the use of isInAnEZ in FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175003#comment-14175003 ] Vinayakumar B commented on HDFS-7252:
-
+1, patch looks good to me.

small refinement to the use of isInAnEZ in FSNamesystem
---
Key: HDFS-7252 URL: https://issues.apache.org/jira/browse/HDFS-7252 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Yi Liu Assignee: Yi Liu Priority: Trivial Attachments: HDFS-7252.001.patch, HDFS-7252.002.patch
In {{FSN#startFileInt}}, _EncryptionZoneManager#getEncryptionZoneForPath_ is invoked 3 times (_dir.isInAnEZ(iip)_, _dir.getEZForPath(iip)_, _dir.getKeyName(iip)_) in the following code; actually we need just one:
{code}
if (dir.isInAnEZ(iip)) {
  EncryptionZone zone = dir.getEZForPath(iip);
  protocolVersion = chooseProtocolVersion(zone, supportedVersions);
  suite = zone.getSuite();
  ezKeyName = dir.getKeyName(iip);

  Preconditions.checkNotNull(protocolVersion);
  Preconditions.checkNotNull(suite);
  Preconditions.checkArgument(!suite.equals(CipherSuite.UNKNOWN),
      "Chose an UNKNOWN CipherSuite!");
  Preconditions.checkNotNull(ezKeyName);
}
{code}
There are also 2 invocations in the following code, but we need just one:
{code}
if (dir.isInAnEZ(iip)) {
  // The path is now within an EZ, but we're missing encryption parameters
  if (suite == null || edek == null) {
    throw new RetryStartFileException();
  }
  // Path is within an EZ and we have provided encryption parameters.
  // Make sure that the generated EDEK matches the settings of the EZ.
  String ezKeyName = dir.getKeyName(iip);
  if (!ezKeyName.equals(edek.getEncryptionKeyName())) {
    throw new RetryStartFileException();
  }
  feInfo = new FileEncryptionInfo(suite, version,
      edek.getEncryptedKeyVersion().getMaterial(),
      edek.getEncryptedKeyIv(),
      ezKeyName, edek.getEncryptionKeyVersionName());
  Preconditions.checkNotNull(feInfo);
}
{code}
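The shape of the fix can be illustrated outside FSNamesystem: resolve the zone once into a local variable and derive everything from it, instead of re-resolving the path for each question. A simplified, self-contained sketch under stated assumptions (all names here are made up, and a counter stands in for the repeated path resolution):

```java
import java.util.Optional;

public class EzLookupOnce {
    // Hypothetical, simplified stand-in for an encryption-zone record.
    static final class Zone {
        final String keyName;
        Zone(String keyName) { this.keyName = keyName; }
    }

    // Counts how often the path is resolved; in FSNamesystem each of
    // dir.isInAnEZ / dir.getEZForPath / dir.getKeyName would re-resolve it.
    static int resolutions = 0;

    static Optional<Zone> resolve(String path) {
        resolutions++;
        return path.startsWith("/ez") ? Optional.of(new Zone("ezkey1")) : Optional.empty();
    }

    // The refinement: resolve the zone once, then reuse the result instead of
    // asking "is it in a zone?", "which zone?" and "what key?" separately.
    static String keyNameFor(String path) {
        Optional<Zone> zone = resolve(path);   // single lookup
        return zone.map(z -> z.keyName).orElse(null);
    }

    public static void main(String[] args) {
        System.out.println(keyNameFor("/ez/file") + " after " + resolutions + " resolution(s)");
        // prints: ezkey1 after 1 resolution(s)
    }
}
```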
[jira] [Commented] (HDFS-7252) small refinement to the use of isInAnEZ in FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175006#comment-14175006 ] Vinayakumar B commented on HDFS-7252:
-
Committed to trunk and branch-2. Thanks [~hitliuyi] for the patch and [~clamb] for the review.
[jira] [Updated] (HDFS-7252) small refinement to the use of isInAnEZ in FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7252:
Resolution: Fixed
Fix Version/s: 2.7.0
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)
[jira] [Commented] (HDFS-7252) small refinement to the use of isInAnEZ in FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175016#comment-14175016 ] Hudson commented on HDFS-7252: -- FAILURE: Integrated in Hadoop-trunk-Commit #6280 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6280/]) HDFS-7252. small refinement to the use of isInAnEZ in FSNamesystem. (Yi Liu via vinayakumarb) (vinayakumarb: rev 368743140dd076ecd5af309c1ed83c5ae2d59fc8) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt small refinement to the use of isInAnEZ in FSNamesystem --- Key: HDFS-7252 URL: https://issues.apache.org/jira/browse/HDFS-7252 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Yi Liu Assignee: Yi Liu Priority: Trivial Fix For: 2.7.0 Attachments: HDFS-7252.001.patch, HDFS-7252.002.patch In {{FSN#startFileInt}}, _EncryptionZoneManager#getEncryptionZoneForPath_ is invoked 3 times (_dir.isInAnEZ(iip)_, _dir.getEZForPath(iip)_, _dir.getKeyName(iip)_) in following code, actually we just need one. {code} if (dir.isInAnEZ(iip)) { EncryptionZone zone = dir.getEZForPath(iip); protocolVersion = chooseProtocolVersion(zone, supportedVersions); suite = zone.getSuite(); ezKeyName = dir.getKeyName(iip); Preconditions.checkNotNull(protocolVersion); Preconditions.checkNotNull(suite); Preconditions.checkArgument(!suite.equals(CipherSuite.UNKNOWN), Chose an UNKNOWN CipherSuite!); Preconditions.checkNotNull(ezKeyName); } {code} Also there are 2 times in following code, but just need one {code} if (dir.isInAnEZ(iip)) { // The path is now within an EZ, but we're missing encryption parameters if (suite == null || edek == null) { throw new RetryStartFileException(); } // Path is within an EZ and we have provided encryption parameters. // Make sure that the generated EDEK matches the settings of the EZ. 
String ezKeyName = dir.getKeyName(iip); if (!ezKeyName.equals(edek.getEncryptionKeyName())) { throw new RetryStartFileException(); } feInfo = new FileEncryptionInfo(suite, version, edek.getEncryptedKeyVersion().getMaterial(), edek.getEncryptedKeyIv(), ezKeyName, edek.getEncryptionKeyVersionName()); Preconditions.checkNotNull(feInfo); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
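The refinement described above, resolving the encryption zone once and reading the suite and key name from the cached result, can be sketched outside Hadoop with simplified stand-ins (the `EncryptionZone` stub and `getEZForPath`/`describe` methods below mirror the snippet's call pattern but are not the real HDFS classes):

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified stand-in for org.apache.hadoop.hdfs.protocol.EncryptionZone. */
class EncryptionZone {
    final String keyName = "ezkey1";
    final String suite = "AES/CTR/NoPadding";
}

public class SingleLookupSketch {
    /** Counts path resolutions, to show only one happens per file create. */
    static final AtomicInteger lookups = new AtomicInteger();

    /** Stand-in for EncryptionZoneManager#getEncryptionZoneForPath. */
    static EncryptionZone getEZForPath(String path) {
        lookups.incrementAndGet();
        return path.startsWith("/Ez") ? new EncryptionZone() : null;
    }

    /** One resolution answers all three questions from the original code:
     *  is the path in an EZ, which cipher suite, and which key name. */
    static String describe(String path) {
        EncryptionZone zone = getEZForPath(path);   // single lookup
        if (zone == null) {
            return "not in an EZ";
        }
        return zone.suite + "/" + zone.keyName;     // read from cached zone
    }
}
```

The null-check on the single returned zone replaces the separate _isInAnEZ_ call, which is exactly the shape of the committed refinement.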
[jira] [Commented] (HDFS-7252) small refinement to the use of isInAnEZ in FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175066#comment-14175066 ] Hudson commented on HDFS-7252: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1929 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1929/]) HDFS-7252. small refinement to the use of isInAnEZ in FSNamesystem. (Yi Liu via vinayakumarb) (vinayakumarb: rev 368743140dd076ecd5af309c1ed83c5ae2d59fc8) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java small refinement to the use of isInAnEZ in FSNamesystem --- Key: HDFS-7252 URL: https://issues.apache.org/jira/browse/HDFS-7252 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Yi Liu Assignee: Yi Liu Priority: Trivial Fix For: 2.7.0 Attachments: HDFS-7252.001.patch, HDFS-7252.002.patch In {{FSN#startFileInt}}, _EncryptionZoneManager#getEncryptionZoneForPath_ is invoked 3 times (_dir.isInAnEZ(iip)_, _dir.getEZForPath(iip)_, _dir.getKeyName(iip)_) in the following code; actually we need only one call. {code} if (dir.isInAnEZ(iip)) { EncryptionZone zone = dir.getEZForPath(iip); protocolVersion = chooseProtocolVersion(zone, supportedVersions); suite = zone.getSuite(); ezKeyName = dir.getKeyName(iip); Preconditions.checkNotNull(protocolVersion); Preconditions.checkNotNull(suite); Preconditions.checkArgument(!suite.equals(CipherSuite.UNKNOWN), "Chose an UNKNOWN CipherSuite!"); Preconditions.checkNotNull(ezKeyName); } {code} It is also invoked twice in the following code, where one call would suffice: {code} if (dir.isInAnEZ(iip)) { // The path is now within an EZ, but we're missing encryption parameters if (suite == null || edek == null) { throw new RetryStartFileException(); } // Path is within an EZ and we have provided encryption parameters. // Make sure that the generated EDEK matches the settings of the EZ. 
String ezKeyName = dir.getKeyName(iip); if (!ezKeyName.equals(edek.getEncryptionKeyName())) { throw new RetryStartFileException(); } feInfo = new FileEncryptionInfo(suite, version, edek.getEncryptedKeyVersion().getMaterial(), edek.getEncryptedKeyIv(), ezKeyName, edek.getEncryptionKeyVersionName()); Preconditions.checkNotNull(feInfo); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7242) Code improvement for FSN#checkUnreadableBySuperuser
[ https://issues.apache.org/jira/browse/HDFS-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175065#comment-14175065 ] Hudson commented on HDFS-7242: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1929 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1929/]) HDFS-7242. Code improvement for FSN#checkUnreadableBySuperuser. (Contributed by Yi Liu) (vinayakumarb: rev 1c3ff0b7c892b9d70737c375fb6f4a6fc6dd6d81) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java Code improvement for FSN#checkUnreadableBySuperuser --- Key: HDFS-7242 URL: https://issues.apache.org/jira/browse/HDFS-7242 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.6.0 Reporter: Yi Liu Assignee: Yi Liu Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7242.001.patch _checkUnreadableBySuperuser_ checks whether the superuser can access a specific path. The logic is not efficient: it performs the xattr iteration for every user, while we only need to do the check when the caller is the _super user_, which saves a few CPU cycles. {code} private void checkUnreadableBySuperuser(FSPermissionChecker pc, INode inode, int snapshotId) throws IOException { for (XAttr xattr : dir.getXAttrs(inode, snapshotId)) { if (XAttrHelper.getPrefixName(xattr). equals(SECURITY_XATTR_UNREADABLE_BY_SUPERUSER)) { if (pc.isSuperUser()) { throw new AccessControlException("Access is denied for " + pc.getUser() + " since the superuser is not allowed to " + "perform this operation."); } } } } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
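The cheaper shape of the check, testing the caller once before scanning any xattrs, can be sketched with plain-Java stand-ins (the boolean flag and `List<String>` of xattr names replace `FSPermissionChecker` and `XAttr`; they are illustrative, not the real HDFS signatures):

```java
import java.util.List;

public class SuperuserCheckSketch {
    static final String SECURITY_XATTR_UNREADABLE_BY_SUPERUSER =
        "security.hdfs.unreadable.by.superuser";

    /** Stand-in for checkUnreadableBySuperuser: the isSuperUser test moves
     *  in front of the loop, so ordinary users never pay for the scan. */
    static void checkUnreadableBySuperuser(boolean isSuperUser,
                                           List<String> xattrNames) {
        if (!isSuperUser) {
            return;                          // cheap test first, no iteration
        }
        for (String name : xattrNames) {
            if (SECURITY_XATTR_UNREADABLE_BY_SUPERUSER.equals(name)) {
                throw new SecurityException("Access is denied since the "
                    + "superuser is not allowed to perform this operation.");
            }
        }
    }
}
```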
[jira] [Commented] (HDFS-6995) Block should be placed in the client's 'rack-local' node if 'client-local' node is not available
[ https://issues.apache.org/jira/browse/HDFS-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175067#comment-14175067 ] Hudson commented on HDFS-6995: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1929 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1929/]) HDFS-6995. Block should be placed in the client's 'rack-local' node if 'client-local' node is not available (vinayakumarb) (vinayakumarb: rev cba1f9e3896c0526fa748cd1bb13470d5fae584a) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestDefaultBlockPlacementPolicy.java Block should be placed in the client's 'rack-local' node if 'client-local' node is not available Key: HDFS-6995 URL: https://issues.apache.org/jira/browse/HDFS-6995 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.5.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Fix For: 2.6.0 Attachments: HDFS-6995-001.patch, HDFS-6995-002.patch, HDFS-6995-003.patch, HDFS-6995-004.patch, HDFS-6995-005.patch, HDFS-6995-006.patch, HDFS-6995-007.patch The HDFS cluster is rack aware. The client runs on a different node than any datanode, but the same rack contains one or more datanodes. In this case, first preference should be given to selecting a 'rack-local' node. Currently, since no node in clusterMap corresponds to the client's location, the block placement policy chooses a *random* node as the local node and proceeds with further placements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
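The desired fallback order, client-local node first, then a node on the client's rack, then any node, can be sketched independently of the real `BlockPlacementPolicyDefault` (the `Node` record and `chooseLocal` method below are illustrative names, not the HDFS API):

```java
import java.util.List;
import java.util.Optional;

public class RackLocalSketch {
    record Node(String host, String rack) {}

    static Node chooseLocal(String clientHost, String clientRack,
                            List<Node> candidates) {
        // 1. client-local: a datanode running on the client's own host
        Optional<Node> local = candidates.stream()
                .filter(n -> n.host().equals(clientHost)).findFirst();
        if (local.isPresent()) {
            return local.get();
        }
        // 2. rack-local: any datanode sharing the client's rack
        Optional<Node> rackLocal = candidates.stream()
                .filter(n -> n.rack().equals(clientRack)).findFirst();
        if (rackLocal.isPresent()) {
            return rackLocal.get();
        }
        // 3. otherwise fall back to an arbitrary node (random in HDFS;
        //    the first candidate here, for determinism)
        return candidates.get(0);
    }
}
```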
[jira] [Commented] (HDFS-7256) Encryption Key created in Java Key Store after Namenode start unavailable for EZ Creation
[ https://issues.apache.org/jira/browse/HDFS-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175101#comment-14175101 ] Yi Liu commented on HDFS-7256: -- Thanks [~xyao] for trying this. Responses to your comments: *1.* I think you are using the Java JCE crypto codec (if OpenSSL is not configured, or is an incorrect version, JCE is used). By default JCE only supports 128-bit keys; if you want to use 256-bit keys, you need to install the unlimited-strength policy files from Oracle. *2.* Ideally {{hadoop.security.key.provider.path}} would better live in _CommonConfigurationKeysPublic_; it was committed early and we have not modified it since. *3.* You are talking about *rename*, which is not allowed between EZs with different EZ keys or from an EZ to a non-EZ directory, but {{cp}} is allowed. Encryption Key created in Java Key Store after Namenode start unavailable for EZ Creation -- Key: HDFS-7256 URL: https://issues.apache.org/jira/browse/HDFS-7256 Project: Hadoop HDFS Issue Type: Bug Components: encryption, security Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Hit an error RemoteException: Key ezkey1 doesn't exist. when creating an EZ with a key created after the NN starts. I briefly checked the code and found that the KeyProvider is loaded by FSN only at NN start. My workaround is to restart the NN, which triggers a reload of the KeyProvider. Is this expected? 
Repro Steps:
Create a new key after the NN and KMS start:
hadoop/bin/hadoop key create ezkey1 -size 256 -provider jceks://file/home/hadoop/kms.keystore
List keys:
hadoop@SaturnVm:~/deploy$ hadoop/bin/hadoop key list -provider jceks://file/home/hadoop/kms.keystore -metadata
Listing keys for KeyProvider: jceks://file/home/hadoop/kms.keystore
ezkey1 : cipher: AES/CTR/NoPadding, length: 256, description: null, created: Thu Oct 16 18:51:30 EDT 2014, version: 1, attributes: null
key2 : cipher: AES/CTR/NoPadding, length: 128, description: null, created: Tue Oct 14 19:44:09 EDT 2014, version: 1, attributes: null
key1 : cipher: AES/CTR/NoPadding, length: 128, description: null, created: Tue Oct 14 17:52:36 EDT 2014, version: 1, attributes: null
Create an encryption zone:
hadoop/bin/hdfs dfs -mkdir /Ez1
hadoop@SaturnVm:~/deploy$ hadoop/bin/hdfs crypto -createZone -keyName ezkey1 -path /Ez1
RemoteException: Key ezkey1 doesn't exist.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
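Whether the running JRE is limited to 128-bit AES (the cause suggested in point 1 of the comment above) can be checked directly with the standard JCE API; this is a diagnostic sketch, not part of the HDFS-7256 fix:

```java
import javax.crypto.Cipher;

public class JceKeyLengthCheck {
    public static void main(String[] args) throws Exception {
        // 128 means only the default JCE policy is installed; a very large
        // value (Integer.MAX_VALUE) means unlimited-strength crypto is
        // available and 256-bit EZ keys can be used.
        int max = Cipher.getMaxAllowedKeyLength("AES");
        System.out.println("Max AES key length: " + max);
        if (max < 256) {
            System.out.println("256-bit EZ keys will fail until the "
                + "unlimited-strength policy files are installed.");
        }
    }
}
```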
[jira] [Commented] (HDFS-7256) Encryption Key created in Java Key Store after Namenode start unavailable for EZ Creation
[ https://issues.apache.org/jira/browse/HDFS-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175105#comment-14175105 ] Yi Liu commented on HDFS-7256: -- For *rename*: it is actually only allowed within the same EZ; it is not allowed between EZs, even ones with the same EZ key. Encryption Key created in Java Key Store after Namenode start unavailable for EZ Creation -- Key: HDFS-7256 URL: https://issues.apache.org/jira/browse/HDFS-7256 Project: Hadoop HDFS Issue Type: Bug Components: encryption, security Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Hit an error RemoteException: Key ezkey1 doesn't exist. when creating an EZ with a key created after the NN starts. I briefly checked the code and found that the KeyProvider is loaded by FSN only at NN start. My workaround is to restart the NN, which triggers a reload of the KeyProvider. Is this expected?
Repro Steps:
Create a new key after the NN and KMS start:
hadoop/bin/hadoop key create ezkey1 -size 256 -provider jceks://file/home/hadoop/kms.keystore
List keys:
hadoop@SaturnVm:~/deploy$ hadoop/bin/hadoop key list -provider jceks://file/home/hadoop/kms.keystore -metadata
Listing keys for KeyProvider: jceks://file/home/hadoop/kms.keystore
ezkey1 : cipher: AES/CTR/NoPadding, length: 256, description: null, created: Thu Oct 16 18:51:30 EDT 2014, version: 1, attributes: null
key2 : cipher: AES/CTR/NoPadding, length: 128, description: null, created: Tue Oct 14 19:44:09 EDT 2014, version: 1, attributes: null
key1 : cipher: AES/CTR/NoPadding, length: 128, description: null, created: Tue Oct 14 17:52:36 EDT 2014, version: 1, attributes: null
Create an encryption zone:
hadoop/bin/hdfs dfs -mkdir /Ez1
hadoop@SaturnVm:~/deploy$ hadoop/bin/hdfs crypto -createZone -keyName ezkey1 -path /Ez1
RemoteException: Key ezkey1 doesn't exist.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7204) balancer doesn't run as a daemon
[ https://issues.apache.org/jira/browse/HDFS-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175107#comment-14175107 ] Yongjun Zhang commented on HDFS-7204: - Thanks [~aw], I created HADOOP-11208. balancer doesn't run as a daemon Key: HDFS-7204 URL: https://issues.apache.org/jira/browse/HDFS-7204 Project: Hadoop HDFS Issue Type: Bug Components: scripts Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Allen Wittenauer Priority: Blocker Labels: newbie Attachments: HDFS-7204-01.patch, HDFS-7204.patch From HDFS-7184, minor issues with balancer: * daemon isn't set to true in hdfs to enable daemonization * start-balancer script has usage instead of hadoop_usage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7257) Add the time of last HA state transition to NN's /jmx page
Charles Lamb created HDFS-7257: -- Summary: Add the time of last HA state transition to NN's /jmx page Key: HDFS-7257 URL: https://issues.apache.org/jira/browse/HDFS-7257 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor It would be useful to some monitoring apps to expose the last HA transition time in the NN's /jmx page. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
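A minimal shape for the proposed metric, recording the wall-clock time of each HA transition and exposing it through a getter that an MBean could surface on /jmx, might look like the sketch below. The class and attribute names are illustrative; HDFS-7257 will define the real ones.

```java
public class HaStateSketch {
    private volatile String state = "standby";
    private volatile long lastTransitionTime = 0L;  // 0 = never transitioned

    /** Called on failover; records when the state actually changed. */
    public synchronized void transitionTo(String newState) {
        if (!newState.equals(state)) {
            state = newState;
            lastTransitionTime = System.currentTimeMillis();
        }
    }

    /** The value a monitoring app would read from the NN's /jmx page. */
    public long getLastHATransitionTime() {
        return lastTransitionTime;
    }

    public String getState() {
        return state;
    }
}
```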
[jira] [Commented] (HDFS-7184) Allow data migration tool to run as a daemon
[ https://issues.apache.org/jira/browse/HDFS-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175245#comment-14175245 ] Benoy Antony commented on HDFS-7184: Thanks [~aw]. I'll commit this today. Allow data migration tool to run as a daemon Key: HDFS-7184 URL: https://issues.apache.org/jira/browse/HDFS-7184 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer mover, scripts Reporter: Benoy Antony Assignee: Benoy Antony Priority: Minor Attachments: HDFS-7184.patch, HDFS-7184.patch Just like balancer, it is sometimes required to run data migration tool in a daemon mode. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7180) NFSv3 gateway frequently gets stuck
[ https://issues.apache.org/jira/browse/HDFS-7180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175267#comment-14175267 ] Brandon Li commented on HDFS-7180: -- Thanks for confirming the issue. The reordered write is NFS client behavior over which we have no control, but we can throttle the client data ingestion into the gateway; that will be in the patch. NFSv3 gateway frequently gets stuck --- Key: HDFS-7180 URL: https://issues.apache.org/jira/browse/HDFS-7180 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.5.0 Environment: Linux, Fedora 19 x86-64 Reporter: Eric Zhiqiang Ma Assignee: Brandon Li Priority: Critical We are using Hadoop 2.5.0 (HDFS only) and start and mount the NFSv3 gateway on one node in the cluster to let users upload data with rsync. However, we find the NFSv3 daemon frequently seems to get stuck while HDFS itself keeps working well ({{hdfs dfs -ls}} etc. work just fine). The latest hang we found happened after around 1 day of running and several hundred GBs of data uploaded. The NFSv3 daemon is started on one node, and the NFS share is mounted on that same node. 
From the node where the NFS is mounted, dmesg shows entries like this: [1859245.368108] nfs: server localhost not responding, still trying [1859245.368111] nfs: server localhost not responding, still trying [1859245.368115] nfs: server localhost not responding, still trying [1859245.368119] nfs: server localhost not responding, still trying [1859245.368123] nfs: server localhost not responding, still trying [1859245.368127] nfs: server localhost not responding, still trying [1859245.368131] nfs: server localhost not responding, still trying [1859245.368135] nfs: server localhost not responding, still trying [1859245.368138] nfs: server localhost not responding, still trying [1859245.368142] nfs: server localhost not responding, still trying [1859245.368146] nfs: server localhost not responding, still trying [1859245.368150] nfs: server localhost not responding, still trying [1859245.368153] nfs: server localhost not responding, still trying The mounted directory cannot be listed with `ls`, and `df -hT` gets stuck too. 
The latest lines from the nfs3 log in the hadoop logs directory: 2014-10-02 05:43:20,452 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 35 2014-10-02 05:43:20,461 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 54 2014-10-02 05:44:40,374 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:44:40,732 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:46:06,535 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:46:26,075 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:47:56,420 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:48:56,477 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:51:46,750 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:53:23,809 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:53:24,508 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:55:57,334 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:57:07,428 INFO org.apache.hadoop.hdfs.nfs.nfs3.OpenFileCtx: Have to change stable write to unstable write:FILE_SYNC 2014-10-02 05:58:32,609 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Update cache now 2014-10-02 05:58:32,610 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Not doing static UID/GID mapping because '/etc/nfs.map' does not exist. 
2014-10-02 05:58:32,620 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated user map size: 35 2014-10-02 05:58:32,628 INFO org.apache.hadoop.nfs.nfs3.IdUserGroup: Updated group map size: 54 2014-10-02 06:01:32,098 WARN org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 60062ms (threshold=3ms); ack: seqno: -2 status: SUCCESS status: ERROR downstreamAckTimeNanos: 0, targets: [10.0.3.172:50010, 10.0.3.176:50010] 2014-10-02 06:01:32,099 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block BP-1960069741-10.0.3.170-1410430543652:blk_1074363564_623643 java.io.IOException:
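The throttling Brandon mentions, limiting how fast clients can push data into the gateway, could take the shape of a simple per-second byte budget. This is only an illustration of the idea, not the actual HDFS-7180 patch:

```java
public class IngestThrottler {
    private final long bytesPerSecond;
    private long windowStartNanos = System.nanoTime();
    private long bytesInWindow = 0;

    public IngestThrottler(long bytesPerSecond) {
        this.bytesPerSecond = bytesPerSecond;
    }

    /** Call before accepting a write; blocks once the 1s budget is spent. */
    public synchronized void acquire(long bytes) throws InterruptedException {
        long now = System.nanoTime();
        if (now - windowStartNanos >= 1_000_000_000L) {
            windowStartNanos = now;          // a fresh one-second window
            bytesInWindow = 0;
        }
        bytesInWindow += bytes;
        if (bytesInWindow > bytesPerSecond) {
            long elapsedMs = (now - windowStartNanos) / 1_000_000L;
            Thread.sleep(Math.max(1, 1000 - elapsedMs)); // wait out window
            windowStartNanos = System.nanoTime();
            bytesInWindow = bytes;
        }
    }
}
```

Slowing writers at the gateway gives the downstream DFSClient pipeline time to drain, which is the back-pressure the comment describes.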
[jira] [Created] (HDFS-7258) CacheReplicationMonitor rescan schedule log should use DEBUG level instead of INFO level
Xiaoyu Yao created HDFS-7258: Summary: CacheReplicationMonitor rescan schedule log should use DEBUG level instead of INFO level Key: HDFS-7258 URL: https://issues.apache.org/jira/browse/HDFS-7258 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Xiaoyu Yao Priority: Minor CacheReplicationMonitor rescan scheduler adds two INFO log entries every 30 seconds to the HDFS NN log as shown below. This should be a DEBUG level log to avoid flooding the namenode log. 2014-10-17 07:52:30,265 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds 2014-10-17 07:52:30,265 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 2014-10-17 07:53:00,265 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds 2014-10-17 07:53:00,266 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s). 2014-10-17 07:53:30,267 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds 2014-10-17 07:53:30,267 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 2014-10-17 07:54:00,267 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds 2014-10-17 07:54:00,268 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 2014-10-17 07:54:30,268 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds 2014-10-17 07:54:30,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 
2014-10-17 07:55:00,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds 2014-10-17 07:55:00,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s). 2014-10-17 07:55:30,268 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds 2014-10-17 07:55:30,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 2014-10-17 07:56:00,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds 2014-10-17 07:56:00,270 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 2014-10-17 07:56:30,270 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds 2014-10-17 07:56:30,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 2014-10-17 07:57:00,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds 2014-10-17 07:57:00,272 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s). 2014-10-17 07:57:30,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds 2014-10-17 07:57:30,272 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s). 
2014-10-17 07:58:00,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds 2014-10-17 07:58:00,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s). 2014-10-17 07:58:30,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds -- This message was sent by Atlassian JIRA (v6.3.4#6332)
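The fix is mechanical: emit these messages at debug level and guard them so nothing is even built when debug is off. A java.util.logging sketch of the pattern (the real CacheReplicationMonitor uses Hadoop's own logging facade with a debug-level call; JUL's `FINE` is the closest stand-in here):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class RescanLogSketch {
    static final Logger LOG =
        Logger.getLogger(RescanLogSketch.class.getName());

    /** At JUL's default INFO level, FINE messages are suppressed, so the
     *  twice-per-rescan chatter no longer floods the log. */
    static void logRescan(long delayMs, long scanned, long tookMs) {
        if (LOG.isLoggable(Level.FINE)) {   // guard: skip string building
            LOG.fine("Rescanning after " + delayMs + " milliseconds");
            LOG.fine("Scanned " + scanned + " directive(s) in "
                    + tookMs + " millisecond(s).");
        }
    }
}
```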
[jira] [Updated] (HDFS-7258) CacheReplicationMonitor rescan schedule log should use DEBUG level instead of INFO level
[ https://issues.apache.org/jira/browse/HDFS-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7258: - Description: CacheReplicationMonitor rescan scheduler adds two INFO log entries every 30 seconds to HDSF NN log as shown below. This should be a DEBUG level log to avoid flooding the namenode log. {code} 2014-10-17 07:52:30,265 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds 2014-10-17 07:52:30,265 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 2014-10-17 07:53:00,265 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds 2014-10-17 07:53:00,266 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s). 2014-10-17 07:53:30,267 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds 2014-10-17 07:53:30,267 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 2014-10-17 07:54:00,267 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds 2014-10-17 07:54:00,268 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 2014-10-17 07:54:30,268 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds 2014-10-17 07:54:30,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 
2014-10-17 07:55:00,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds 2014-10-17 07:55:00,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s). 2014-10-17 07:55:30,268 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds 2014-10-17 07:55:30,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 2014-10-17 07:56:00,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds 2014-10-17 07:56:00,270 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 2014-10-17 07:56:30,270 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds 2014-10-17 07:56:30,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 2014-10-17 07:57:00,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds 2014-10-17 07:57:00,272 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s). 2014-10-17 07:57:30,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds 2014-10-17 07:57:30,272 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s). 
2014-10-17 07:58:00,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds 2014-10-17 07:58:00,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s). 2014-10-17 07:58:30,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds {/code} was: CacheReplicationMonitor rescan scheduler adds two INFO log entries every 30 seconds to HDSF NN log as shown below. This should be a DEBUG level log to avoid flooding the namenode log. 2014-10-17 07:52:30,265 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds 2014-10-17 07:52:30,265 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 2014-10-17 07:53:00,265 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds 2014-10-17 07:53:00,266 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s). 2014-10-17 07:53:30,267 INFO
[jira] [Updated] (HDFS-7258) CacheReplicationMonitor rescan schedule log should use DEBUG level instead of INFO level
[ https://issues.apache.org/jira/browse/HDFS-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoyu Yao updated HDFS-7258: - Description: CacheReplicationMonitor rescan scheduler adds two INFO log entries every 30 seconds to HDSF NN log as shown below. This should be a DEBUG level log to avoid flooding the namenode log. 2014-10-17 07:52:30,265 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds 2014-10-17 07:52:30,265 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 2014-10-17 07:53:00,265 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds 2014-10-17 07:53:00,266 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s). 2014-10-17 07:53:30,267 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds 2014-10-17 07:53:30,267 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 2014-10-17 07:54:00,267 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds 2014-10-17 07:54:00,268 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 2014-10-17 07:54:30,268 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds 2014-10-17 07:54:30,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 
2014-10-17 07:55:00,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds 2014-10-17 07:55:00,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s). 2014-10-17 07:55:30,268 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds 2014-10-17 07:55:30,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 2014-10-17 07:56:00,269 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds 2014-10-17 07:56:00,270 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 2014-10-17 07:56:30,270 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds 2014-10-17 07:56:30,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 2014-10-17 07:57:00,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds 2014-10-17 07:57:00,272 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s). 2014-10-17 07:57:30,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds 2014-10-17 07:57:30,272 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s). 
2014-10-17 07:58:00,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds 2014-10-17 07:58:00,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s). 2014-10-17 07:58:30,271 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds was: CacheReplicationMonitor rescan scheduler adds two INFO log entries every 30 seconds to HDSF NN log as shown below. This should be a DEBUG level log to avoid flooding the namenode log. {code} 2014-10-17 07:52:30,265 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 3 milliseconds 2014-10-17 07:52:30,265 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s). 2014-10-17 07:53:00,265 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds 2014-10-17 07:53:00,266 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s). 2014-10-17 07:53:30,267 INFO
[jira] [Commented] (HDFS-7225) Failed DataNode lookup can crash NameNode with NullPointerException
[ https://issues.apache.org/jira/browse/HDFS-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175327#comment-14175327 ] Zhe Zhang commented on HDFS-7225: - [~andrew.wang] Thanks for reviewing. bq. I also wonder why we have this inconsistency in the first place in InvalidateBlocks. Isn't a better fix to properly update InvalidateBlocks when the set of DNs changes? I believe this is how UnderReplicatedBlocks works, and I think generally we expect the internal state of the active NN to be consistent. The root cause of this inconsistency is similar to that of HDFS-6289, i.e., a DN restarts with a reformatted (or new) volume. Thus a different {{datanodeUuid}} is registered for an existing transfer address, replacing the existing entry in {{datanodeMap}}. Therefore, when a block invalidation request is scheduled, the DN lookup (using an old {{datanodeUuid}}) can return null. Here's how I think this situation should be handled: # When a DN lookup ({{datanodeManager.getDatanode(dn)}}) returns null in {{invalidateWorkForOneNode}}, we know this DN has been registered with a newer {{datanodeUuid}}. Skip the block invalidation work on this DN for this time. #* We should keep the block invalidation work under the old {{datanodeUuid}} for a certain amount of time, so it still has a chance to be executed if the old volume is attached and registered again in the future. # Count the number of times a DN has been skipped in block invalidation work. If it's above a certain threshold, clear all entries in {{InvalidateBlocks}} under this DN. #* In the very rare case where a volume comes back after a long time, we should just wipe out all blocks on it ([~andrew.wang], do you know if HDFS is already doing this?) 
# To reduce the chance of going into this situation in the first place, we should improve the ordering of executing invalidation and replication tasks (HDFS-7211) Failed DataNode lookup can crash NameNode with NullPointerException --- Key: HDFS-7225 URL: https://issues.apache.org/jira/browse/HDFS-7225 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-7225-v1.patch {{BlockManager#invalidateWorkForOneNode}} looks up a DataNode by the {{datanodeUuid}} and passes the resultant {{DatanodeDescriptor}} to {{InvalidateBlocks#invalidateWork}}. However, if a wrong or outdated {{datanodeUuid}} is used, a null pointer will be passed to {{invalidateWork}}, which will use it to look up an entry in a {{TreeMap}}. Since the key type is {{DatanodeDescriptor}}, key comparison is based on the IP address. A null key will crash the NameNode with an NPE. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
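The skip-and-expire scheme proposed in the comment above can be sketched with plain Java collections. The class, method, and threshold names are illustrative, not the actual Hadoop code, and a UUID lookup is reduced to a boolean for clarity:

```java
import java.util.HashMap;
import java.util.Map;

public class InvalidateBlocksModel {
    static final int SKIP_THRESHOLD = 3; // illustrative limit, not from the patch

    private final Map<String, Integer> skipCounts = new HashMap<>();
    private final Map<String, Integer> pendingInvalidations = new HashMap<>();

    void addPending(String datanodeUuid, int blocks) {
        pendingInvalidations.merge(datanodeUuid, blocks, Integer::sum);
    }

    /** Returns blocks scheduled for invalidation, or 0 if the DN lookup failed. */
    int invalidateWorkForOneNode(String datanodeUuid, boolean lookupSucceeded) {
        if (!lookupSucceeded) {
            // DN re-registered under a new UUID: skip this round instead of
            // passing a null descriptor down to the TreeMap lookup (the NPE).
            int skips = skipCounts.merge(datanodeUuid, 1, Integer::sum);
            if (skips >= SKIP_THRESHOLD) {
                // Old volume presumed gone for good; drop its queued work.
                pendingInvalidations.remove(datanodeUuid);
                skipCounts.remove(datanodeUuid);
            }
            return 0;
        }
        Integer n = pendingInvalidations.remove(datanodeUuid);
        return n == null ? 0 : n;
    }

    boolean hasPending(String datanodeUuid) {
        return pendingInvalidations.containsKey(datanodeUuid);
    }
}
```

The point of the model is the ordering: queued work survives the first few failed lookups (the volume may come back) and is cleared only after the threshold.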
[jira] [Updated] (HDFS-3107) HDFS truncate
[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-3107: -- Attachment: HDFS-3107.patch A more snapshot-friendly version of Plamen's patch. * Changed parameter types for some internal methods to accommodate snapshot changes. * Moved {{collectBlocksBeyondMax()}} into INodeFile in order to use it in {{unprotectedTruncate()}} and simplify that method. * Added disk usage verification to the test. HDFS truncate - Key: HDFS-3107 URL: https://issues.apache.org/jira/browse/HDFS-3107 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Lei Chang Assignee: Plamen Jeliazkov Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, editsStored, editsStored, editsStored.xml Original Estimate: 1,344h Remaining Estimate: 1,344h Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard POSIX operation), the reverse of append, which forces upper-layer applications to use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
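The transaction-undo use case that motivates this feature can be illustrated locally: record the pre-transaction length, append, and on abort truncate back. A java.nio FileChannel on a local file stands in for HDFS here; the feature under review would provide the equivalent operation on DFS files:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TruncateUndo {
    // Append data, creating the file if needed.
    static void append(Path p, byte[] data) throws IOException {
        Files.write(p, data, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Undo an aborted transaction by truncating back to the saved length,
    // instead of tracking discarded byte ranges in a separate store.
    static void abortTo(Path p, long savedLength) throws IOException {
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.WRITE)) {
            ch.truncate(savedLength);
        }
    }
}
```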
[jira] [Updated] (HDFS-7215) Add gc log to NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-7215: - Attachment: HDFS-7215.001.patch Add gc log to NFS gateway - Key: HDFS-7215 URL: https://issues.apache.org/jira/browse/HDFS-7215 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-7215.001.patch Like NN/DN, a GC log would help debug issues in NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7215) Add JvmPauseMonitor to NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-7215: - Priority: Minor (was: Major) Add JvmPauseMonitor to NFS gateway -- Key: HDFS-7215 URL: https://issues.apache.org/jira/browse/HDFS-7215 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Priority: Minor Attachments: HDFS-7215.001.patch Like NN/DN, a GC log would help debug issues in NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7215) Add gc log to NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-7215: - Status: Patch Available (was: Reopened) Add gc log to NFS gateway - Key: HDFS-7215 URL: https://issues.apache.org/jira/browse/HDFS-7215 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-7215.001.patch Like NN/DN, a GC log would help debug issues in NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7215) Add gc log to NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-7215: - Affects Version/s: 2.2.0 Add gc log to NFS gateway - Key: HDFS-7215 URL: https://issues.apache.org/jira/browse/HDFS-7215 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-7215.001.patch Like NN/DN, a GC log would help debug issues in NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7215) Add JvmPauseMonitor to NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-7215: - Summary: Add JvmPauseMonitor to NFS gateway (was: Add gc log to NFS gateway) Add JvmPauseMonitor to NFS gateway -- Key: HDFS-7215 URL: https://issues.apache.org/jira/browse/HDFS-7215 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-7215.001.patch Like NN/DN, a GC log would help debug issues in NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7232) Populate hostname in httpfs audit log
[ https://issues.apache.org/jira/browse/HDFS-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoran Dimitrijevic updated HDFS-7232: - Status: Open (was: Patch Available) I'll resubmit the same patch since the Apache build system seems to generate unrelated issues. Populate hostname in httpfs audit log - Key: HDFS-7232 URL: https://issues.apache.org/jira/browse/HDFS-7232 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Zoran Dimitrijevic Assignee: Zoran Dimitrijevic Priority: Trivial Attachments: HDFS-7232.patch Currently httpfs audit logs do not log the request's IP address. Since they use hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/conf/httpfs-log4j.properties which already contains hostname, it would be nice to add code to populate it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
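One common way to make a %X{hostname}-style log4j pattern resolve is to stash the client address in a per-thread diagnostic context (MDC) before the audit statement runs. The sketch below models that idea with a plain ThreadLocal; the class and method names are illustrative and the actual patch may take a different route:

```java
public class AuditContext {
    // Thread-local stand-in for a logging MDC.
    private static final ThreadLocal<String> HOSTNAME = new ThreadLocal<>();

    // Called early in request handling with the resolved client address.
    static void setHostname(String clientAddress) {
        HOSTNAME.set(clientAddress);
    }

    // What a "%X{hostname} user op"-style audit pattern would render.
    static String formatAudit(String user, String op) {
        String host = HOSTNAME.get();
        return (host == null ? "-" : host) + " " + user + " " + op;
    }

    // Must be cleared at the end of the request on pooled threads.
    static void clear() {
        HOSTNAME.remove();
    }
}
```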
[jira] [Updated] (HDFS-7232) Populate hostname in httpfs audit log
[ https://issues.apache.org/jira/browse/HDFS-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoran Dimitrijevic updated HDFS-7232: - Status: Patch Available (was: Open) Populate hostname in httpfs audit log - Key: HDFS-7232 URL: https://issues.apache.org/jira/browse/HDFS-7232 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Zoran Dimitrijevic Assignee: Zoran Dimitrijevic Priority: Trivial Attachments: HDFS-7232.patch Currently httpfs audit logs do not log the request's IP address. Since they use hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/conf/httpfs-log4j.properties which already contains hostname, it would be nice to add code to populate it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7259) Unresponsive NFS mount point due to deferred COMMIT response
Brandon Li created HDFS-7259: Summary: Unresponsive NFS mount point due to deferred COMMIT response Key: HDFS-7259 URL: https://issues.apache.org/jira/browse/HDFS-7259 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Since the gateway can't commit random writes, it caches the COMMIT requests in a queue and sends back a response only when the data can be committed or the stream times out (a failure in the latter case). This can cause two problem patterns: (1) file upload failure; (2) the mount dir is stuck on the same client, but other NFS clients can still access the NFS gateway. Error pattern (2) occurs because too many COMMIT requests are pending, so the NFS client, having hit its pending-request limit, can't send any other requests (e.g., for ls) to the NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7259) Unresponsive NFS mount point due to deferred COMMIT response
[ https://issues.apache.org/jira/browse/HDFS-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-7259: - Attachment: HDFS-7259.001.patch Unresponsive NFS mount point due to deferred COMMIT response - Key: HDFS-7259 URL: https://issues.apache.org/jira/browse/HDFS-7259 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-7259.001.patch Since the gateway can't commit random writes, it caches the COMMIT requests in a queue and sends back a response only when the data can be committed or the stream times out (a failure in the latter case). This can cause two problem patterns: (1) file upload failure; (2) the mount dir is stuck on the same client, but other NFS clients can still access the NFS gateway. Error pattern (2) occurs because too many COMMIT requests are pending, so the NFS client, having hit its pending-request limit, can't send any other requests (e.g., for ls) to the NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7259) Unresponsive NFS mount point due to deferred COMMIT response
[ https://issues.apache.org/jira/browse/HDFS-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-7259: - Status: Patch Available (was: Open) Unresponsive NFS mount point due to deferred COMMIT response - Key: HDFS-7259 URL: https://issues.apache.org/jira/browse/HDFS-7259 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-7259.001.patch Since the gateway can't commit random writes, it caches the COMMIT requests in a queue and sends back a response only when the data can be committed or the stream times out (a failure in the latter case). This can cause two problem patterns: (1) file upload failure; (2) the mount dir is stuck on the same client, but other NFS clients can still access the NFS gateway. Error pattern (2) occurs because too many COMMIT requests are pending, so the NFS client, having hit its pending-request limit, can't send any other requests (e.g., for ls) to the NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7259) Unresponsive NFS mount point due to deferred COMMIT response
[ https://issues.apache.org/jira/browse/HDFS-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175510#comment-14175510 ] Brandon Li commented on HDFS-7259: -- Uploaded a patch. The basic idea is to provide a configurable property, nfs.large.file.upload, turned on by default: 1. If the client asks to commit a non-sequential chunk of data, the NFS gateway returns success, in the hope that the client will send the prerequisite writes. 2. If the client asks to commit a sequential chunk (meaning it can be flushed to HDFS), the NFS gateway returns a special error, NFS3ERR_JUKEBOX, indicating the client needs to retry. Meanwhile, the NFS gateway keeps flushing data to HDFS and eventually syncs. The reason to make the client wait is that we want the client to wait for the last commit. Otherwise, the client thinks the file upload finished (e.g., the cp command returns success) while NFS could still be flushing staged data to HDFS. However, we don't know which commit is the last, so we assume that a commit after sequential writes may be the last. Unresponsive NFS mount point due to deferred COMMIT response - Key: HDFS-7259 URL: https://issues.apache.org/jira/browse/HDFS-7259 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-7259.001.patch Since the gateway can't commit random writes, it caches the COMMIT requests in a queue and sends back a response only when the data can be committed or the stream times out (a failure in the latter case). This can cause two problem patterns: (1) file upload failure; (2) the mount dir is stuck on the same client, but other NFS clients can still access the NFS gateway. Error pattern (2) occurs because too many COMMIT requests are pending, so the NFS client, having hit its pending-request limit, can't send any other requests (e.g., for ls) to the NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
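The commit decision the comment above describes can be modeled in a few lines. Class and parameter names are illustrative; the real patch works against the gateway's write queue rather than two offsets:

```java
public class CommitPolicy {
    enum Response { SUCCESS, JUKEBOX_RETRY }

    /**
     * @param commitOffset       offset the client asks to commit through
     * @param receivedContiguous bytes received without gaps so far
     */
    static Response respond(long commitOffset, long receivedContiguous) {
        if (commitOffset > receivedContiguous) {
            // Non-sequential: gaps remain, so reply success optimistically,
            // expecting the prerequisite writes to arrive later.
            return Response.SUCCESS;
        }
        // Sequential: the data is flushable; ask the client to retry
        // (NFS3ERR_JUKEBOX) so the final commit is only acknowledged once
        // the gateway has actually synced to HDFS.
        return Response.JUKEBOX_RETRY;
    }
}
```

This keeps the COMMIT queue from growing unboundedly (the hang in pattern (2)) while still making the client's last commit wait for a real sync.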
[jira] [Commented] (HDFS-7259) Unresponsive NFS mount point due to deferred COMMIT response
[ https://issues.apache.org/jira/browse/HDFS-7259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175529#comment-14175529 ] Hadoop QA commented on HDFS-7259: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12675571/HDFS-7259.001.patch against trunk revision 209b169. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-nfs hadoop-hdfs-project/hadoop-hdfs-nfs: org.apache.hadoop.hdfs.nfs.nfs3.TestWrites {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8443//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8443//console This message is automatically generated. Unresponsive NFS mount point due to deferred COMMIT response - Key: HDFS-7259 URL: https://issues.apache.org/jira/browse/HDFS-7259 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Attachments: HDFS-7259.001.patch Since the gateway can't commit random writes, it caches the COMMIT requests in a queue and sends back a response only when the data can be committed or the stream times out (a failure in the latter case). This can cause two problem patterns: (1) file upload failure; (2) the mount dir is stuck on the same client, but other NFS clients can still access the NFS gateway. Error pattern (2) occurs because too many COMMIT requests are pending, so the NFS client, having hit its pending-request limit, can't send any other requests (e.g., for ls) to the NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7260) Make DFSOutputStream.MAX_PACKETS configurable
Tsz Wo Nicholas Sze created HDFS-7260: - Summary: Make DFSOutputStream.MAX_PACKETS configurable Key: HDFS-7260 URL: https://issues.apache.org/jira/browse/HDFS-7260 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7260_20141017.patch DFSOutputStream.MAX_PACKETS is hard-coded to 80. In some cases, a smaller value is preferred to reduce memory usage. Let's make it configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7260) Make DFSOutputStream.MAX_PACKETS configurable
[ https://issues.apache.org/jira/browse/HDFS-7260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7260: -- Attachment: h7260_20141017.patch h7260_20141017.patch: adds dfs.client.write.max-packets. Make DFSOutputStream.MAX_PACKETS configurable - Key: HDFS-7260 URL: https://issues.apache.org/jira/browse/HDFS-7260 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7260_20141017.patch DFSOutputStream.MAX_PACKETS is hard-coded to 80. In some cases, a smaller value is preferred to reduce memory usage. Let's make it configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
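A minimal model of the new knob: read the packet cap from configuration, with the previously hard-coded 80 as the default. The key name is taken from the patch note above; the real client reads it via Hadoop's Configuration, modeled here with java.util.Properties:

```java
import java.util.Properties;

public class WriteMaxPackets {
    static final String KEY = "dfs.client.write.max-packets"; // from the patch note
    static final int DEFAULT = 80; // the previously hard-coded MAX_PACKETS value

    // Falls back to the old behavior when the key is unset.
    static int maxPackets(Properties conf) {
        return Integer.parseInt(conf.getProperty(KEY, String.valueOf(DEFAULT)));
    }
}
```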
[jira] [Created] (HDFS-7261) storageMap is accessed without synchronization in DatanodeDescriptor#updateHeartbeatState()
Ted Yu created HDFS-7261: Summary: storageMap is accessed without synchronization in DatanodeDescriptor#updateHeartbeatState() Key: HDFS-7261 URL: https://issues.apache.org/jira/browse/HDFS-7261 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Priority: Minor Here is the code: {code} failedStorageInfos = new HashSet<DatanodeStorageInfo>( storageMap.values()); {code} In other places, the lock on DatanodeDescriptor.storageMap is held: {code} synchronized (storageMap) { final Collection<DatanodeStorageInfo> storages = storageMap.values(); return storages.toArray(new DatanodeStorageInfo[storages.size()]); } {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
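A hedged sketch of the fix the report implies: take the same storageMap lock in updateHeartbeatState() that the other call sites already hold when copying the values. A plain HashMap of strings stands in for the real DatanodeStorageInfo map:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class StorageMapAccess {
    private final Map<String, String> storageMap = new HashMap<>();

    void addStorage(String storageId, String info) {
        synchronized (storageMap) {
            storageMap.put(storageId, info);
        }
    }

    // The implied fix: copy under the same lock the other call sites hold,
    // instead of constructing the HashSet from storageMap.values() while
    // another thread may be mutating the map.
    Set<String> failedStorageCandidates() {
        synchronized (storageMap) {
            return new HashSet<>(storageMap.values());
        }
    }
}
```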
[jira] [Commented] (HDFS-7260) Make DFSOutputStream.MAX_PACKETS configurable
[ https://issues.apache.org/jira/browse/HDFS-7260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175539#comment-14175539 ] Jing Zhao commented on HDFS-7260: - +1 pending Jenkins. Make DFSOutputStream.MAX_PACKETS configurable - Key: HDFS-7260 URL: https://issues.apache.org/jira/browse/HDFS-7260 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7260_20141017.patch DFSOutputStream.MAX_PACKETS is hard coded to 80. In some case, a smaller value is preferred for reducing memory usage. Let's make it configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7260) Make DFSOutputStream.MAX_PACKETS configurable
[ https://issues.apache.org/jira/browse/HDFS-7260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7260: -- Status: Patch Available (was: Open) Make DFSOutputStream.MAX_PACKETS configurable - Key: HDFS-7260 URL: https://issues.apache.org/jira/browse/HDFS-7260 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7260_20141017.patch DFSOutputStream.MAX_PACKETS is hard coded to 80. In some case, a smaller value is preferred for reducing memory usage. Let's make it configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7262) WebHDFS should use proxy doAs support in DelegationTokenAuthenticationHandler for token-management calls
Vinod Kumar Vavilapalli created HDFS-7262: - Summary: WebHDFS should use proxy doAs support in DelegationTokenAuthenticationHandler for token-management calls Key: HDFS-7262 URL: https://issues.apache.org/jira/browse/HDFS-7262 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: Vinod Kumar Vavilapalli HADOOP-11207 adds support for proxy users to perform delegation-token management operations and YARN uses them. HDFS didn't need them because the doAs functionality is explicitly implemented in webHDFS (JspHelper.getUGI()) - this should be changed to use the common code to avoid duplication. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7232) Populate hostname in httpfs audit log
[ https://issues.apache.org/jira/browse/HDFS-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoran Dimitrijevic updated HDFS-7232: - Attachment: HDFS-7232.patch Populate hostname in httpfs audit log - Key: HDFS-7232 URL: https://issues.apache.org/jira/browse/HDFS-7232 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Zoran Dimitrijevic Assignee: Zoran Dimitrijevic Priority: Trivial Attachments: HDFS-7232.patch Currently httpfs audit logs do not log the request's IP address. Since they use hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/conf/httpfs-log4j.properties which already contains hostname, it would be nice to add code to populate it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7232) Populate hostname in httpfs audit log
[ https://issues.apache.org/jira/browse/HDFS-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoran Dimitrijevic updated HDFS-7232: - Status: Open (was: Patch Available) Populate hostname in httpfs audit log - Key: HDFS-7232 URL: https://issues.apache.org/jira/browse/HDFS-7232 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Zoran Dimitrijevic Assignee: Zoran Dimitrijevic Priority: Trivial Attachments: HDFS-7232.patch Currently httpfs audit logs do not log the request's IP address. Since they use hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/conf/httpfs-log4j.properties which already contains hostname, it would be nice to add code to populate it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7232) Populate hostname in httpfs audit log
[ https://issues.apache.org/jira/browse/HDFS-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoran Dimitrijevic updated HDFS-7232: - Attachment: (was: HDFS-7232.patch) Populate hostname in httpfs audit log - Key: HDFS-7232 URL: https://issues.apache.org/jira/browse/HDFS-7232 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Zoran Dimitrijevic Assignee: Zoran Dimitrijevic Priority: Trivial Attachments: HDFS-7232.patch Currently httpfs audit logs do not log the request's IP address. Since they use hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/conf/httpfs-log4j.properties which already contains hostname, it would be nice to add code to populate it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7232) Populate hostname in httpfs audit log
[ https://issues.apache.org/jira/browse/HDFS-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoran Dimitrijevic updated HDFS-7232: - Status: Patch Available (was: Open) Populate hostname in httpfs audit log - Key: HDFS-7232 URL: https://issues.apache.org/jira/browse/HDFS-7232 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Zoran Dimitrijevic Assignee: Zoran Dimitrijevic Priority: Trivial Attachments: HDFS-7232.patch Currently httpfs audit logs do not log the request's IP address. Since they use hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/conf/httpfs-log4j.properties which already contains hostname, it would be nice to add code to populate it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7215) Add JvmPauseMonitor to NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175601#comment-14175601 ] Hadoop QA commented on HDFS-7215: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12675557/HDFS-7215.001.patch against trunk revision a6aa6e4. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-nfs hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-nfs: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing The following test timeouts occurred in hadoop-common-project/hadoop-nfs hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-nfs: org.apache.hadoop.hdfs.TestFileAppend2 {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8442//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8442//console This message is automatically generated. 
Add JvmPauseMonitor to NFS gateway -- Key: HDFS-7215 URL: https://issues.apache.org/jira/browse/HDFS-7215 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Priority: Minor Attachments: HDFS-7215.001.patch Like NN/DN, a GC log would help debug issues in NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7215) Add JvmPauseMonitor to NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175603#comment-14175603 ] Jing Zhao commented on HDFS-7215: - Instead of starting the monitor in RpcProgramNfs3, it may be better to start it in Nfs3/Nfs3Base. Other than this the patch looks good to me. Add JvmPauseMonitor to NFS gateway -- Key: HDFS-7215 URL: https://issues.apache.org/jira/browse/HDFS-7215 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Priority: Minor Attachments: HDFS-7215.001.patch Like NN/DN, a GC log would help debug issues in NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
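For context, the pause-detection idea behind JvmPauseMonitor (wherever it ends up being started) can be modeled in a few lines: a monitor thread sleeps for a fixed interval and reports when the observed elapsed time overshoots it, which indicates a GC or other JVM pause. The threshold below is illustrative, not Hadoop's actual default:

```java
public class PauseCheck {
    static final long WARN_THRESHOLD_MS = 10_000; // illustrative threshold

    // A monitor thread sleeps for expectedSleepMs; any large overshoot in
    // the observed elapsed time is attributed to a JVM pause (e.g. GC).
    // Returns the inferred pause length, or 0 if below the threshold.
    static long inferredPauseMs(long expectedSleepMs, long observedElapsedMs) {
        long extra = observedElapsedMs - expectedSleepMs;
        return extra >= WARN_THRESHOLD_MS ? extra : 0;
    }
}
```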
[jira] [Commented] (HDFS-7232) Populate hostname in httpfs audit log
[ https://issues.apache.org/jira/browse/HDFS-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175606#comment-14175606 ] Hadoop QA commented on HDFS-7232: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12675587/HDFS-7232.patch against trunk revision 209b169. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8445//console This message is automatically generated. Populate hostname in httpfs audit log - Key: HDFS-7232 URL: https://issues.apache.org/jira/browse/HDFS-7232 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0 Reporter: Zoran Dimitrijevic Assignee: Zoran Dimitrijevic Priority: Trivial Attachments: HDFS-7232.patch Currently httpfs audit logs do not log the request's IP address. Since they use hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/conf/httpfs-log4j.properties which already contains hostname, it would be nice to add code to populate it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7260) Make DFSOutputStream.MAX_PACKETS configurable
[ https://issues.apache.org/jira/browse/HDFS-7260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175605#comment-14175605 ] Hadoop QA commented on HDFS-7260: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12675576/h7260_20141017.patch against trunk revision 209b169. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 javac{color}. The applied patch generated 1265 javac compiler warnings (more than the trunk's current 1 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 9 warning messages. See https://builds.apache.org/job/PreCommit-HDFS-Build/8444//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs-httpfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8444//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/8444//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8444//console This message is automatically generated. 
Make DFSOutputStream.MAX_PACKETS configurable - Key: HDFS-7260 URL: https://issues.apache.org/jira/browse/HDFS-7260 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7260_20141017.patch DFSOutputStream.MAX_PACKETS is hard coded to 80. In some case, a smaller value is preferred for reducing memory usage. Let's make it configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7263) Snapshot read of an appended file returns more bytes than the file length.
Konstantin Shvachko created HDFS-7263: - Summary: Snapshot read of an appended file returns more bytes than the file length. Key: HDFS-7263 URL: https://issues.apache.org/jira/browse/HDFS-7263 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Konstantin Shvachko The following sequence of steps will produce extra bytes that should not be visible, because they are not in the snapshot. * Create a file of size L, where {{L % blockSize != 0}}. * Create a snapshot * Append bytes to the file * Read the file in the snapshot (not the current file) * You will see that bytes are read beyond the original file size L -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7263) Snapshot read of an appended file returns more bytes than the file length.
[ https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-7263: -- Attachment: TestSnapshotRead.java Here is the test that fails with the current implementation. {{DFSInputStream}} should be fixed to take into account the actual file length and not try to read beyond it. {code} - realLen = (int) Math.min(realLen, locatedBlocks.getFileLength()); + realLen = (int) Math.min(realLen, locatedBlocks.getFileLength() - pos); {code} Snapshot read of an appended file returns more bytes than the file length. -- Key: HDFS-7263 URL: https://issues.apache.org/jira/browse/HDFS-7263 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Konstantin Shvachko Attachments: TestSnapshotRead.java The following sequence of steps will produce extra bytes that should not be visible, because they are not in the snapshot. * Create a file of size L, where {{L % blockSize != 0}}. * Create a snapshot * Append bytes to the file * Read the file in the snapshot (not the current file) * You will see that bytes are read beyond the original file size L -- This message was sent by Atlassian JIRA (v6.3.4#6332)
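The one-line fix above can be checked with small numbers: with snapshot file length 100 and read position 60, a 50-byte request must be clipped to 40 remaining bytes, whereas clipping against the whole file length would wrongly allow all 50:

```java
public class SnapshotReadLen {
    // The corrected expression from the diff: clip the read length against
    // the bytes remaining after pos, not against the whole file length.
    static int clippedLen(int requested, long fileLength, long pos) {
        return (int) Math.min(requested, fileLength - pos);
    }
}
```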
[jira] [Assigned] (HDFS-7263) Snapshot read of an appended file returns more bytes than the file length.
[ https://issues.apache.org/jira/browse/HDFS-7263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Luo reassigned HDFS-7263: - Assignee: Tao Luo Snapshot read of an appended file returns more bytes than the file length. -- Key: HDFS-7263 URL: https://issues.apache.org/jira/browse/HDFS-7263 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.5.0 Reporter: Konstantin Shvachko Assignee: Tao Luo Attachments: TestSnapshotRead.java The following sequence of steps will produce extra bytes that should not be visible, because they are not in the snapshot. * Create a file of size L, where {{L % blockSize != 0}}. * Create a snapshot * Append bytes to the file * Read the file in the snapshot (not the current file) * You will see that bytes are read beyond the original file size L -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7215) Add JvmPauseMonitor to NFS gateway
[ https://issues.apache.org/jira/browse/HDFS-7215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175676#comment-14175676 ] Brandon Li commented on HDFS-7215: -- It is started in RpcProgramNfs3 because that class has a shutdown hook which removes the RPC registration from portmap and can shut down other related daemons. Add JvmPauseMonitor to NFS gateway -- Key: HDFS-7215 URL: https://issues.apache.org/jira/browse/HDFS-7215 Project: Hadoop HDFS Issue Type: Improvement Components: nfs Affects Versions: 2.2.0 Reporter: Brandon Li Assignee: Brandon Li Priority: Minor Attachments: HDFS-7215.001.patch As with the NN/DN, a GC log would help debug issues in the NFS gateway. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
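For readers unfamiliar with the pause monitor being added here, this is a hedged sketch of the general idea (not Hadoop's actual JvmPauseMonitor class): a daemon thread sleeps for a short, fixed interval and compares wall-clock time; if it overslept by far more than the interval, the JVM was likely paused, e.g. by a stop-the-world GC. All names and thresholds below are illustrative.

```java
public class PauseMonitorSketch implements Runnable {
    static final long SLEEP_MS = 500;          // expected sleep per iteration
    static final long WARN_THRESHOLD_MS = 1000; // extra delay that suggests a pause

    // How long the thread overslept beyond the requested interval.
    static long oversleep(long before, long now, long sleepMs) {
        return now - before - sleepMs;
    }

    public void run() {
        long before = System.currentTimeMillis();
        while (!Thread.currentThread().isInterrupted()) {
            try { Thread.sleep(SLEEP_MS); } catch (InterruptedException e) { return; }
            long now = System.currentTimeMillis();
            long extra = oversleep(before, now, SLEEP_MS);
            if (extra > WARN_THRESHOLD_MS) {
                // In the real monitor this would go to the service log.
                System.err.println("Detected pause of approx " + extra + " ms (GC?)");
            }
            before = now;
        }
    }

    public static void main(String[] args) throws Exception {
        Thread t = new Thread(new PauseMonitorSketch());
        t.setDaemon(true);
        t.start();
        Thread.sleep(1200); // let the monitor run briefly, then exit
    }
}
```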
[jira] [Updated] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test
[ https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7226: Attachment: HDFS-7226.002.patch TestDNFencing.testQueueingWithAppend failed often in latest test Key: HDFS-7226 URL: https://issues.apache.org/jira/browse/HDFS-7226 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7226.001.patch, HDFS-7226.002.patch Using tool from HADOOP-11045, got the following report: {code} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j PreCommit-HDFS-Build -n 1 Recently FAILED builds in url: https://builds.apache.org//job/PreCommit-HDFS-Build THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, as listed below: .. Among 9 runs examined, all failed tests #failedRuns: testName: 7: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend 6: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress 3: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching .. {code} TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. Creating this jira for TestDNFencing.testQueueingWithAppend. Symptom: {code} Failed org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failing for the past 1 build (Since Failed#8390 ) Took 2.9 sec. 
Error Message expected:18 but was:12 Stacktrace java.lang.AssertionError: expected:18 but was:12 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
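The report above aggregates failures across several Jenkins runs. As a rough illustration of that aggregation (my assumption about what the HADOOP-11045 tool computes, not its actual code), the tally can be sketched as counting, per test name, how many runs it failed in and listing the most frequent first:

```java
import java.util.*;

public class FlakyTestTally {
    // Given the failed-test list of each examined run, count how many runs
    // each test failed in and return entries sorted most-frequent first.
    static List<Map.Entry<String, Integer>> tally(List<List<String>> runs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> run : runs)
            for (String test : run)
                counts.merge(test, 1, Integer::sum);
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort((a, b) -> b.getValue() - a.getValue());
        return sorted;
    }

    public static void main(String[] args) {
        List<List<String>> runs = Arrays.asList(
            Arrays.asList("TestDNFencing.testQueueingWithAppend"),
            Arrays.asList("TestDNFencing.testQueueingWithAppend",
                          "TestDNFencingWithReplication.testFencingStress"));
        // Prints "#failedRuns: testName" lines like the report above.
        for (Map.Entry<String, Integer> e : tally(runs))
            System.out.println(e.getValue() + ": " + e.getKey());
    }
}
```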
[jira] [Updated] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test
[ https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7226: Attachment: (was: HDFS-7226.002.patch) TestDNFencing.testQueueingWithAppend failed often in latest test Key: HDFS-7226 URL: https://issues.apache.org/jira/browse/HDFS-7226 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7226.001.patch Using tool from HADOOP-11045, got the following report: {code} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j PreCommit-HDFS-Build -n 1 Recently FAILED builds in url: https://builds.apache.org//job/PreCommit-HDFS-Build THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, as listed below: .. Among 9 runs examined, all failed tests #failedRuns: testName: 7: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend 6: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress 3: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching .. {code} TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. Creating this jira for TestDNFencing.testQueueingWithAppend. Symptom: {code} Failed org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failing for the past 1 build (Since Failed#8390 ) Took 2.9 sec. 
Error Message expected:18 but was:12 Stacktrace java.lang.AssertionError: expected:18 but was:12 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test
[ https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7226: Attachment: HDFS-7226.002.patch TestDNFencing.testQueueingWithAppend failed often in latest test Key: HDFS-7226 URL: https://issues.apache.org/jira/browse/HDFS-7226 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7226.001.patch, HDFS-7226.002.patch Using tool from HADOOP-11045, got the following report: {code} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j PreCommit-HDFS-Build -n 1 Recently FAILED builds in url: https://builds.apache.org//job/PreCommit-HDFS-Build THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, as listed below: .. Among 9 runs examined, all failed tests #failedRuns: testName: 7: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend 6: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress 3: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching .. {code} TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. Creating this jira for TestDNFencing.testQueueingWithAppend. Symptom: {code} Failed org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failing for the past 1 build (Since Failed#8390 ) Took 2.9 sec. 
Error Message expected:18 but was:12 Stacktrace java.lang.AssertionError: expected:18 but was:12 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6581) Write to single replica in memory
[ https://issues.apache.org/jira/browse/HDFS-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-6581: --- Fix Version/s: (was: 3.0.0) 2.6.0 Write to single replica in memory - Key: HDFS-6581 URL: https://issues.apache.org/jira/browse/HDFS-6581 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, hdfs-client, namenode Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.6.0 Attachments: HDFS-6581.merge.01.patch, HDFS-6581.merge.02.patch, HDFS-6581.merge.03.patch, HDFS-6581.merge.04.patch, HDFS-6581.merge.05.patch, HDFS-6581.merge.06.patch, HDFS-6581.merge.07.patch, HDFS-6581.merge.08.patch, HDFS-6581.merge.09.patch, HDFS-6581.merge.10.patch, HDFS-6581.merge.11.patch, HDFS-6581.merge.12.patch, HDFS-6581.merge.14.patch, HDFS-6581.merge.15.patch, HDFSWriteableReplicasInMemory.pdf, Test-Plan-for-HDFS-6581-Memory-Storage.pdf, Test-Plan-for-HDFS-6581-Memory-Storage.pdf Per discussion with the community on HDFS-5851, we will implement writing to a single replica in DN memory via DataTransferProtocol. This avoids some of the issues with short-circuit writes, which we can revisit at a later time. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test
[ https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175730#comment-14175730 ] Yongjun Zhang commented on HDFS-7226: - Hi [~jingzhao], thanks for your earlier review comments. I looked into it further and just submitted patch rev 002. * I confirmed that the 3 extra messages are because of the heartbeat incurred by the {{triggerBlockReportForTests}} call, so I incorporated these 3 into the expected number of messages in the new rev; * In addition, I found that the closeStream() call in the second block of code demonstrates the problem in a different way, because it doesn't call hflush as the first block does. I fixed this issue in two different ways and made 3 cases in the same test. See the detailed comment in the patch code. Would you please help take a look again? Thanks a lot. TestDNFencing.testQueueingWithAppend failed often in latest test Key: HDFS-7226 URL: https://issues.apache.org/jira/browse/HDFS-7226 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7226.001.patch, HDFS-7226.002.patch Using tool from HADOOP-11045, got the following report: {code} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j PreCommit-HDFS-Build -n 1 Recently FAILED builds in url: https://builds.apache.org//job/PreCommit-HDFS-Build THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, as listed below: ..
Among 9 runs examined, all failed tests #failedRuns: testName: 7: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend 6: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress 3: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching .. {code} TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. Creating this jira for TestDNFencing.testQueueingWithAppend. Symptom: {code} Failed org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failing for the past 1 build (Since Failed#8390 ) Took 2.9 sec. Error Message expected:18 but was:12 Stacktrace java.lang.AssertionError: expected:18 but was:12 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7221) TestDNFencingWithReplication fails consistently
[ https://issues.apache.org/jira/browse/HDFS-7221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175737#comment-14175737 ] Yongjun Zhang commented on HDFS-7221: - Hi [~mingma], thanks a lot for your input; it's very helpful! TestDNFencingWithReplication fails consistently --- Key: HDFS-7221 URL: https://issues.apache.org/jira/browse/HDFS-7221 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 2.6.0 Reporter: Charles Lamb Assignee: Charles Lamb Priority: Minor Attachments: HDFS-7221.001.patch, HDFS-7221.002.patch TestDNFencingWithReplication consistently fails with a timeout, both in jenkins runs and on my local machine. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7264) The last datanode in a pipeline should send a heartbeat when there is no traffic
Tsz Wo Nicholas Sze created HDFS-7264: - Summary: The last datanode in a pipeline should send a heartbeat when there is no traffic Key: HDFS-7264 URL: https://issues.apache.org/jira/browse/HDFS-7264 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze When the client is writing slowly, it will send a heartbeat to signal that the connection is still alive. This case works fine. However, when a client is writing fast but some of the datanodes in the pipeline are busy, a PacketResponder may get a timeout since no ack is sent from the upstream datanode. We suggest that the last datanode in a pipeline should send a heartbeat when there is no traffic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
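The proposed behavior can be sketched as follows (my assumption about the approach, not the patch's code): the PacketResponder on the last datanode polls its ack queue with a timeout, and when nothing arrives within the interval it emits a heartbeat ack downstream instead of letting the downstream responder time out. Class, method names, and the interval are illustrative.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class LastDnResponderSketch {
    static final long HEARTBEAT_INTERVAL_MS = 50; // illustrative timeout
    static final String HEARTBEAT = "HEARTBEAT";

    // Waits briefly for a real ack; if none arrives, substitutes a heartbeat
    // so the pipeline connection is kept alive even with no traffic.
    static String nextAck(BlockingQueue<String> ackQueue) throws InterruptedException {
        String ack = ackQueue.poll(HEARTBEAT_INTERVAL_MS, TimeUnit.MILLISECONDS);
        return ack != null ? ack : HEARTBEAT;
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> q = new LinkedBlockingQueue<>();
        System.out.println(nextAck(q)); // HEARTBEAT: queue was empty
        q.put("ACK-1");
        System.out.println(nextAck(q)); // ACK-1: real ack takes precedence
    }
}
```

Per the patch summary that follows, intermediate datanodes would simply forward such heartbeats without consulting their own ack queues.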
[jira] [Updated] (HDFS-7264) The last datanode in a pipeline should send a heartbeat when there is no traffic
[ https://issues.apache.org/jira/browse/HDFS-7264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7264: -- Attachment: h7264_20141017.patch h7264_20141017.patch: changes - the last datanode in a pipeline to send heartbeats; and - the other datanodes in the pipeline to just forward heartbeats without checking the ack queue. The last datanode in a pipeline should send a heartbeat when there is no traffic Key: HDFS-7264 URL: https://issues.apache.org/jira/browse/HDFS-7264 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7264_20141017.patch When the client is writing slowly, it will send a heartbeat to signal that the connection is still alive. This case works fine. However, when a client is writing fast but some of the datanodes in the pipeline are busy, a PacketResponder may get a timeout since no ack is sent from the upstream datanode. We suggest that the last datanode in a pipeline should send a heartbeat when there is no traffic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7264) The last datanode in a pipeline should send a heartbeat when there is no traffic
[ https://issues.apache.org/jira/browse/HDFS-7264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7264: -- Status: Patch Available (was: Open) The last datanode in a pipeline should send a heartbeat when there is no traffic Key: HDFS-7264 URL: https://issues.apache.org/jira/browse/HDFS-7264 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h7264_20141017.patch When the client is writing slowly, it will send a heartbeat to signal that the connection is still alive. This case works fine. However, when a client is writing fast but some of the datanodes in the pipeline are busy, a PacketResponder may get a timeout since no ack is sent from the upstream datanode. We suggest that the last datanode in a pipeline should send a heartbeat when there is no traffic. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7226) TestDNFencing.testQueueingWithAppend failed often in latest test
[ https://issues.apache.org/jira/browse/HDFS-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175774#comment-14175774 ] Hadoop QA commented on HDFS-7226: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12675624/HDFS-7226.002.patch against trunk revision c3de241. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The test build failed in hadoop-hdfs-project/hadoop-hdfs {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8448//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8448//console This message is automatically generated. 
TestDNFencing.testQueueingWithAppend failed often in latest test Key: HDFS-7226 URL: https://issues.apache.org/jira/browse/HDFS-7226 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.6.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-7226.001.patch, HDFS-7226.002.patch Using tool from HADOOP-11045, got the following report: {code} [yzhang@localhost jenkinsftf]$ ./determine-flaky-tests-hadoop.py -j PreCommit-HDFS-Build -n 1 Recently FAILED builds in url: https://builds.apache.org//job/PreCommit-HDFS-Build THERE ARE 9 builds (out of 9) that have failed tests in the past 1 days, as listed below: .. Among 9 runs examined, all failed tests #failedRuns: testName: 7: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend 6: org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication.testFencingStress 3: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot.testOpenFilesWithMultipleSnapshots 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testFailedOpen 1: org.apache.hadoop.hdfs.server.namenode.TestEditLog.testSyncBatching .. {code} TestDNFencingWithReplication.testFencingStress was reported as HDFS-7221. Creating this jira for TestDNFencing.testQueueingWithAppend. Symptom: {code} Failed org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend Failing for the past 1 build (Since Failed#8390 ) Took 2.9 sec. Error Message expected:18 but was:12 Stacktrace java.lang.AssertionError: expected:18 but was:12 at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing.testQueueingWithAppend(TestDNFencing.java:448) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7260) Make DFSOutputStream.MAX_PACKETS configurable
[ https://issues.apache.org/jira/browse/HDFS-7260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175775#comment-14175775 ] Hadoop QA commented on HDFS-7260: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12675576/h7260_20141017.patch against trunk revision 209b169. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestLargeDirectoryDelete org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication org.apache.hadoop.hdfs.server.namenode.ha.TestDNFencing {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8446//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8446//console This message is automatically generated. 
Make DFSOutputStream.MAX_PACKETS configurable - Key: HDFS-7260 URL: https://issues.apache.org/jira/browse/HDFS-7260 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7260_20141017.patch DFSOutputStream.MAX_PACKETS is hard coded to 80. In some cases, a smaller value is preferred for reducing memory usage. Let's make it configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
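The general pattern behind this change can be sketched as follows: replace the hard-coded constant with a value read from the client configuration, falling back to 80 when unset. The configuration key name below is a placeholder, not necessarily the key the committed patch introduces, and plain java.util.Properties stands in for Hadoop's Configuration class.

```java
import java.util.Properties;

public class MaxPacketsConfig {
    // Hypothetical key name for illustration only.
    static final String KEY = "dfs.client.write.max-packets-in-flight";
    static final int DEFAULT = 80; // the previously hard-coded MAX_PACKETS value

    // Returns the configured cap on in-flight packets, or the default.
    static int maxPackets(Properties conf) {
        String v = conf.getProperty(KEY);
        return v == null ? DEFAULT : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(maxPackets(conf)); // 80: default preserved for old configs
        conf.setProperty(KEY, "16");
        System.out.println(maxPackets(conf)); // 16: smaller cap reduces client memory
    }
}
```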
[jira] [Commented] (HDFS-7260) Make DFSOutputStream.MAX_PACKETS configurable
[ https://issues.apache.org/jira/browse/HDFS-7260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175783#comment-14175783 ] Tsz Wo Nicholas Sze commented on HDFS-7260: --- The failed tests are not related to this. I did not add any new test since the change is obvious. Make DFSOutputStream.MAX_PACKETS configurable - Key: HDFS-7260 URL: https://issues.apache.org/jira/browse/HDFS-7260 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7260_20141017.patch DFSOutputStream.MAX_PACKETS is hard coded to 80. In some cases, a smaller value is preferred for reducing memory usage. Let's make it configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7260) Make DFSOutputStream.MAX_PACKETS configurable
[ https://issues.apache.org/jira/browse/HDFS-7260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175785#comment-14175785 ] Hudson commented on HDFS-7260: -- SUCCESS: Integrated in Hadoop-trunk-Commit #6286 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6286/]) HDFS-7260. Change DFSOutputStream.MAX_PACKETS to be configurable. (szetszwo: rev 2e140523d3ccb27809cde4a55e95f7e0006c028f) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java Make DFSOutputStream.MAX_PACKETS configurable - Key: HDFS-7260 URL: https://issues.apache.org/jira/browse/HDFS-7260 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h7260_20141017.patch DFSOutputStream.MAX_PACKETS is hard coded to 80. In some cases, a smaller value is preferred for reducing memory usage. Let's make it configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list
[ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14175788#comment-14175788 ] Konstantin Shvachko commented on HDFS-6658: --- Pretty neat data structure, Amir. Could be an improvement to the current structure, introduced way back in HADOOP-1687. With BitSet you will need about 12K of contiguous space in RAM for every 100,000-block report. Sounds reasonable. The only concern is that removing a large number of files, which is typically done when the NN gets close to its capacity, does not free memory used by the removed replicas. It can be reused for new references, but not for anything else, unless some type of garbage collector is introduced. It would be interesting to see how it behaves on a cluster over time. Namenode memory optimization - Block replicas list --- Key: HDFS-6658 URL: https://issues.apache.org/jira/browse/HDFS-6658 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.1 Reporter: Amir Langer Assignee: Amir Langer Attachments: BlockListOptimizationComparison.xlsx, HDFS-6658.patch, Namenode Memory Optimizations - Block replicas list.docx Part of the memory consumed by every BlockInfo object in the Namenode is a linked list of block references for every DatanodeStorageInfo (called triplets). We propose to change the way we store the list in memory. Using primitive integer indexes instead of object references will reduce the memory needed for every block replica (when compressed oops is disabled) and in our new design the list overhead will be per DatanodeStorageInfo and not per block replica. see attached design doc. for details and evaluation results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
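A hedged sketch of the idea under discussion (my reading of the proposal, not the HDFS-6658 patch itself): replicas are kept as slots of a primitive int array linked by index rather than by object references. Note how a removed slot goes back onto an internal free list; the array memory is never returned to the JVM, which is exactly the concern about deletes freeing memory only for future replicas.

```java
public class IntLinkedReplicaList {
    private final int[] next; // next[i] = index of the following slot, -1 = end
    private int head = -1;    // first live replica slot
    private int free;         // first free slot
    private int size = 0;

    public IntLinkedReplicaList(int capacity) {
        next = new int[capacity];
        for (int i = 0; i < capacity - 1; i++) next[i] = i + 1; // chain free slots
        next[capacity - 1] = -1;
        free = 0;
    }

    /** Claims a slot for a new replica and returns its index. */
    public int add() {
        if (free == -1) throw new IllegalStateException("capacity exhausted");
        int slot = free;
        free = next[slot];
        next[slot] = head;
        head = slot;
        size++;
        return slot;
    }

    /** Removes the most recently added replica; its slot is recycled, not freed. */
    public int removeHead() {
        if (head == -1) throw new IllegalStateException("empty");
        int slot = head;
        head = next[slot];
        next[slot] = free; // slot returns to the free list; the RAM stays allocated
        free = slot;
        size--;
        return slot;
    }

    public int size() { return size; }

    public static void main(String[] args) {
        IntLinkedReplicaList list = new IntLinkedReplicaList(4);
        list.add(); list.add(); int s = list.add(); // slots 0, 1, 2
        list.removeHead();                          // slot 2 back on the free list
        System.out.println(list.add() == s);        // the freed slot is reused
    }
}
```

Because a slot is a plain int index, each replica costs one array element per link rather than an object reference plus header, which is the memory saving the design doc quantifies.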
[jira] [Updated] (HDFS-6921) Add LazyPersist flag to FileStatus
[ https://issues.apache.org/jira/browse/HDFS-6921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-6921: --- Fix Version/s: (was: 3.0.0) 2.6.0 Add LazyPersist flag to FileStatus -- Key: HDFS-6921 URL: https://issues.apache.org/jira/browse/HDFS-6921 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: HDFS-6581 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.6.0 Attachments: HDFS-6921.01.patch, HDFS-6921.02.patch A new flag will be added to FileStatus to indicate that a file can be lazily persisted to disk i.e. trading reduced durability for better write performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6923) Propagate LazyPersist flag to DNs via DataTransferProtocol
[ https://issues.apache.org/jira/browse/HDFS-6923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-6923: --- Fix Version/s: (was: 3.0.0) 2.6.0 Propagate LazyPersist flag to DNs via DataTransferProtocol -- Key: HDFS-6923 URL: https://issues.apache.org/jira/browse/HDFS-6923 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: HDFS-6581 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.6.0 Attachments: HDFS-6923.01.patch, HDFS-6923.02.patch If the LazyPersist flag is set in the file properties, the DFSClient will propagate it to the DataNode via DataTransferProtocol. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7260) Make DFSOutputStream.MAX_PACKETS configurable
[ https://issues.apache.org/jira/browse/HDFS-7260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-7260: -- Resolution: Fixed Fix Version/s: 2.6.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thank Jing for reviewing the patch. I have committed this. Make DFSOutputStream.MAX_PACKETS configurable - Key: HDFS-7260 URL: https://issues.apache.org/jira/browse/HDFS-7260 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Fix For: 2.6.0 Attachments: h7260_20141017.patch DFSOutputStream.MAX_PACKETS is hard coded to 80. In some cases, a smaller value is preferred for reducing memory usage. Let's make it configurable. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6924) Add new RAM_DISK storage type
[ https://issues.apache.org/jira/browse/HDFS-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-6924: --- Fix Version/s: (was: 3.0.0) 2.6.0 Add new RAM_DISK storage type - Key: HDFS-6924 URL: https://issues.apache.org/jira/browse/HDFS-6924 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: HDFS-6581 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.6.0 Attachments: HDFS-6924.01.patch Add a new RAM_DISK storage type which could be backed by tmpfs/ramfs on Linux or alternative RAM disk on other platforms. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6925) DataNode should attempt to place replicas on transient storage first if lazyPersist flag is received
[ https://issues.apache.org/jira/browse/HDFS-6925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-6925: --- Fix Version/s: (was: 3.0.0) 2.6.0 DataNode should attempt to place replicas on transient storage first if lazyPersist flag is received Key: HDFS-6925 URL: https://issues.apache.org/jira/browse/HDFS-6925 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: HDFS-6581 Environment: If the LazyPersist flag is received via DataTransferProtocol then DN should attempt to place the files on RAM disk first, and failing that on regular disk. Support for lazily moving replicas from RAM disk to persistent storage will be added later. Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.6.0 Attachments: HDFS-6925.01.patch, HDFS-6925.02.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6926) DN support for saving replicas to persistent storage and evicting in-memory replicas
[ https://issues.apache.org/jira/browse/HDFS-6926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-6926: --- Fix Version/s: (was: 3.0.0) 2.6.0 DN support for saving replicas to persistent storage and evicting in-memory replicas Key: HDFS-6926 URL: https://issues.apache.org/jira/browse/HDFS-6926 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: HDFS-6581 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.6.0 Attachments: HDFS-6926.01.patch Add the following: # A lazy writer on the DN to move replicas from RAM disk to persistent storage. # 'Evict' persisted replicas from RAM disk to make space for new blocks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6927) Add unit tests
[ https://issues.apache.org/jira/browse/HDFS-6927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HDFS-6927: --- Fix Version/s: (was: 3.0.0) 2.6.0 Add unit tests -- Key: HDFS-6927 URL: https://issues.apache.org/jira/browse/HDFS-6927 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: HDFS-6581 Reporter: Arpit Agarwal Assignee: Arpit Agarwal Fix For: 2.6.0 Attachments: HDFS-6927.01.patch Add a bunch of unit tests to cover flag persistence, propagation to DN, ability to write replicas to RAM disk, lazy writes to disk and eviction from RAM disk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6929) NN periodically unlinks lazy persist files with missing replicas from namespace
[ https://issues.apache.org/jira/browse/HDFS-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HDFS-6929:
Fix Version/s: (was: 3.0.0) 2.6.0

NN periodically unlinks lazy persist files with missing replicas from namespace

Key: HDFS-6929
URL: https://issues.apache.org/jira/browse/HDFS-6929
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
Fix For: 2.6.0
Attachments: HDFS-6929.01.patch, HDFS-6929.02.patch

Occasional data loss is expected when using the lazy persist flag, due to node restarts. The NN will optionally unlink lazy persist files from the namespace to prevent them from showing up as corrupt files. This behavior can be turned off with a global option. In the future this may be made a per-file option controllable by the client.
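The unlink policy in HDFS-6929 can be modeled in a few lines. The sketch below is a hypothetical in-memory illustration (the class, fields, and method names are invented and are not the actual NameNode code): only files flagged lazy-persist with no live replicas are unlinked, and only when the global option is enabled.

```java
import java.util.*;

// Hypothetical sketch of the HDFS-6929 policy: unlink lazy persist files
// with zero live replicas on the periodic scan, gated by a global switch.
// Names are illustrative, not the actual NameNode implementation.
public class UnlinkPolicySketch {
    static class FileStatus {
        final String path;
        final boolean lazyPersist;
        final int liveReplicas;
        FileStatus(String path, boolean lazyPersist, int liveReplicas) {
            this.path = path;
            this.lazyPersist = lazyPersist;
            this.liveReplicas = liveReplicas;
        }
    }

    static List<String> filesToUnlink(List<FileStatus> files, boolean unlinkEnabled) {
        List<String> doomed = new ArrayList<>();
        if (!unlinkEnabled) {
            return doomed;  // the global option turns the behavior off entirely
        }
        for (FileStatus f : files) {
            if (f.lazyPersist && f.liveReplicas == 0) {
                doomed.add(f.path);  // would otherwise show up as a corrupt file
            }
        }
        return doomed;
    }
}
```

A per-file variant, as the JIRA suggests for the future, would simply replace the single boolean with a flag carried on each FileStatus.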
[jira] [Updated] (HDFS-6960) Bugfix in LazyWriter, fix test case and some refactoring
[ https://issues.apache.org/jira/browse/HDFS-6960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HDFS-6960:
Fix Version/s: (was: 3.0.0) 2.6.0

Bugfix in LazyWriter, fix test case and some refactoring

Key: HDFS-6960
URL: https://issues.apache.org/jira/browse/HDFS-6960
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode, test
Affects Versions: HDFS-6581
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
Fix For: 2.6.0
Attachments: HDFS-6960.01.patch, HDFS-6960.02.patch

LazyWriter has a bug. While saving the replica to disk we would save it under {{current/lazyPersist/}}. Instead it should be saved under the appropriate subdirectory, e.g. {{current/lazyPersist/subdir1/subdir0/}}.
[jira] [Updated] (HDFS-6928) 'hdfs put' command should accept lazyPersist flag for testing
[ https://issues.apache.org/jira/browse/HDFS-6928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HDFS-6928:
Fix Version/s: (was: 3.0.0) 2.6.0

'hdfs put' command should accept lazyPersist flag for testing

Key: HDFS-6928
URL: https://issues.apache.org/jira/browse/HDFS-6928
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode
Affects Versions: HDFS-6581
Reporter: Tassapol Athiapinya
Assignee: Arpit Agarwal
Fix For: 2.6.0
Attachments: HDFS-6928.01.patch, HDFS-6928.02.patch, HDFS-6928.03.patch

Add a '-l' flag to 'hdfs put' which creates the file with the LAZY_PERSIST option.
[jira] [Updated] (HDFS-6931) Move lazily persisted replicas to finalized directory on DN startup
[ https://issues.apache.org/jira/browse/HDFS-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HDFS-6931:
Fix Version/s: (was: 3.0.0) 2.6.0

Move lazily persisted replicas to finalized directory on DN startup

Key: HDFS-6931
URL: https://issues.apache.org/jira/browse/HDFS-6931
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
Fix For: 2.6.0
Attachments: HDFS-6931.01.patch

On restart the DN should move replicas from the {{current/lazyPersist/}} directory to {{current/finalized}}. Duplicate replicas of the same block should be deleted from RAM disk.
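The startup reconciliation in HDFS-6931 can be illustrated with a hypothetical in-memory sketch (the names are invented, and real DN startup works on block files, not ID sets): every lazily persisted block ends up finalized exactly once, and any block already present in finalized has its lazyPersist duplicate marked for deletion.

```java
import java.util.*;

// Hypothetical sketch of HDFS-6931's startup step, modeled on block-ID sets
// instead of on-disk files. Not the actual DataNode code.
public class StartupReconcileSketch {
    // The finalized set after promotion: union of both locations,
    // so each lazily persisted replica is finalized exactly once.
    public static Set<Long> promote(Set<Long> lazyPersist, Set<Long> finalized) {
        Set<Long> result = new TreeSet<>(finalized);
        result.addAll(lazyPersist);
        return result;
    }

    // Blocks present in both places: the lazyPersist copy is the duplicate
    // and should be deleted, keeping the finalized copy.
    public static Set<Long> duplicatesToDelete(Set<Long> lazyPersist, Set<Long> finalized) {
        Set<Long> dups = new TreeSet<>(lazyPersist);
        dups.retainAll(finalized);
        return dups;
    }
}
```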
[jira] [Updated] (HDFS-6950) Add Additional unit tests for HDFS-6581
[ https://issues.apache.org/jira/browse/HDFS-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HDFS-6950:
Fix Version/s: (was: 3.0.0) 2.6.0

Add Additional unit tests for HDFS-6581

Key: HDFS-6950
URL: https://issues.apache.org/jira/browse/HDFS-6950
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao
Fix For: 2.6.0
Attachments: HDFS-6950.0.patch, HDFS-6950.1.patch, HDFS-6950.2.patch

Create additional unit tests for HDFS-6581, beyond the existing ones in HDFS-6927.
[jira] [Updated] (HDFS-6930) Improve replica eviction from RAM disk
[ https://issues.apache.org/jira/browse/HDFS-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HDFS-6930:
Fix Version/s: (was: 3.0.0) 2.6.0

Improve replica eviction from RAM disk

Key: HDFS-6930
URL: https://issues.apache.org/jira/browse/HDFS-6930
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode
Affects Versions: HDFS-6581
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
Fix For: 2.6.0
Attachments: HDFS-6930.01.patch, HDFS-6930.02.patch

The current replica eviction scheme is inefficient since it performs multiple file operations in the context of block allocation. A better implementation would be asynchronous eviction when free space on RAM disk falls below a low watermark, to make block allocation faster.
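The low-watermark policy HDFS-6930 proposes can be sketched as follows. This is a hypothetical illustration (the class, constants, and methods are invented, not the DataNode implementation): a background task checks free RAM-disk space against the watermark and computes how many blocks it must evict to climb back above it, so the allocation path never performs file operations itself.

```java
// Hypothetical sketch of the HDFS-6930 low-watermark eviction policy.
// Names and numbers are illustrative, not the actual DataNode code.
public class WatermarkSketch {
    final long lowWatermarkBytes;

    WatermarkSketch(long lowWatermarkBytes) {
        this.lowWatermarkBytes = lowWatermarkBytes;
    }

    // The background task evicts only while free space is below the watermark.
    boolean shouldEvict(long freeBytes) {
        return freeBytes < lowWatermarkBytes;
    }

    // Whole blocks to evict to get back above the watermark (ceiling division).
    long blocksToEvict(long freeBytes, long blockSizeBytes) {
        if (!shouldEvict(freeBytes)) {
            return 0;
        }
        long deficit = lowWatermarkBytes - freeBytes;
        return (deficit + blockSizeBytes - 1) / blockSizeBytes;
    }
}
```

In a real implementation the check would run on a dedicated thread (the JIRA's "asynchronous eviction"), leaving the synchronous allocation path to only consult free space.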
[jira] [Updated] (HDFS-6991) Notify NN of evicted block before deleting it from RAM disk
[ https://issues.apache.org/jira/browse/HDFS-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HDFS-6991:
Fix Version/s: (was: 3.0.0) 2.6.0

Notify NN of evicted block before deleting it from RAM disk

Key: HDFS-6991
URL: https://issues.apache.org/jira/browse/HDFS-6991
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode, namenode
Affects Versions: HDFS-6581
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
Fix For: 2.6.0
Attachments: HDFS-6991.01.patch, HDFS-6991.02.patch, HDFS-6991.03.patch

A couple of bug fixes are required around eviction:
# When evicting a block from RAM disk to persistent storage, the DN should schedule an incremental block report for a 'received' replica on persistent storage.
# {{BlockManager.processReportedBlock}} needs a fix to correctly update the storage ID to reflect the block moving from RAM_DISK to DISK.

Found by [~xyao] via HDFS-6950.
[jira] [Updated] (HDFS-6978) Directory scanner should correctly reconcile blocks on RAM disk
[ https://issues.apache.org/jira/browse/HDFS-6978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HDFS-6978:
Fix Version/s: (was: 3.0.0) 2.6.0

Directory scanner should correctly reconcile blocks on RAM disk

Key: HDFS-6978
URL: https://issues.apache.org/jira/browse/HDFS-6978
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode
Affects Versions: HDFS-6581
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
Fix For: 2.6.0
Attachments: HDFS-6978.01.patch, HDFS-6978.02.patch

It used to be very unlikely that the directory scanner would encounter two replicas of the same block on different volumes. With memory storage, this is very likely to happen via the following sequence of events:
# Block is written to RAM disk.
# Lazy writer saves a copy on a persistent volume.
# DN attempts to evict the original replica from RAM disk; the file deletion fails because the replica is in use.
# Directory scanner finds a replica on both RAM disk and persistent storage.

The directory scanner should never delete the block on persistent storage.
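The reconciliation rule HDFS-6978 calls for reduces to a simple invariant, sketched below with hypothetical names (the enum and method are invented, not the actual DirectoryScanner API): a copy on a transient volume is deletable only when a persistent copy exists, and the persistent copy is never deletable.

```java
import java.util.*;

// Hypothetical sketch of the HDFS-6978 invariant: when replicas of one block
// exist on both a RAM-disk volume and a persistent volume, only the RAM-disk
// copy may be removed. Not the actual DirectoryScanner code.
public class ScannerReconcileSketch {
    enum VolumeType { RAM_DISK, DISK }

    // Returns the copies (by volume type) that are safe to delete.
    static List<VolumeType> deletableCopies(List<VolumeType> copies) {
        List<VolumeType> deletable = new ArrayList<>();
        if (copies.contains(VolumeType.DISK)) {
            for (VolumeType v : copies) {
                if (v == VolumeType.RAM_DISK) {
                    deletable.add(v);  // redundant: a persistent copy exists
                }
            }
        }
        // A copy on persistent storage is never returned as deletable.
        return deletable;
    }
}
```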
[jira] [Updated] (HDFS-6977) Delete all copies when a block is deleted from the block space
[ https://issues.apache.org/jira/browse/HDFS-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HDFS-6977:
Fix Version/s: (was: 3.0.0) 2.6.0

Delete all copies when a block is deleted from the block space

Key: HDFS-6977
URL: https://issues.apache.org/jira/browse/HDFS-6977
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode
Affects Versions: HDFS-6581
Reporter: Nathan Yao
Assignee: Arpit Agarwal
Fix For: 2.6.0
Attachments: HDFS-6977.01.patch, HDFS-6977.02.patch, HDFS-6977.03.patch

When a block is deleted from RAM disk we should also delete the copies written to lazyPersist/. Reported by [~xyao]
[jira] [Updated] (HDFS-7066) LazyWriter#evictBlocks misses a null check for replicaState
[ https://issues.apache.org/jira/browse/HDFS-7066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HDFS-7066:
Fix Version/s: (was: 3.0.0) 2.6.0

LazyWriter#evictBlocks misses a null check for replicaState

Key: HDFS-7066
URL: https://issues.apache.org/jira/browse/HDFS-7066
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode
Affects Versions: HDFS-6581
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao
Priority: Minor
Fix For: 2.6.0
Attachments: HDFS-7066.0.patch

LazyWriter#evictBlocks (added for HDFS-6581) misses a null check for replicaState. As a result, there are many NPEs in the debug log under certain conditions.

{code}
2014-09-15 14:27:10,820 DEBUG impl.FsDatasetImpl (FsDatasetImpl.java:evictBlocks(2335)) - Evicting block null
2014-09-15 14:27:10,821 WARN impl.FsDatasetImpl (FsDatasetImpl.java:run(2409)) - Ignoring exception in LazyWriter:
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl$LazyWriter.evictBlocks(FsDatasetImpl.java:2343)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl$LazyWriter.run(FsDatasetImpl.java:2396)
        at java.lang.Thread.run(Thread.java:745)
{code}

The proposed fix is to break out of the loop if there is no candidate available to evict.

{code}
while (iterations++ < MAX_BLOCK_EVICTIONS_PER_ITERATION &&
       transientFreeSpaceBelowThreshold()) {
  LazyWriteReplicaTracker.ReplicaState replicaState =
      lazyWriteReplicaTracker.getNextCandidateForEviction();

  if (replicaState == null) {
    break;
  }

  if (LOG.isDebugEnabled()) {
    LOG.debug("Evicting block " + replicaState);
  }
{code}
[jira] [Updated] (HDFS-7064) Fix unit test failures in HDFS-6581 branch
[ https://issues.apache.org/jira/browse/HDFS-7064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HDFS-7064:
Fix Version/s: (was: 3.0.0) 2.6.0

Fix unit test failures in HDFS-6581 branch

Key: HDFS-7064
URL: https://issues.apache.org/jira/browse/HDFS-7064
Project: Hadoop HDFS
Issue Type: Sub-task
Components: test
Affects Versions: HDFS-6581
Reporter: Arpit Agarwal
Assignee: Xiaoyu Yao
Fix For: 2.6.0
Attachments: HDFS-7064.0.patch, HDFS-7064.1.patch, HDFS-7064.2.patch

Fix test failures in the HDFS-6581 feature branch. Jenkins flagged the following failures:
https://builds.apache.org/job/PreCommit-HDFS-Build/8025//testReport/
[jira] [Updated] (HDFS-7079) Few more unit test fixes for HDFS-6581
[ https://issues.apache.org/jira/browse/HDFS-7079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HDFS-7079:
Fix Version/s: (was: 3.0.0) 2.6.0

Few more unit test fixes for HDFS-6581

Key: HDFS-7079
URL: https://issues.apache.org/jira/browse/HDFS-7079
Project: Hadoop HDFS
Issue Type: Sub-task
Components: test
Affects Versions: HDFS-6581
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
Fix For: 2.6.0
Attachments: HDFS-7079.03.patch, HDFS-7079.04.patch

Fix a few more test cases flagged by Jenkins:
# TestFsShellCopy
# TestCopy
[jira] [Updated] (HDFS-7080) Fix finalize and upgrade unit test failures
[ https://issues.apache.org/jira/browse/HDFS-7080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HDFS-7080:
Fix Version/s: (was: 3.0.0) 2.6.0

Fix finalize and upgrade unit test failures

Key: HDFS-7080
URL: https://issues.apache.org/jira/browse/HDFS-7080
Project: Hadoop HDFS
Issue Type: Sub-task
Components: test
Affects Versions: HDFS-6581
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
Fix For: 2.6.0
Attachments: HDFS-7080.01.patch, HDFS-7080.02.patch

Fix the following test failures in the branch:
# TestDFSFinalize
# TestDFSUpgrade
[jira] [Updated] (HDFS-7084) FsDatasetImpl#copyBlockFiles debug log can be improved
[ https://issues.apache.org/jira/browse/HDFS-7084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HDFS-7084:
Fix Version/s: (was: 3.0.0) 2.6.0

FsDatasetImpl#copyBlockFiles debug log can be improved

Key: HDFS-7084
URL: https://issues.apache.org/jira/browse/HDFS-7084
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode
Affects Versions: HDFS-6581
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao
Priority: Minor
Fix For: 2.6.0
Attachments: HDFS-7084.0.patch

The "addBlock: Moved" prefix should be replaced with "Copied", or "lazyPersistReplica: Copied", to avoid confusion.

{code}
static File[] copyBlockFiles(long blockId, long genStamp,
    File srcMeta, File srcFile, File destRoot) {
  ...
  if (LOG.isDebugEnabled()) {
    LOG.debug("addBlock: Moved " + srcMeta + " to " + dstMeta);
    LOG.debug("addBlock: Moved " + srcFile + " to " + dstFile);
  }
}
{code}
[jira] [Updated] (HDFS-7108) Fix unit test failures in SimulatedFsDataset
[ https://issues.apache.org/jira/browse/HDFS-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HDFS-7108:
Fix Version/s: (was: 3.0.0) 2.6.0

Fix unit test failures in SimulatedFsDataset

Key: HDFS-7108
URL: https://issues.apache.org/jira/browse/HDFS-7108
Project: Hadoop HDFS
Issue Type: Sub-task
Components: test
Affects Versions: HDFS-6581
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
Fix For: 2.6.0
Attachments: HDFS-7108.01.patch

HDFS-7100 introduced a few unit test failures due to UnsupportedOperationException in {{SimulatedFsDataset.getVolume}}.
[jira] [Updated] (HDFS-7091) Add forwarding constructor for INodeFile for existing callers
[ https://issues.apache.org/jira/browse/HDFS-7091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HDFS-7091:
Fix Version/s: (was: 3.0.0) 2.6.0

Add forwarding constructor for INodeFile for existing callers

Key: HDFS-7091
URL: https://issues.apache.org/jira/browse/HDFS-7091
Project: Hadoop HDFS
Issue Type: Sub-task
Components: namenode, test
Affects Versions: HDFS-6581
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
Priority: Minor
Fix For: 2.6.0
Attachments: HDFS-7091.01.patch

Since HDFS-6584 is in trunk we are hitting quite a few merge conflicts. Many of the conflicts can be avoided by some minor updates to the branch.
[jira] [Updated] (HDFS-7100) Make eviction scheme pluggable
[ https://issues.apache.org/jira/browse/HDFS-7100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HDFS-7100:
Fix Version/s: (was: 3.0.0) 2.6.0

Make eviction scheme pluggable

Key: HDFS-7100
URL: https://issues.apache.org/jira/browse/HDFS-7100
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode
Affects Versions: HDFS-6581
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal
Fix For: 2.6.0
Attachments: HDFS-7100.01.patch

We can make the eviction scheme pluggable to help evaluate multiple schemes.
[jira] [Updated] (HDFS-6990) Add unit test for evict/delete RAM_DISK block with open handle
[ https://issues.apache.org/jira/browse/HDFS-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jitendra Nath Pandey updated HDFS-6990:
Fix Version/s: (was: 3.0.0) 2.6.0

Add unit test for evict/delete RAM_DISK block with open handle

Key: HDFS-6990
URL: https://issues.apache.org/jira/browse/HDFS-6990
Project: Hadoop HDFS
Issue Type: Sub-task
Components: datanode
Reporter: Xiaoyu Yao
Assignee: Xiaoyu Yao
Fix For: 2.6.0
Attachments: HDFS-6990.0.patch, HDFS-6990.1.patch, HDFS-6990.2.patch, HDFS-6990.3.patch

This is to verify:
* Evicting a RAM_DISK block with an open handle should fall back to DISK.
* Deleting a RAM_DISK block (persisted) with an open handle should mark the block to be deleted upon handle close.

Simply opening a handle to the file in the DFS namespace won't work as expected. We need a local FS file handle to the block file, so the only meaningful case is Short Circuit Read. This JIRA is to validate/enable the two cases with an SCR-enabled MiniDFSCluster.