[jira] [Updated] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated HDFS-6982: -- Status: Patch Available (was: Open) > nntop: top-like tool for name node users > - > > Key: HDFS-6982 > URL: https://issues.apache.org/jira/browse/HDFS-6982 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Maysam Yabandeh >Assignee: Maysam Yabandeh > Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, HDFS-6982.v3.patch, > HDFS-6982.v4.patch, HDFS-6982.v5.patch, HDFS-6982.v6.patch, > HDFS-6982.v7.patch, nntop-design-v1.pdf > > > In this jira we motivate the need for nntop, a tool that, similarly to what > top does in Linux, gives the list of top users of the HDFS name node and > gives insight into which users are sending the majority of each traffic type to > the name node. This information turns out to be most critical when the > name node is under pressure and the HDFS admin needs to know which user is > hammering the name node and with what kind of requests. Here we present the > design of nntop, which has been in production at Twitter for the past 10 > months. nntop proved to have low CPU overhead (< 2% in a cluster of 4K > nodes), a low memory footprint (less than a few MB), and to be quite efficient on > the write path (only two hash lookups to update a metric). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
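The "two hash lookups" write path claimed above can be approximated with nested concurrent maps: one lookup resolves the user, a second resolves the operation-type counter. This is a hedged sketch, not the actual nntop code from the patch; the class and method names here are invented for illustration:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of a per-user, per-operation counter with a
// two-hash-lookup write path, in the spirit of the nntop design.
public class TopMetricsSketch {
    // user -> (operation -> count)
    private final Map<String, Map<String, LongAdder>> counts =
        new ConcurrentHashMap<>();

    // Record one call: lookup #1 finds the user's map, lookup #2 the op counter.
    public void report(String user, String op) {
        counts.computeIfAbsent(user, u -> new ConcurrentHashMap<>())
              .computeIfAbsent(op, o -> new LongAdder())
              .increment();
    }

    public long get(String user, String op) {
        Map<String, LongAdder> m = counts.get(user);
        if (m == null) return 0;
        LongAdder a = m.get(op);
        return a == null ? 0 : a.sum();
    }

    public static void main(String[] args) {
        TopMetricsSketch t = new TopMetricsSketch();
        t.report("alice", "mkdir");
        t.report("alice", "mkdir");
        t.report("bob", "open");
        System.out.println(t.get("alice", "mkdir")); // 2
    }
}
```

LongAdder keeps the increment cheap under contention, which matters when every audit-logged RPC goes through this path.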
[jira] [Updated] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated HDFS-6982: -- Status: Open (was: Patch Available) > nntop: top-like tool for name node users > - > > Key: HDFS-6982 > URL: https://issues.apache.org/jira/browse/HDFS-6982 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Maysam Yabandeh >Assignee: Maysam Yabandeh > Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, HDFS-6982.v3.patch, > HDFS-6982.v4.patch, HDFS-6982.v5.patch, HDFS-6982.v6.patch, > HDFS-6982.v7.patch, nntop-design-v1.pdf > > > In this jira we motivate the need for nntop, a tool that, similarly to what > top does in Linux, gives the list of top users of the HDFS name node and > gives insight into which users are sending the majority of each traffic type to > the name node. This information turns out to be most critical when the > name node is under pressure and the HDFS admin needs to know which user is > hammering the name node and with what kind of requests. Here we present the > design of nntop, which has been in production at Twitter for the past 10 > months. nntop proved to have low CPU overhead (< 2% in a cluster of 4K > nodes), a low memory footprint (less than a few MB), and to be quite efficient on > the write path (only two hash lookups to update a metric). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7385) ThreadLocal used in FSEditLog class lead FSImage permission mess up
[ https://issues.apache.org/jira/browse/HDFS-7385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209383#comment-14209383 ] Vinayakumar B commented on HDFS-7385: - Hi [~jiangyu1211], Good find. Instead of {{op.setAclEntries(null)}} and {{op.setXAttrs(null)}}, how about introducing a {{reset()}} method in both {{AddOp}} and {{MkdirOp}} which will reset all values to null? This method can be called as soon as the op is fetched from the ThreadLocal cache, and later setters can set whatever values they want. This will avoid any such mistakes in the future. Ex: {code} MkdirOp op = MkdirOp.getInstance(cache.get()) .reset() .setInodeId(newNode.getId()) .setPath(path) .setTimestamp(newNode.getModificationTime()) .setPermissionStatus(permissions); {code} > ThreadLocal used in FSEditLog class lead FSImage permission mess up > > > Key: HDFS-7385 > URL: https://issues.apache.org/jira/browse/HDFS-7385 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0, 2.5.0 >Reporter: jiangyu >Assignee: jiangyu >Priority: Critical > Attachments: HDFS-7385.patch > > > We migrated our NameNodes from low configuration to high configuration > machines last week. First, we imported the current directory, including > fsimage and editlog files, from the original ActiveNameNode to the new ActiveNameNode > and started the new NameNode, then changed the configuration of all > datanodes and restarted all of the datanodes, which then block-reported to the new NameNodes > at once and sent heartbeats after that. > Everything seemed perfect, but after we restarted the ResourceManager, > most of the users complained that their jobs couldn't be executed because of > permission problems. > We applied Acls in our clusters, and after migrating we found that most of > the directories and files which had no Acls set before now had the > properties of Acls. That is the reason why users could not execute their > jobs. So we had to change most of the files' permissions to a+r and directories' > permissions to a+rx to make sure the jobs could be executed. > After searching this problem for some days, I found there is a bug in > FSEditLog.java. The ThreadLocal variable cache in FSEditLog doesn't set the > proper value in the logMkdir and logOpenFile functions. Here is the code of > logMkdir: > public void logMkDir(String path, INode newNode) { > PermissionStatus permissions = newNode.getPermissionStatus(); > MkdirOp op = MkdirOp.getInstance(cache.get()) > .setInodeId(newNode.getId()) > .setPath(path) > .setTimestamp(newNode.getModificationTime()) > .setPermissionStatus(permissions); > AclFeature f = newNode.getAclFeature(); > if (f != null) { > op.setAclEntries(AclStorage.readINodeLogicalAcl(newNode)); > } > logEdit(op); > } > For example, if we mkdir with Acls through one handler (a Thread, indeed), > we set the AclEntries on the op from the cache. After that, if we mkdir > without any Acls setting through the same handler, the AclEntries from > the cache are the same as in the last op, which set the Acls, and because the > newNode has no AclFeature, we don't have any chance to change them. Then the > editlog is wrong, recording the wrong Acls. After the Standby loads the editlogs > from the journalnodes, applies them to memory in the SNN, then saves the namespace and > transfers the wrong fsimage to the ANN, all the fsimages get wrong. The only > solution is to save the namespace from the ANN; then you can get the right fsimage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
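The bug pattern described in this report, and the reset() idea from the review comment, can be reduced to a small standalone example. This is a hedged sketch with invented names, not the real FSEditLog/MkdirOp code: a ThreadLocal-cached op object keeps its optional field from the previous use on that thread unless it is cleared first.

```java
// Hypothetical miniature of the FSEditLog pattern: an op cached in a
// ThreadLocal keeps optional state (here, aclEntries) across uses unless reset.
public class CachedOpSketch {
    public static class MkdirOpSketch {
        public String path;
        public String aclEntries; // optional field, only set when ACLs exist

        // Proposed fix: clear ALL fields right after fetching from the cache.
        public MkdirOpSketch reset() {
            path = null;
            aclEntries = null;
            return this;
        }
        public MkdirOpSketch setPath(String p) { path = p; return this; }
        public MkdirOpSketch setAclEntries(String a) { aclEntries = a; return this; }
    }

    static final ThreadLocal<MkdirOpSketch> CACHE =
        ThreadLocal.withInitial(MkdirOpSketch::new);

    // Buggy shape: aclEntries is only set when the inode has ACLs, so a stale
    // value from the previous op on this thread leaks into the edit log.
    public static MkdirOpSketch logMkdirBuggy(String path, String acls) {
        MkdirOpSketch op = CACHE.get().setPath(path);
        if (acls != null) op.setAclEntries(acls);
        return op;
    }

    // Fixed shape: reset() immediately after cache.get(), as suggested above.
    public static MkdirOpSketch logMkdirFixed(String path, String acls) {
        MkdirOpSketch op = CACHE.get().reset().setPath(path);
        if (acls != null) op.setAclEntries(acls);
        return op;
    }

    public static void main(String[] args) {
        logMkdirBuggy("/a", "user:alice:rwx");
        System.out.println(logMkdirBuggy("/b", null).aclEntries); // stale ACLs leak
        logMkdirFixed("/c", "user:alice:rwx");
        System.out.println(logMkdirFixed("/d", null).aclEntries); // null, as intended
    }
}
```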
[jira] [Commented] (HDFS-7386) Replace check "port number < 1024" with shared isPrivilegedPort method
[ https://issues.apache.org/jira/browse/HDFS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209378#comment-14209378 ] Yongjun Zhang commented on HDFS-7386: - Many thanks Chris! I just uploaded rev 002 to address both of your comments. Really appreciate your help on reviewing the patch. I searched the code base for 1024 but not 1023:-) Very nice that you pointed out the place I missed. > Replace check "port number < 1024" with shared isPrivilegedPort method > --- > > Key: HDFS-7386 > URL: https://issues.apache.org/jira/browse/HDFS-7386 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Attachments: HDFS-7386.001.patch, HDFS-7386.002.patch > > > Per discussion in HDFS-7382, I'm filing this jira as a follow-up, to replace > check "port number < 1024" with shared isPrivilegedPort method. > Thanks [~cnauroth] for the work on HDFS-7382 and suggestion there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
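The refactor under discussion is small; for context, a hedged sketch of what such a shared helper might look like (the actual SecurityUtil method in the patch may differ in name and placement):

```java
// Hypothetical sketch of a shared helper replacing scattered "port < 1024"
// checks. Ports below 1024 are privileged on POSIX systems: only root may
// bind them, which is why secure DataNode startup cares about this boundary.
public class PrivilegedPortSketch {
    public static final int PRIVILEGED_PORT_LIMIT = 1024;

    public static boolean isPrivilegedPort(int port) {
        return port < PRIVILEGED_PORT_LIMIT;
    }

    public static void main(String[] args) {
        // Before: if (socket.getLocalPort() < 1024) { ... }
        // After:  if (isPrivilegedPort(socket.getLocalPort())) { ... }
        System.out.println(isPrivilegedPort(80));   // true
        System.out.println(isPrivilegedPort(1023)); // true -- the boundary case
        System.out.println(isPrivilegedPort(8020)); // false
    }
}
```

Centralizing the check also makes boundary cases like 1023 (the one spotted in review) impossible to miss in a code search.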
[jira] [Updated] (HDFS-7386) Replace check "port number < 1024" with shared isPrivilegedPort method
[ https://issues.apache.org/jira/browse/HDFS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7386: Attachment: HDFS-7386.002.patch > Replace check "port number < 1024" with shared isPrivilegedPort method > --- > > Key: HDFS-7386 > URL: https://issues.apache.org/jira/browse/HDFS-7386 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Attachments: HDFS-7386.001.patch, HDFS-7386.002.patch > > > Per discussion in HDFS-7382, I'm filing this jira as a follow-up, to replace > check "port number < 1024" with shared isPrivilegedPort method. > Thanks [~cnauroth] for the work on HDFS-7382 and suggestion there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7392) org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever
[ https://issues.apache.org/jira/browse/HDFS-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209299#comment-14209299 ] Yi Liu commented on HDFS-7392: -- [~vacekf], I can't get the exact issue from your description; did you hit this issue in a real environment? If so, please write the repro steps in the comments. > org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever > - > > Key: HDFS-7392 > URL: https://issues.apache.org/jira/browse/HDFS-7392 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Frantisek Vacek >Priority: Critical > Attachments: 1.png, 2.png > > > In some specific circumstances, > org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never times out > and lasts forever. > The specific circumstances are: > 1) The HDFS URI (hdfs://share.merck.com:8020/someDir/someFile.txt) should point > to a valid IP address but with no name node service running on it. > 2) There should be at least 2 IP addresses for such a URI. See the output below: > {quote} > [~/proj/quickbox]$ nslookup share.merck.com > Server: 127.0.1.1 > Address:127.0.1.1#53 > share.merck.com canonical name = > internal-gicprg-share-merck-com-1538706884.us-east-1.elb.amazonaws.com. > Name: internal-gicprg-share-merck-com-1538706884.us-east-1.elb.amazonaws.com > Address: 54.40.29.223 > Name: internal-gicprg-share-merck-com-1538706884.us-east-1.elb.amazonaws.com > Address: 54.40.29.65 > {quote} > In such a case, org.apache.hadoop.ipc.Client.Connection.updateAddress() > sometimes returns true (even if the address didn't actually change, see img. 1) > and the timeoutFailures counter is set to 0 (see img. 2). The > maxRetriesOnSocketTimeouts (45) is never reached and the connection attempt is > repeated forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
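The reported failure mode — updateAddress() spuriously returning true and zeroing timeoutFailures, so maxRetriesOnSocketTimeouts is never reached — can be simulated in a few lines. This is a hedged sketch with invented names, not the actual org.apache.hadoop.ipc.Client retry code:

```java
// Hypothetical simulation of the retry loop described above: if every timeout
// is (wrongly) treated as an address change, the failure counter is reset to
// 0 each time and the retry limit is never reached.
public class RetryLoopSketch {
    static final int MAX_RETRIES_ON_SOCKET_TIMEOUTS = 45;

    // Returns the number of attempts before giving up, or -1 if the loop
    // never gives up within the cap (capped so the simulation terminates).
    public static int attemptsUntilGiveUp(boolean addressAlwaysLooksChanged,
                                          int cap) {
        int timeoutFailures = 0;
        for (int attempts = 1; attempts <= cap; attempts++) {
            // every attempt times out in this simulation
            if (addressAlwaysLooksChanged) {
                timeoutFailures = 0; // the reported bug: counter reset
            } else {
                timeoutFailures++;
            }
            if (timeoutFailures >= MAX_RETRIES_ON_SOCKET_TIMEOUTS) {
                return attempts;
            }
        }
        return -1; // effectively retries forever
    }

    public static void main(String[] args) {
        System.out.println(attemptsUntilGiveUp(false, 1000)); // 45
        System.out.println(attemptsUntilGiveUp(true, 1000));  // -1
    }
}
```

With two DNS records behind one name (as in the nslookup output above), each resolution can legitimately flip between addresses, which is how the "address changed" signal keeps firing.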
[jira] [Updated] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov updated HDFS-7056: --- Attachment: HDFS-3107-HDFS-7056-combined.patch > Snapshot support for truncate > - > > Key: HDFS-7056 > URL: https://issues.apache.org/jira/browse/HDFS-7056 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.0.0 >Reporter: Konstantin Shvachko >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, > HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, > HDFS-3107-HDFS-7056-combined.patch, HDFS-7056.patch, HDFS-7056.patch, > HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, > HDFSSnapshotWithTruncateDesign.docx > > > Implementation of truncate in HDFS-3107 does not allow truncating files which > are in a snapshot. It is desirable to be able to truncate and still keep the > old file state of the file in the snapshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov updated HDFS-7056: --- Attachment: (was: HDFS-3107-HDFS-7056-combined.patch) > Snapshot support for truncate > - > > Key: HDFS-7056 > URL: https://issues.apache.org/jira/browse/HDFS-7056 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.0.0 >Reporter: Konstantin Shvachko >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, > HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, > HDFS-3107-HDFS-7056-combined.patch, HDFS-7056.patch, HDFS-7056.patch, > HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, > HDFSSnapshotWithTruncateDesign.docx > > > Implementation of truncate in HDFS-3107 does not allow truncating files which > are in a snapshot. It is desirable to be able to truncate and still keep the > old file state of the file in the snapshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209290#comment-14209290 ] Plamen Jeliazkov commented on HDFS-7056: It seems Jenkins grabbed the HDFS-7056 patch. I will re-attach the combined patch so the bot can run. > Snapshot support for truncate > - > > Key: HDFS-7056 > URL: https://issues.apache.org/jira/browse/HDFS-7056 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.0.0 >Reporter: Konstantin Shvachko >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, > HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, > HDFS-3107-HDFS-7056-combined.patch, HDFS-7056.patch, HDFS-7056.patch, > HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, > HDFSSnapshotWithTruncateDesign.docx > > > Implementation of truncate in HDFS-3107 does not allow truncating files which > are in a snapshot. It is desirable to be able to truncate and still keep the > old state of the file in the snapshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6938) Cleanup javac warnings in FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209237#comment-14209237 ] Haohui Mai commented on HDFS-6938: -- It turned out that the patch was missing in 2.6. I've cherry-picked the patch into branch-2. > Cleanup javac warnings in FSNamesystem > -- > > Key: HDFS-6938 > URL: https://issues.apache.org/jira/browse/HDFS-6938 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Charles Lamb >Assignee: Charles Lamb >Priority: Trivial > Fix For: 2.7.0 > > Attachments: HDFS-6938.001.patch > > > Clean up some unused code/compiler warnings post fs-encryption merge. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6938) Cleanup javac warnings in FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6938: - Affects Version/s: (was: 3.0.0) > Cleanup javac warnings in FSNamesystem > -- > > Key: HDFS-6938 > URL: https://issues.apache.org/jira/browse/HDFS-6938 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Charles Lamb >Assignee: Charles Lamb >Priority: Trivial > Fix For: 2.7.0 > > Attachments: HDFS-6938.001.patch > > > Clean up some unused code/compiler warnings post fs-encryption merge. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6938) Cleanup javac warnings in FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6938: - Target Version/s: (was: 3.0.0) > Cleanup javac warnings in FSNamesystem > -- > > Key: HDFS-6938 > URL: https://issues.apache.org/jira/browse/HDFS-6938 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Charles Lamb >Assignee: Charles Lamb >Priority: Trivial > Fix For: 2.7.0 > > Attachments: HDFS-6938.001.patch > > > Clean up some unused code/compiler warnings post fs-encryption merge. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6938) Cleanup javac warnings in FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-6938: - Fix Version/s: (was: 2.6.0) 2.7.0 > Cleanup javac warnings in FSNamesystem > -- > > Key: HDFS-6938 > URL: https://issues.apache.org/jira/browse/HDFS-6938 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 3.0.0 >Reporter: Charles Lamb >Assignee: Charles Lamb >Priority: Trivial > Fix For: 2.7.0 > > Attachments: HDFS-6938.001.patch > > > Clean up some unused code/compiler warnings post fs-encryption merge. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7386) Replace check "port number < 1024" with shared isPrivilegedPort method
[ https://issues.apache.org/jira/browse/HDFS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209227#comment-14209227 ] Chris Nauroth commented on HDFS-7386: - Thanks for the patch, Yongjun. This looks good. Here are just a few comments: # Let's JavaDoc the new {{SecurityUtil#isPrivilegedPort}} method. # There is one more place that we can use this new method, in {{SecureDataNodeStarter#getSecureResources}}. In this case, you'll want to negate the return value of {{SecurityUtil#isPrivilegedPort}}. > Replace check "port number < 1024" with shared isPrivilegedPort method > --- > > Key: HDFS-7386 > URL: https://issues.apache.org/jira/browse/HDFS-7386 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Attachments: HDFS-7386.001.patch > > > Per discussion in HDFS-7382, I'm filing this jira as a follow-up, to replace > check "port number < 1024" with shared isPrivilegedPort method. > Thanks [~cnauroth] for the work on HDFS-7382 and suggestion there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7385) ThreadLocal used in FSEditLog class lead FSImage permission mess up
[ https://issues.apache.org/jira/browse/HDFS-7385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209172#comment-14209172 ] Yi Liu commented on HDFS-7385: -- [~jiangyu1211], {{OP_ADD}} is for create/append of a file, despite the name "logOpenFile". Please add the test case as soon as possible; I will help to review and try to push it into 2.6, since I think the issue is critical, although the fix is easy. > ThreadLocal used in FSEditLog class lead FSImage permission mess up > > > Key: HDFS-7385 > URL: https://issues.apache.org/jira/browse/HDFS-7385 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0, 2.5.0 >Reporter: jiangyu >Assignee: jiangyu > Attachments: HDFS-7385.patch > > > We migrated our NameNodes from low configuration to high configuration > machines last week. First, we imported the current directory, including > fsimage and editlog files, from the original ActiveNameNode to the new ActiveNameNode > and started the new NameNode, then changed the configuration of all > datanodes and restarted all of the datanodes, which then block-reported to the new NameNodes > at once and sent heartbeats after that. > Everything seemed perfect, but after we restarted the ResourceManager, > most of the users complained that their jobs couldn't be executed because of > permission problems. > We applied Acls in our clusters, and after migrating we found that most of > the directories and files which had no Acls set before now had the > properties of Acls. That is the reason why users could not execute their > jobs. So we had to change most of the files' permissions to a+r and directories' > permissions to a+rx to make sure the jobs could be executed. > After searching this problem for some days, I found there is a bug in > FSEditLog.java. The ThreadLocal variable cache in FSEditLog doesn't set the > proper value in the logMkdir and logOpenFile functions. Here is the code of > logMkdir: > public void logMkDir(String path, INode newNode) { > PermissionStatus permissions = newNode.getPermissionStatus(); > MkdirOp op = MkdirOp.getInstance(cache.get()) > .setInodeId(newNode.getId()) > .setPath(path) > .setTimestamp(newNode.getModificationTime()) > .setPermissionStatus(permissions); > AclFeature f = newNode.getAclFeature(); > if (f != null) { > op.setAclEntries(AclStorage.readINodeLogicalAcl(newNode)); > } > logEdit(op); > } > For example, if we mkdir with Acls through one handler (a Thread, indeed), > we set the AclEntries on the op from the cache. After that, if we mkdir > without any Acls setting through the same handler, the AclEntries from > the cache are the same as in the last op, which set the Acls, and because the > newNode has no AclFeature, we don't have any chance to change them. Then the > editlog is wrong, recording the wrong Acls. After the Standby loads the editlogs > from the journalnodes, applies them to memory in the SNN, then saves the namespace and > transfers the wrong fsimage to the ANN, all the fsimages get wrong. The only > solution is to save the namespace from the ANN; then you can get the right fsimage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maysam Yabandeh updated HDFS-6982: -- Attachment: HDFS-6982.v7.patch [~andrew.wang], submitting the new patch, revised based on your last comments. A couple of explanations: bq. Seems like 30min would be a more human-friendly number than 25min also The idea was to increase the periods in an exponential manner: 5^0, 5^1, 5^2 bq. It also seems like an unnecessary step to also have to specify the TopAuditLogger in the conf, if a user already specified dfs.namenode.top.periods.min. If there are periods set, let's just also create the TopAuditLogger. I am inclined towards redundantly specifying the audit logger in the conf. I think it would also avoid confusion for future readers if we spell out the registered audit loggers. > nntop: top-like tool for name node users > - > > Key: HDFS-6982 > URL: https://issues.apache.org/jira/browse/HDFS-6982 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Maysam Yabandeh >Assignee: Maysam Yabandeh > Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, HDFS-6982.v3.patch, > HDFS-6982.v4.patch, HDFS-6982.v5.patch, HDFS-6982.v6.patch, > HDFS-6982.v7.patch, nntop-design-v1.pdf > > > In this jira we motivate the need for nntop, a tool that, similarly to what > top does in Linux, gives the list of top users of the HDFS name node and > gives insight into which users are sending the majority of each traffic type to > the name node. This information turns out to be most critical when the > name node is under pressure and the HDFS admin needs to know which user is > hammering the name node and with what kind of requests. Here we present the > design of nntop, which has been in production at Twitter for the past 10 > months. nntop proved to have low CPU overhead (< 2% in a cluster of 4K > nodes), a low memory footprint (less than a few MB), and to be quite efficient on > the write path (only two hash lookups to update a metric). 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209205#comment-14209205 ] Hadoop QA commented on HDFS-6982: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12681232/HDFS-6982.v7.patch against trunk revision 9f0319b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:red}-1 javac{color}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8727//console This message is automatically generated. > nntop: top-like tool for name node users > - > > Key: HDFS-6982 > URL: https://issues.apache.org/jira/browse/HDFS-6982 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Maysam Yabandeh >Assignee: Maysam Yabandeh > Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, HDFS-6982.v3.patch, > HDFS-6982.v4.patch, HDFS-6982.v5.patch, HDFS-6982.v6.patch, > HDFS-6982.v7.patch, nntop-design-v1.pdf > > > In this jira we motivate the need for nntop, a tool that, similarly to what > top does in Linux, gives the list of top users of the HDFS name node and > gives insight into which users are sending the majority of each traffic type to > the name node. This information turns out to be most critical when the > name node is under pressure and the HDFS admin needs to know which user is > hammering the name node and with what kind of requests. Here we present the > design of nntop, which has been in production at Twitter for the past 10 > months. nntop proved to have low CPU overhead (< 2% in a cluster of 4K > nodes), a low memory footprint (less than a few MB), and to be quite efficient on > the write path (only two hash lookups to update a metric). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7386) Replace check "port number < 1024" with shared isPrivilegedPort method
[ https://issues.apache.org/jira/browse/HDFS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209204#comment-14209204 ] Hadoop QA commented on HDFS-7386: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12681184/HDFS-7386.001.patch against trunk revision d7150a1. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestDFSUpgradeFromImage {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8725//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8725//console This message is automatically generated. 
> Replace check "port number < 1024" with shared isPrivilegedPort method > --- > > Key: HDFS-7386 > URL: https://issues.apache.org/jira/browse/HDFS-7386 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Attachments: HDFS-7386.001.patch > > > Per discussion in HDFS-7382, I'm filing this jira as a follow-up, to replace > check "port number < 1024" with shared isPrivilegedPort method. > Thanks [~cnauroth] for the work on HDFS-7382 and suggestion there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7345) Local Reconstruction Codes (LRC)
[ https://issues.apache.org/jira/browse/HDFS-7345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-7345: Description: HDFS-7285 proposes to support Erasure Coding inside HDFS, supports multiple Erasure Coding codecs via pluggable framework and implements Reed Solomon code by default. This is to support a more advanced coding mechanism, Local Reconstruction Codes (LRC). As discussed in the paper (https://www.usenix.org/system/files/conference/atc12/atc12-final181_0.pdf), LRC reduces the number of erasure coding fragments that need to be read when reconstructing data fragments that are offline, while still keeping the storage overhead low. The important benefits of LRC are that it reduces the bandwidth and I/Os required for repair reads over prior codes, while still allowing a significant reduction in storage overhead. The implementation would also consider how to distribute the calculating of local and global parity blocks to other relevant DataNodes. (was: HDFS-7285 proposes to support Erasure Coding inside HDFS, supports multiple Erasure Coding codecs via pluggable framework and implements Reed Solomon code by default. This is to support a more advanced coding mechanism, Local Reconstruction Codes (LRC). As discussed in the paper (https://www.usenix.org/system/files/conference/atc12/atc12-final181_0.pdf), LRC reduces the number of erasure coding fragments that need to be read when reconstructing data fragments that are offline, while still keeping the storage overhead low. The important benefits of LRC are that it reduces the bandwidth and I/Os required for repair reads over prior codes, while still allowing a significant reduction in storage overhead. Intel ISA library also supports LRC in its update and can also be leveraged. The implementation would also consider how to distribute the calculating of local and global parity blocks to other relevant DataNodes.) 
> Local Reconstruction Codes (LRC) > > > Key: HDFS-7345 > URL: https://issues.apache.org/jira/browse/HDFS-7345 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Kai Zheng > > HDFS-7285 proposes to support Erasure Coding inside HDFS, supports multiple > Erasure Coding codecs via a pluggable framework and implements Reed Solomon > code by default. This is to support a more advanced coding mechanism, Local > Reconstruction Codes (LRC). As discussed in the paper > (https://www.usenix.org/system/files/conference/atc12/atc12-final181_0.pdf), > LRC reduces the number of erasure coding fragments that need to be read when > reconstructing data fragments that are offline, while still keeping the > storage overhead low. The important benefits of LRC are that it reduces the > bandwidth and I/Os required for repair reads over prior codes, while still > allowing a significant reduction in storage overhead. The implementation > would also consider how to distribute the calculation of local and global > parity blocks to other relevant DataNodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
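The reduced repair fan-in that LRC promises can be illustrated with a deliberately simplified XOR-only toy. Real LRC (as in the cited paper) adds Reed-Solomon global parities computed over a Galois field, which this sketch omits; it keeps only the XOR local parities to show why repairing one lost block touches only its local group:

```java
import java.util.Arrays;

// Simplified LRC illustration: 6 data blocks split into two local groups of
// 3, each protected by an XOR local parity. Repairing one lost block reads
// only the 3 surviving fragments of its group, not all 6 data blocks.
// (Real LRC adds Reed-Solomon global parities on top; omitted here.)
public class LrcSketch {
    public static byte[] xor(byte[]... blocks) {
        byte[] out = new byte[blocks[0].length];
        for (byte[] b : blocks)
            for (int i = 0; i < out.length; i++)
                out[i] ^= b[i];
        return out;
    }

    public static void main(String[] args) {
        byte[] d0 = {1, 2}, d1 = {3, 4}, d2 = {5, 6}; // local group 0
        byte[] localParity0 = xor(d0, d1, d2);

        // Lose d1; reconstruct from the other 3 fragments of group 0 only.
        byte[] rebuilt = xor(d0, d2, localParity0);
        System.out.println(Arrays.equals(rebuilt, d1)); // true
    }
}
```

With plain (6,3) Reed-Solomon, the same single-block repair would read 6 fragments; the local group cuts that roughly in half, which is the bandwidth/IO saving the description refers to.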
[jira] [Updated] (HDFS-7385) ThreadLocal used in FSEditLog class lead FSImage permission mess up
[ https://issues.apache.org/jira/browse/HDFS-7385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-7385: - Priority: Critical (was: Major) > ThreadLocal used in FSEditLog class lead FSImage permission mess up > > > Key: HDFS-7385 > URL: https://issues.apache.org/jira/browse/HDFS-7385 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.0, 2.5.0 >Reporter: jiangyu >Assignee: jiangyu >Priority: Critical > Attachments: HDFS-7385.patch > > > We migrated our NameNodes from low configuration to high configuration > machines last week. Firstly,we imported the current directory including > fsimage and editlog files from original ActiveNameNode to new ActiveNameNode > and started the New NameNode, then changed the configuration of all > datanodes and restarted all of datanodes , then blockreport to new NameNodes > at once and send heartbeat after that. >Everything seemed perfect, but after we restarted Resoucemanager , > most of the users compained that their jobs couldn't be executed for the > reason of permission problem. > We applied Acls in our clusters, and after migrated we found most of > the directories and files which were not set Acls before now had the > properties of Acls. That is the reason why users could not execute their > jobs.So we had to change most of the files permission to a+r and directories > permission to a+rx to make sure the jobs can be executed. > After searching this problem for some days, i found there is a bug in > FSEditLog.java. The ThreadLocal variable cache in FSEditLog don’t set the > proper value in logMkdir and logOpenFile functions. 
Here is the code of logMkDir:
{code}
public void logMkDir(String path, INode newNode) {
  PermissionStatus permissions = newNode.getPermissionStatus();
  MkdirOp op = MkdirOp.getInstance(cache.get())
      .setInodeId(newNode.getId())
      .setPath(path)
      .setTimestamp(newNode.getModificationTime())
      .setPermissionStatus(permissions);
  AclFeature f = newNode.getAclFeature();
  if (f != null) {
    op.setAclEntries(AclStorage.readINodeLogicalAcl(newNode));
  }
  logEdit(op);
}
{code}
For example, if we mkdir with ACLs through one handler (a thread, in fact), we set the AclEntries on the op from the cache. After that, if we mkdir without any ACL settings through the same handler, the AclEntries from the cache are the same as in the last call which set the ACLs, and because the new node has no AclFeature, we never get a chance to clear them. The edit log is then wrong and records the wrong ACLs. After the Standby loads the edit logs from the JournalNodes and applies them in memory, the SNN saves the namespace and transfers the wrong fsimage to the ANN, and all the fsimages end up wrong. The only solution is to save the namespace from the ANN; that way you get the right fsimage.
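The bug described above is a general hazard of ThreadLocal-cached, reusable builder objects: any optional field set during one reuse leaks into the next unless it is unconditionally cleared. A minimal, self-contained sketch of the hazard and the fix (hypothetical classes and field names of my own choosing, not the actual FSEditLog code):

```java
import java.util.List;

// Hypothetical stand-in for an edit-log op cached per thread and reused.
class MkdirOpSketch {
    String path;
    List<String> aclEntries; // optional field: goes stale unless cleared on reuse

    private static final ThreadLocal<MkdirOpSketch> CACHE =
        ThreadLocal.withInitial(MkdirOpSketch::new);

    // Buggy variant: sets aclEntries only when ACLs are present,
    // so a previous call's entries survive into this op.
    static MkdirOpSketch getBuggy(String path, List<String> acls) {
        MkdirOpSketch op = CACHE.get();
        op.path = path;
        if (acls != null) {
            op.aclEntries = acls;
        }
        return op;
    }

    // Fixed variant: unconditionally resets the optional field.
    static MkdirOpSketch getFixed(String path, List<String> acls) {
        MkdirOpSketch op = CACHE.get();
        op.path = path;
        op.aclEntries = acls; // null clears any stale entries
        return op;
    }
}

public class ThreadLocalStaleness {
    public static void main(String[] args) {
        // First mkdir on this thread carries ACLs.
        MkdirOpSketch.getBuggy("/a", List.of("user:alice:rwx"));
        // Second mkdir has none, but the buggy op still carries the old ACLs.
        MkdirOpSketch stale = MkdirOpSketch.getBuggy("/b", null);
        System.out.println("buggy aclEntries = " + stale.aclEntries);

        MkdirOpSketch clean = MkdirOpSketch.getFixed("/b", null);
        System.out.println("fixed aclEntries = " + clean.aclEntries);
    }
}
```

The actual patch would follow the same principle: reset the ACL entries (and any other optional fields) on the cached op on every call, not only when the new inode has an AclFeature.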
[jira] [Updated] (HDFS-7391) Renable SSLv2Hello in HttpFS
[ https://issues.apache.org/jira/browse/HDFS-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arun C Murthy updated HDFS-7391: Resolution: Fixed Fix Version/s: 2.6.0 Status: Resolved (was: Patch Available) I just committed this to branch-2.6. Thanks [~rkanter]! [~kasha] - I wasn't sure if you wanted this in branch-2.5, so I haven't cherry-picked it. Please do so if you want to. Thanks. > Renable SSLv2Hello in HttpFS > > > Key: HDFS-7391 > URL: https://issues.apache.org/jira/browse/HDFS-7391 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 2.6.0, 2.5.2 >Reporter: Robert Kanter >Assignee: Robert Kanter >Priority: Blocker > Fix For: 2.6.0 > > Attachments: HDFS-7391-branch-2.5.patch, HDFS-7391.patch > > > We should re-enable "SSLv2Hello", which is required for older clients (e.g. Java 6 with openssl 0.9.8x), as they can't connect without it. Just to be clear, this does not mean SSLv2, which is insecure. > I couldn't simply do an addendum patch on HDFS-7274 because it's already been closed.
[jira] [Commented] (HDFS-7385) ThreadLocal used in FSEditLog class lead FSImage permission mess up
[ https://issues.apache.org/jira/browse/HDFS-7385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209162#comment-14209162 ] jiangyu commented on HDFS-7385: --- [~hitliuyi], it also occurs when opening files, for the same reason of using the ThreadLocal variable cache as in mkdir. I will add a test case later on.
[jira] [Commented] (HDFS-7391) Renable SSLv2Hello in HttpFS
[ https://issues.apache.org/jira/browse/HDFS-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209161#comment-14209161 ] Hudson commented on HDFS-7391: -- FAILURE: Integrated in Hadoop-trunk-Commit #6529 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6529/]) HDFS-7391. Renable SSLv2Hello in HttpFS. Contributed by Robert Kanter. (acmurthy: rev 9f0319bba1788e4c579ce533b14c0deab63f28ee) * hadoop-hdfs-project/hadoop-hdfs-httpfs/src/main/tomcat/ssl-server.xml * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
[jira] [Updated] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov updated HDFS-7056: --- Attachment: HDFS-7056.patch HDFS-3107-HDFS-7056-combined.patch Trunk had moved on since I generated my patch. Refreshed both combined and regular patch. Made some additional changes with help from Konstantin: # Patch failed with compilation error due to HDFS-7381. Updated patch to account for new BlockIdManager in trunk. # Renamed parameter of commitBlockSynchronization(), from "lastblock" to "oldBlock". We did this change because commitBlockSynchronization() handles both regular recovery and copy-on-write truncate now. # Removed a call in commitBlockSynchronization() to getStoredBlock() because we could get it from iFile.getLastBlock(). # Fixed TestCommitBlockSynchronization again by making mocked INodeFile return BlockInfoUC when calling getLastBlock() on it. > Snapshot support for truncate > - > > Key: HDFS-7056 > URL: https://issues.apache.org/jira/browse/HDFS-7056 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: namenode >Affects Versions: 3.0.0 >Reporter: Konstantin Shvachko >Assignee: Plamen Jeliazkov > Attachments: HDFS-3107-HDFS-7056-combined.patch, > HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, > HDFS-3107-HDFS-7056-combined.patch, HDFS-7056.patch, HDFS-7056.patch, > HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, > HDFSSnapshotWithTruncateDesign.docx > > > Implementation of truncate in HDFS-3107 does not allow truncating files which > are in a snapshot. It is desirable to be able to truncate and still keep the > old file state of the file in the snapshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7391) Renable SSLv2Hello in HttpFS
[ https://issues.apache.org/jira/browse/HDFS-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209148#comment-14209148 ] Arun C Murthy commented on HDFS-7391: - [~kasha] you just saved me some typing chores, I'll take care of this. Thanks!
[jira] [Commented] (HDFS-7385) ThreadLocal used in FSEditLog class lead FSImage permission mess up
[ https://issues.apache.org/jira/browse/HDFS-7385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209118#comment-14209118 ] Yi Liu commented on HDFS-7385: -- [~jiangyu1211], good find, I think it's a critical issue. It should occur if multiple create-file (or mkdir) operations happen on the same thread. Please add a test case to reproduce it; it's not hard.
[jira] [Updated] (HDFS-7385) ThreadLocal used in FSEditLog class lead FSImage permission mess up
[ https://issues.apache.org/jira/browse/HDFS-7385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-7385: - Target Version/s: 2.6.0 (was: 2.4.0, 2.5.0)
[jira] [Commented] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209037#comment-14209037 ] Hadoop QA commented on HDFS-7056: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12681191/HDFS-7056.patch against trunk revision b0a9cd3. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8726//console This message is automatically generated.
[jira] [Updated] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov updated HDFS-7056: --- Status: Open (was: Patch Available)
[jira] [Commented] (HDFS-7342) Lease Recovery doesn't happen some times
[ https://issues.apache.org/jira/browse/HDFS-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209022#comment-14209022 ] Ravi Prakash commented on HDFS-7342: Some details I've been able to gather from the logs on a cluster running Hadoop 2.2.0: The client logs. {noformat} 2014-10-27 19:46:54,952 INFO [Thread-60] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://:8020/ . nothing related to this file... 2014-10-28 01:18:26,018 INFO [main] org.apache.hadoop.hdfs.DFSClient: Could not complete retrying... 2014-10-28 01:18:26,419 INFO [main] org.apache.hadoop.hdfs.DFSClient: Could not complete retrying... ...goes on for 10 mins. 2014-10-28 01:28:24,481 INFO [main] org.apache.hadoop.hdfs.DFSClient: Could not complete retrying... 2014-10-28 01:28:24,883 INFO [main] org.apache.hadoop.hdfs.DFSClient: Could not complete retrying... {noformat} The Namenode Logs grepping for {noformat} 2014-10-27 19:46:58,041 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: . blk_A_A{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[:50010|RBW], ReplicaUnderConstruction[:50010|RBW], ReplicaUnderConstruction[:50010|RBW]]} 2014-10-27 20:13:26,607 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: . blk_A_B{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[:50010|RBW], ReplicaUnderConstruction[:50010|RBW], ReplicaUnderConstruction[:50010|RBW]]} 2014-10-27 20:47:52,422 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: . blk_A_C{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[:50010|RBW], ReplicaUnderConstruction[:50010|RBW], ReplicaUnderConstruction[:50010|RBW]]} 2014-10-27 21:23:13,844 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: . 
blk_A_D{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[:50010|RBW], ReplicaUnderConstruction[:50010|RBW], ReplicaUnderConstruction[:50010|RBW]]} 2014-10-27 22:02:33,405 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: . blk_A_E{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[:50010|RBW], ReplicaUnderConstruction[:50010|RBW], ReplicaUnderConstruction[:50010|RBW]]} 2014-10-27 22:42:49,227 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: . blk_A_F{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[:50010|RBW], ReplicaUnderConstruction[:50010|RBW], ReplicaUnderConstruction[:50010|RBW]]} 2014-10-27 23:25:58,555 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: . blk_A_G{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[:50010|RBW], ReplicaUnderConstruction[:50010|RBW], ReplicaUnderConstruction[:50010|RBW]]} 2014-10-28 00:07:36,093 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: . blk_A_H{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[:50010|RBW], ReplicaUnderConstruction[:50010|RBW], ReplicaUnderConstruction[:50010|RBW]]} 2014-10-28 01:13:50,298 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: . 
blk_A_I{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[:50010|RBW], ReplicaUnderConstruction[:50010|RBW], ReplicaUnderConstruction[:50010|RBW]]} 2014-10-28 01:18:20,868 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: is closed by DFSClient_attempt_X_Y_r_T_U_V_W 2014-10-28 01:18:21,272 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: is closed by DFSClient_attempt_X_Y_r_T_U_V_W This keeps going interspersed with other logs until 2014-10-28 01:28:24,483 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: is closed by DFSClient_attempt_X_Y_r_T_U_V_W 2014-10-28 01:28:25,615 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: is closed by DFSClient_attempt_X_Y_r_T_U_V_W 2014-10-28 02:28:17,569 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease. Holder: DFSClient_attempt_X_Y_r_T_U_V_W, pendingcreates: 1], src= ..BOOM NN IS IN INFINITE LOOP.. Only the following two messages keep getting repeated: 2014-10-28 02:28:17,568 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease. Holder: DFSClient_attempt_X_Y_r_T_U_V_W, pendingcreates: 1] has expired hard limit 2014-10-28 02:28:17,569 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease. Holder: DFSClient_attempt_X_Y_r_T_U_V_W, pendingcreates: 1], src= 2014-10-28 02:28:17,569 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease. Holder: DFSClient_attempt_X_Y_r_T_U_V_W, pendingcreates: 1] has expired hard limit 2014-10-28 02:28:17,569 INFO org.apache.hadoop.hd
[jira] [Commented] (HDFS-7391) Renable SSLv2Hello in HttpFS
[ https://issues.apache.org/jira/browse/HDFS-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209143#comment-14209143 ] Karthik Kambatla commented on HDFS-7391: [~acmurthy] - Missed your comment here, and committed the addendum for HADOOP-11217. Will let you commit this, thanks.
[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209024#comment-14209024 ] Andrew Wang commented on HDFS-6982: --- How would this behave under a sudden, large spike in operations? This is the situation we're trying to detect. i.e. for something like: {noformat} 0, 0, 0, 100, 0, 0, 0, ... {noformat} What I'd want to see is essentially a step function going 0 -> 100 -> 0, but an EWMA would necessarily tail off exponentially. I'm also happy to take a look at any references you have. I've done some reading on calculating percentiles on rolling windows, and what we have now is pretty typical for that, i.e. a number of buckets each representing a fixed time interval, aggregating buckets to calculate the metric, old buckets being discarded as time passes. > nntop: top-like tool for name node users > - > > Key: HDFS-6982 > URL: https://issues.apache.org/jira/browse/HDFS-6982 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Maysam Yabandeh >Assignee: Maysam Yabandeh > Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, HDFS-6982.v3.patch, HDFS-6982.v4.patch, HDFS-6982.v5.patch, HDFS-6982.v6.patch, nntop-design-v1.pdf > > > In this jira we motivate the need for nntop, a tool that, similarly to what top does in Linux, gives the list of top users of the HDFS name node and gives insight about which users are sending the majority of each traffic type to the name node. > This information turns out to be most critical when the name node is under pressure and the HDFS admin needs to know which user is hammering the name node and with what kind of requests. Here we present the design of nntop, which has been in production at Twitter for the past 10 months. nntop proved to have low CPU overhead (< 2% in a cluster of 4K nodes), a low memory footprint (less than a few MB), and to be quite efficient on the write path (only two hash lookups for updating a metric).
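The rolling-window scheme Andrew describes (fixed-interval buckets, summed on read, discarded as they age out) gives exactly the step response he wants: a burst appears at full weight immediately and vanishes once its bucket leaves the window. A minimal Java sketch of that bucket scheme (class and method names are mine, not from the nntop patch):

```java
// Fixed-interval rolling window: one counter bucket per interval.
// Reads sum only the buckets still inside the window, so a burst
// shows up at full weight at once and drops to zero when it ages out.
class RollingWindowSketch {
    private final long bucketMillis;
    private final long[] counts;
    private final long[] bucketStart; // interval start stamped into each slot

    RollingWindowSketch(int numBuckets, long bucketMillis) {
        this.bucketMillis = bucketMillis;
        this.counts = new long[numBuckets];
        this.bucketStart = new long[numBuckets];
    }

    void inc(long nowMillis) {
        long bucketId = nowMillis / bucketMillis;
        int slot = (int) (bucketId % counts.length);
        long start = bucketId * bucketMillis;
        if (bucketStart[slot] != start) { // slot holds an expired interval: recycle it
            bucketStart[slot] = start;
            counts[slot] = 0;
        }
        counts[slot]++;
    }

    long sum(long nowMillis) {
        long total = 0;
        long windowStart = nowMillis - (long) counts.length * bucketMillis;
        for (int i = 0; i < counts.length; i++) {
            if (bucketStart[i] > windowStart) { // bucket still inside the window
                total += counts[i];
            }
        }
        return total;
    }
}

public class RollingWindowDemo {
    public static void main(String[] args) {
        RollingWindowSketch w = new RollingWindowSketch(3, 1000); // 3 x 1 s buckets
        for (int i = 0; i < 100; i++) w.inc(500);  // burst lands in bucket 0
        System.out.println(w.sum(1000));  // burst fully visible
        System.out.println(w.sum(5000));  // burst aged out of the window
    }
}
```

The real implementation additionally needs thread-safe counters (e.g. AtomicLong per slot), which is where the synchronization discussion in this thread comes in.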
[jira] [Commented] (HDFS-7390) Provide JMX metrics per storage type
[ https://issues.apache.org/jira/browse/HDFS-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209026#comment-14209026 ] Haohui Mai commented on HDFS-7390: -- Thanks for the work. Can you make the JMX output JSON objects directly instead of JSON strings? > Provide JMX metrics per storage type > > > Key: HDFS-7390 > URL: https://issues.apache.org/jira/browse/HDFS-7390 > Project: Hadoop HDFS > Issue Type: Improvement >Affects Versions: 2.5.2 >Reporter: Benoy Antony >Assignee: Benoy Antony > Attachments: HDFS-7390.patch > > > HDFS-2832 added heterogeneous storage support. In a cluster with different storage types, it is useful to have metrics per storage type.
[jira] [Commented] (HDFS-7391) Renable SSLv2Hello in HttpFS
[ https://issues.apache.org/jira/browse/HDFS-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209030#comment-14209030 ] Arun C Murthy commented on HDFS-7391: - Sounds good, thanks [~rkanter] and [~ywskycn]! I'll commit both shortly for RC1.
[jira] [Updated] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Plamen Jeliazkov updated HDFS-7056: --- Status: Patch Available (was: Open)
[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208995#comment-14208995 ] Haohui Mai commented on HDFS-6982: -- bq. However, my understanding is that there's no direct link between the alpha parameter and a time-based window, e.g. 1 min, 5 min, 30 min. Let n equal the number of observations per window. Setting {{alpha = (n-1) / n}} would make the math right, assuming that the number of requests follows a Poisson distribution. bq. IIUC the situation you describe will lead to small errors, not big ones. If there are bigger correctness issues, I think we can fix them by adding more synchronization. Thanks. Depending on the timing, the errors will lead to one of the following: (1) correct results, (2) consistently missing one measurement from some users, or (3) inconsistent measurements for the same users. The artificial errors make nntop less valuable. I don't quite understand your concerns about fixing the issue. This is a variant of the online counting problem, which is relatively well studied. Applying the de facto solution would eliminate the errors and make the implementation simpler. I'm not sure why we need to reinvent the wheel here.
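For contrast with the bucketed window, the exponential moving average Haohui suggests can be sketched in a few lines, using the {{alpha = (n-1)/n}} decay from his comment (the value of n and the sample series are illustrative assumptions). Note how the burst tails off geometrically rather than stepping back to zero:

```java
// Exponentially weighted moving average over per-interval counts.
// With decay alpha = (n-1)/n for n observations per window, a single
// burst decays geometrically instead of dropping to zero at once.
public class EwmaSketch {
    public static void main(String[] args) {
        int n = 4;                      // assumed observations per window
        double alpha = (n - 1.0) / n;   // weight retained from the previous average
        double avg = 0.0;
        double[] perInterval = {0, 0, 100, 0, 0, 0}; // burst at interval 2
        for (double x : perInterval) {
            avg = alpha * avg + (1 - alpha) * x;
            System.out.printf("%.2f%n", avg);
        }
    }
}
```

After the burst the average decays 25 -> 18.75 -> 14.06 -> ..., which is the exponential tail Andrew objects to; the bucketed window instead reports the step 0 -> 100 -> 0.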
[jira] [Commented] (HDFS-7386) Replace check "port number < 1024" with shared isPrivilegedPort method
[ https://issues.apache.org/jira/browse/HDFS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208972#comment-14208972 ] Yongjun Zhang commented on HDFS-7386: - Hi Chris, thanks again for your input; I just submitted a trivial patch. > Replace check "port number < 1024" with shared isPrivilegedPort method > --- > > Key: HDFS-7386 > URL: https://issues.apache.org/jira/browse/HDFS-7386 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang >Priority: Trivial > Attachments: HDFS-7386.001.patch > > > Per discussion in HDFS-7382, I'm filing this jira as a follow-up, to replace the check "port number < 1024" with a shared isPrivilegedPort method. > Thanks [~cnauroth] for the work on HDFS-7382 and the suggestion there.
[jira] [Updated] (HDFS-7386) Replace check "port number < 1024" with shared isPrivilegedPort method
[ https://issues.apache.org/jira/browse/HDFS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7386: Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-7386) Replace check "port number < 1024" with shared isPrivilegedPort method
[ https://issues.apache.org/jira/browse/HDFS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7386: Priority: Trivial (was: Major)
[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208968#comment-14208968 ] Andrew Wang commented on HDFS-6982: --- Haohui, IIUC your suggestion here is to use an exponential moving average. However, my understanding is that there's no direct link between the alpha parameter and a time-based window, e.g. 1 min, 5 min, 30 min. I think the time-based windows are more operator-friendly. Also, considering that the purpose of this feature is to determine a ranked list of users and what operations they're doing, the exact values aren't actually that important. IIUC the situation you describe will lead to small errors, not big ones. If there are bigger correctness issues, I think we can fix them by adding more synchronization. Thanks.
[jira] [Updated] (HDFS-7386) Replace check "port number < 1024" with shared isPrivilegedPort method
[ https://issues.apache.org/jira/browse/HDFS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7386: Attachment: HDFS-7386.001.patch
[jira] [Updated] (HDFS-7386) Replace check "port number < 1024" with shared isPrivilegedPort method
[ https://issues.apache.org/jira/browse/HDFS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-7386: Issue Type: Improvement (was: Bug)
[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208944#comment-14208944 ] Haohui Mai commented on HDFS-6982: -- For the UI part I think it is fine to report only the smallest window and let the UI plot the graph; thus I think the patch can be further simplified.
[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208942#comment-14208942 ] Haohui Mai commented on HDFS-6982: -- Please correct me if I'm wrong. Let's say we have a rolling window of 3, and the current observation {{o}} is
{noformat}
o = [o1, o2, o3];
{noformat}
Consider the following interleaving:
1. The user measures the observation. He gets {{(o1 + o2 + o3) / 3}}.
2. The observation {{o1}} is stale, thus it is reset to zero by {{safeReset}}.
3. Right before {{bucket.inc()}} is called, the user makes another measurement; now he gets {{(0 + o2 + o3) / 3}}.
4. {{o1}} is updated.
That way the user gets an incorrect measurement in step 3. My feeling is that it is more robust to calculate the moving average instead of resetting the observations on every tick. Actually, the core functionality can be implemented in the following code:
{code}
// both methods are synchronized, so a plain HashMap suffices
private final Map<String, Long> observation = new HashMap<>();
private static final double ALPHA = 0.75; // decay factor, e.g. (n - 1) / n

synchronized void bulkUpdate(Map<String, Long> updates) {
  // decay and increment the entries that received updates
  for (Map.Entry<String, Long> e : updates.entrySet()) {
    long v = observation.getOrDefault(e.getKey(), 0L);
    observation.put(e.getKey(), (long) (ALPHA * v) + e.getValue());
  }
  // decay the remaining entries, dropping them once they reach zero
  Iterator<Map.Entry<String, Long>> it = observation.entrySet().iterator();
  while (it.hasNext()) {
    Map.Entry<String, Long> e = it.next();
    if (!updates.containsKey(e.getKey())) {
      long v = (long) (ALPHA * e.getValue());
      if (v == 0) {
        it.remove();
      } else {
        e.setValue(v);
      }
    }
  }
}

synchronized Map<String, Long> observe() {
  // return a copy so readers never see a half-applied update
  return new HashMap<>(observation);
}
{code}
Assuming that the size of {{updates}} is bounded (which should be the case in nntop), it should be fairly efficient. Thoughts?
[jira] [Commented] (HDFS-7312) Update DistCp v1 to optionally not use tmp location
[ https://issues.apache.org/jira/browse/HDFS-7312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208914#comment-14208914 ] Yongjun Zhang commented on HDFS-7312: - Hi [~jprosser], Thanks for the nice work here. I looked through it; it looks good, except for a couple of improvements and nits.
Improvements:
* For the following new test, to achieve better code sharing:
{code}
/** copy files from dfs file system to dfs file system with skiptmp */
public void testCopyFromDfsToDfsWithSkiptmp() throws Exception {
{code}
I suggest creating a new private method with an additional boolean parameter skipTmp, {{private void testCopyFromDfsToDfs(final boolean skipTmp)}}. Then the pre-existing test testCopyFromDfsToDfs and your new test can call this private method with false and true accordingly.
* The same suggestion applies to {{testCopySingleFileWithSkiptmp()}}.
* Line 1221: do we really need to create the tmpDir here? For example, if we copy a single file to a destination file, why is a tmpDir needed? I think we can avoid doing this (removing that block of code) and add a file-existence check when doing the fullyDelete. The point is, the TMP_DIR_LABEL may not always be created; we don't have to create it just for the purpose of deleting it. Right?
Nits:
* About the usage description "Copy files directly to the final destination.", maybe we can change it to "Instead, copy files directly to the final destination."?
* Line 394, "// filename and the path becomes its parent directory.": replace "path" with "destPath".
* Line 1218: rename "tmpDirRoot" to "tmpDirPrefix"?
* Line 396: tab key not replaced with spaces.
* Line 400: remove the newly added extra newline.
* Line 956: replace "The job to configure" with "The job configuration".
* Line 1218: the line is too long; it needs to be <= 80 columns per the Hadoop code guideline.
BTW, I guess you have tried to test it out on real clusters, right? Thanks a lot.
> Update DistCp v1 to optionally not use tmp location > --- > > Key: HDFS-7312 > URL: https://issues.apache.org/jira/browse/HDFS-7312 > Project: Hadoop HDFS > Issue Type: Improvement > Components: tools >Affects Versions: 2.5.1 >Reporter: Joseph Prosser >Assignee: Joseph Prosser >Priority: Minor > Attachments: HDFS-7312.001.patch, HDFS-7312.002.patch, > HDFS-7312.003.patch, HDFS-7312.patch > > Original Estimate: 72h > Remaining Estimate: 72h > > DistCp v1 currently copies files to a tmp location and then renames that to > the specified destination. This can cause performance issues on filesystems > such as S3. A -skiptmp flag will be added to bypass this step and copy > directly to the destination. This feature mirrors a similar one added to > HBase ExportSnapshot > [HBASE-9|https://issues.apache.org/jira/browse/HBASE-9]
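The test refactoring suggested in the review could look like the following sketch (hypothetical names taken from the review comment, not from the actual patch): one parameterized private helper serves both the tmp and skip-tmp variants.

```java
// Hypothetical sketch of the suggested test refactoring. The real helper
// would set up a MiniDFSCluster and run DistCp, adding -skiptmp to the
// arguments when skipTmp is true; here it only records the flag so the
// delegation structure is visible.
public class CopyTestSketch {
    boolean lastSkipTmp;

    // shared private helper with the boolean parameter from the review
    private void testCopyFromDfsToDfs(final boolean skipTmp) {
        lastSkipTmp = skipTmp;
        // ... run the copy and verify the destination contents ...
    }

    /** pre-existing test: copy via the tmp location */
    public void testCopyFromDfsToDfs() {
        testCopyFromDfsToDfs(false);
    }

    /** new test: copy directly to the final destination */
    public void testCopyFromDfsToDfsWithSkiptmp() {
        testCopyFromDfsToDfs(true);
    }
}
```

The same shape would cover the {{testCopySingleFileWithSkiptmp()}} pair with a second helper.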
[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208879#comment-14208879 ] Andrew Wang commented on HDFS-6982: --- bq. The configuration keys are already there. I have some constants in TopConf though. Did you mean to move those top-specific constants to DFSConfigKeys? Right now there are some prefixes that are concatenated to generate the full string. Since there are only 3 config keys right now, I think it'd be nicer to just write them out, i.e. "dfs.namenode.top.window.buckets". bq. We actually use jmx for plotting the data exported by nntop, and that is why one reporting period is sufficient. For html view, the web component directly contacts the TopMetrics and retrieves the top users for all reporting periods. I guess this is reasonable, though it does differ a bit from how the new webUI works. [~wheat9] do you have any feelings here? bq. Could you file a follow-on JIRA for this issue, and post the patch? It's fine to do it before or after, but it seems like we definitely need it in the end state either way. Thanks Maysam!
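The "just write them out" suggestion above amounts to replacing prefix concatenation with a single literal per key, e.g. (an assumed illustration, not code from the patch):

```java
// Assumed illustration of spelling a top-metrics key out as one literal in
// DFSConfigKeys instead of concatenating prefix constants. Only the key
// string "dfs.namenode.top.window.buckets" is quoted from the review; the
// constant name is hypothetical.
public class DFSConfigKeysSketch {
    public static final String DFS_NAMENODE_TOP_WINDOW_BUCKETS_KEY =
        "dfs.namenode.top.window.buckets";
}
```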
[jira] [Commented] (HDFS-6982) nntop: top-like tool for name node users
[ https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208866#comment-14208866 ] Maysam Yabandeh commented on HDFS-6982: --- Thanks for the well-detailed review. Some questions before I submit the new patch: bq. Do you mind writing out the key strings in DFSConfigKeys? It's how we do it in the rest of the file, and more readable. The configuration keys are already there. I have some constants in TopConf though. Did you mean to move those top-specific constants to DFSConfigKeys? bq. The jmx page right now only exposes the shortest window, I see the smallestWindow hardcode. Since the goal is to use jmx to populate the HTML view, we need to somehow expose all of the configured windows. We actually use jmx for plotting the data exported by nntop, and that is why one reporting period is sufficient. For the html view, the web component directly contacts the TopMetrics and retrieves the top users for all reporting periods. Also, about the counters never going back to 0 and reported users never being removed: the plan was to submit a separate patch for it, as the fix is kind of orthogonal to what nntop does; refer to #3 in this comment: https://issues.apache.org/jira/browse/HDFS-6982?focusedCommentId=14122097&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14122097 If you think that change should be made before this patch gets committed, I can open the jira for that change and we can have it committed first.
[jira] [Commented] (HDFS-7391) Renable SSLv2Hello in HttpFS
[ https://issues.apache.org/jira/browse/HDFS-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208839#comment-14208839 ] Wei Yan commented on HDFS-7391: --- [~acmurthy], for HADOOP-11243, we tried different approaches (whitelist, blacklist) to add SSLv2Hello, but without success. The shuffle server still cannot accept the SSLv2Hello protocol. Given that the shuffle happens between NMs, I think we can keep the existing solution without SSLv2Hello. > Renable SSLv2Hello in HttpFS > > > Key: HDFS-7391 > URL: https://issues.apache.org/jira/browse/HDFS-7391 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 2.6.0, 2.5.2 >Reporter: Robert Kanter >Assignee: Robert Kanter >Priority: Blocker > Attachments: HDFS-7391-branch-2.5.patch, HDFS-7391.patch > > > We should re-enable "SSLv2Hello", which is required for older clients (e.g. > Java 6 with openssl 0.9.8x); they can't connect without it. Just to be > clear, it does not mean SSLv2, which is insecure. > I couldn't simply do an addendum patch on HDFS-7274 because it's already been > closed.
[jira] [Commented] (HDFS-7391) Renable SSLv2Hello in HttpFS
[ https://issues.apache.org/jira/browse/HDFS-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208833#comment-14208833 ] Robert Kanter commented on HDFS-7391: - This is the addendum patch for HDFS-7274. I had to create a new JIRA because HDFS-7274 was already closed. (Also, the branch-2.5 version of the patch is different here because it also fixes a problem where older versions of Tomcat used a different property name). The addendum patch for HADOOP-11217 I was able to put there because it wasn't closed yet. The third JIRA was HADOOP-11243, which [~ywskycn] worked on. I'm not sure of the details, but he wasn't able to re-enable SSLv2Hello there. So, there's just HDFS-7391 and the HADOOP-11217 addendum.
[jira] [Commented] (HDFS-7391) Renable SSLv2Hello in HttpFS
[ https://issues.apache.org/jira/browse/HDFS-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208818#comment-14208818 ] Arun C Murthy commented on HDFS-7391: - [~rkanter] - Is this and the addendum patch for HADOOP-11217 the only ones for SSLv2Hello? I thought there were 3? I'll commit both later today if there are no objections; others please commit to branch-2/branch-2.6/branch-2.6.0 if you get to it before me. Thanks.
[jira] [Commented] (HDFS-2936) File close()-ing hangs indefinitely if the number of live blocks does not match the minimum replication
[ https://issues.apache.org/jira/browse/HDFS-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208753#comment-14208753 ] Ravi Prakash commented on HDFS-2936: Thanks Harsh for this JIRA! I would go a different route on this. The min-replication count to me as a user means "it will take that many failures to lose data". That is a simple concept to reason about. If we create a separate config that applies only to write pipelines, (1) there is a window of opportunity during which my assumption is not valid (the time it takes for the NN to order that replication), and (2) it makes understanding the concept slightly more complex. I would suggest that we instead fix the write pipeline to contain the minimum replication count and have the client wait until that happens. I realize that might be a much bigger change. > File close()-ing hangs indefinitely if the number of live blocks does not > match the minimum replication > --- > > Key: HDFS-2936 > URL: https://issues.apache.org/jira/browse/HDFS-2936 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 0.23.0 >Reporter: Harsh J >Assignee: Harsh J > Attachments: HDFS-2936.patch > > > If an admin wishes to enforce replication today for all the users of their > cluster, he may set {{dfs.namenode.replication.min}}. This property prevents > users from creating files with < the expected replication factor. > However, the value of minimum replication set by the above value is also > checked at several other points, especially during completeFile (close) > operations. If a condition arises wherein a write's pipeline may have gotten > only < minimum nodes in it, the completeFile operation does not successfully > close the file and the client begins to hang waiting for the NN to replicate the > last bad block in the background. This form of hard guarantee can, for > example, bring down clusters of HBase during high xceiver load on DNs, or disk > fill-ups on many of them, etc. > I propose we should split the property in two parts: > * dfs.namenode.replication.min > ** Stays the same name, but is only checked for the file-creation-time replication > factor value and for adjustments made via setrep/etc. > * dfs.namenode.replication.min.for.write > ** New property that disconnects the rest of the checks from the above > property, such as the checks done during block commit, file complete/close, > safemode checks for block availability, etc. > Alternatively, we may also choose to remove the client-side hang of > completeFile/close calls with a set number of retries. This would further > require discussion about how a file-closure handle ought to be handled.
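The proposed property split could be expressed as an hdfs-site.xml fragment like the following sketch (the second key is only proposed in this jira and does not exist in any release):

```xml
<!-- Sketch of the proposed split from HDFS-2936;
     dfs.namenode.replication.min.for.write is hypothetical. -->
<configuration>
  <property>
    <name>dfs.namenode.replication.min</name>
    <value>2</value>
    <!-- checked only at file-creation time and for setrep adjustments -->
  </property>
  <property>
    <name>dfs.namenode.replication.min.for.write</name>
    <value>1</value>
    <!-- checked at block commit, file complete/close, and safemode
         block-availability checks, so close() no longer hangs on a
         degraded pipeline -->
  </property>
</configuration>
```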
[jira] [Commented] (HDFS-7008) xlator should be closed upon exit from DFSAdmin#genericRefresh()
[ https://issues.apache.org/jira/browse/HDFS-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208746#comment-14208746 ] Hadoop QA commented on HDFS-7008: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12666954/HDFS-7008.1.patch against trunk revision 782abbb. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8724//console This message is automatically generated. > xlator should be closed upon exit from DFSAdmin#genericRefresh() > > > Key: HDFS-7008 > URL: https://issues.apache.org/jira/browse/HDFS-7008 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Ted Yu >Assignee: Tsuyoshi OZAWA >Priority: Minor > Attachments: HDFS-7008.1.patch > > > {code} > GenericRefreshProtocol xlator = > new GenericRefreshProtocolClientSideTranslatorPB(proxy); > // Refresh > Collection responses = xlator.refresh(identifier, args); > {code} > GenericRefreshProtocolClientSideTranslatorPB#close() should be called on > xlator before return.
[jira] [Commented] (HDFS-7008) xlator should be closed upon exit from DFSAdmin#genericRefresh()
[ https://issues.apache.org/jira/browse/HDFS-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208739#comment-14208739 ] Chris Li commented on HDFS-7008: Linking issue
[jira] [Commented] (HDFS-4882) Namenode LeaseManager checkLeases() runs into infinite loop
[ https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208657#comment-14208657 ] Ravi Prakash commented on HDFS-4882: These unit test failures are spurious and unrelated to the code changes in the patch. > Namenode LeaseManager checkLeases() runs into infinite loop > --- > > Key: HDFS-4882 > URL: https://issues.apache.org/jira/browse/HDFS-4882 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client, namenode >Affects Versions: 2.0.0-alpha, 2.5.1 >Reporter: Zesheng Wu >Assignee: Ravi Prakash >Priority: Critical > Attachments: 4882.1.patch, 4882.patch, 4882.patch, HDFS-4882.1.patch, > HDFS-4882.2.patch, HDFS-4882.patch > > > Scenario: > 1. cluster with 4 DNs > 2. the size of the file to be written is a little more than one block > 3. write the first block to 3 DNs, DN1->DN2->DN3 > 4. all the data packets of the first block are successfully acked and the client > sets the pipeline stage to PIPELINE_CLOSE, but the last packet isn't sent out > 5. DN2 and DN3 are down > 6. client recovers the pipeline, but no new DN is added to the pipeline > because the current pipeline stage is PIPELINE_CLOSE > 7. client continuously writes the last block, and tries to close the file after > writing all the data > 8. NN finds that the penultimate block doesn't have enough replicas (our > dfs.namenode.replication.min=2), the client's close runs into an indefinite > loop (HDFS-2936), and at the same time, NN sets the last block's state to > COMPLETE > 9. shutdown the client > 10. the file's lease exceeds the hard limit > 11. LeaseManager realizes that and begins to do lease recovery by calling > fsnamesystem.internalReleaseLease() > 12. but the last block's state is COMPLETE, and this triggers the lease manager's > infinite loop and prints massive logs like this: > {noformat} > 2013-06-05,17:42:25,695 INFO > org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease. Holder: > DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard > limit > 2013-06-05,17:42:25,695 INFO > org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease. > Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src= > /user/h_wuzesheng/test.dat > 2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* > NameSystem.internalReleaseLease: File = /user/h_wuzesheng/test.dat, block > blk_-7028017402720175688_1202597, > lastBLockState=COMPLETE > 2013-06-05,17:42:25,695 INFO > org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started block recovery > for file /user/h_wuzesheng/test.dat lease [Lease. Holder: DFSClient_NONM > APREDUCE_-1252656407_1, pendingcreates: 1] > {noformat} > (the 3rd line of the log is a debug log added by us)
[jira] [Commented] (HDFS-4882) Namenode LeaseManager checkLeases() runs into infinite loop
[ https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208622#comment-14208622 ] Hadoop QA commented on HDFS-4882: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12681116/HDFS-4882.2.patch against trunk revision be7bf95. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/8723//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/8723//console This message is automatically generated. 
[jira] [Updated] (HDFS-7387) NFS may only do partial commit due to a race between COMMIT and write
[ https://issues.apache.org/jira/browse/HDFS-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-7387: Fix Version/s: (was: 2.7.0) 2.6.0 I've merged this to branch-2.6 and branch-2.6.0 for inclusion in the 2.6.0 release candidate. > NFS may only do partial commit due to a race between COMMIT and write > - > > Key: HDFS-7387 > URL: https://issues.apache.org/jira/browse/HDFS-7387 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Affects Versions: 2.6.0 >Reporter: Brandon Li >Assignee: Brandon Li >Priority: Critical > Fix For: 2.6.0 > > Attachments: HDFS-7387.001.patch, HDFS-7387.002.patch > > > The requested range may not be committed when the following happens: > 1. the last pending write is removed from the queue to be written to hdfs > 2. a commit request arrives; NFS sees there is no pending write, so it will > do a sync > 3. this sync request could flush only part of the last write to hdfs > 4. if a file read happens immediately after the above steps, the user may not > see all the data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
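The race in the four steps above hinges on a write that has already left the pending queue but is not yet flushed. A minimal, hypothetical model of the fixed commit decision (not the actual OpenFileCtx code; the class, method, and parameter names are illustrative):

```java
// Hypothetical model of the fixed commit decision (not the actual OpenFileCtx
// code; names are illustrative). The essential state is whether a write that
// already left the pending queue is still being flushed to HDFS.
class CommitSketch {
    enum Status { COMMIT_FINISHED, COMMIT_WAIT }

    // commitOffset: highest offset the COMMIT covers
    // flushedOffset: bytes durably flushed to HDFS so far
    // writeInFlight: a write was dequeued but its flush has not completed
    static Status check(long commitOffset, long flushedOffset, boolean writeInFlight) {
        // An empty pending queue alone is not enough to sync and reply success:
        // a dequeued-but-unflushed write would make the sync cover partial data.
        if (!writeInFlight && flushedOffset >= commitOffset) {
            return Status.COMMIT_FINISHED;
        }
        return Status.COMMIT_WAIT; // defer until the in-flight write lands
    }
}
```

The point is the writeInFlight flag: a COMMIT that only consulted the pending queue would wrongly report success while the last write is mid-flush.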
[jira] [Commented] (HDFS-7387) NFS may only do partial commit due to a race between COMMIT and write
[ https://issues.apache.org/jira/browse/HDFS-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208528#comment-14208528 ] Hudson commented on HDFS-7387: -- FAILURE: Integrated in Hadoop-trunk-Commit #6524 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/6524/]) HDFS-7387. Merging to branch-2.6 for hadoop-2.6.0-rc1. (acmurthy: rev 782abbb000ab1c9e2e033e347eea8827d6e866ef) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4882) Namenode LeaseManager checkLeases() runs into infinite loop
[ https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208478#comment-14208478 ] Ravi Prakash commented on HDFS-4882: s/ConcurrentinternalReleaseLease/ConcurrentSkipList/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-4882) Namenode LeaseManager checkLeases() runs into infinite loop
[ https://issues.apache.org/jira/browse/HDFS-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravi Prakash updated HDFS-4882: --- Attachment: HDFS-4882.2.patch I tried using iterators, but then realized that FSNamesystem.internalReleaseLease() calls into renewLease, and making those modifications would have been too unsightly. Here's a patch which uses SortedSet.tailSet. However, I still like the earlier patch more (because it's a genuine case of two threads accessing the same data structure). With tailSet we are just trying to build our own synchronization mechanism (which is likely less efficient than the ConcurrentinternalReleaseLease). I'd vote for the earlier patch (HDFS-4882.1.patch). I'd also request that this make it into 2.6.0 because of this issue's severity. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
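For readers following the tailSet discussion above, here is a rough, hypothetical sketch of that style of traversal (illustrative only, not the attached patch): walking a sorted set via tailSet instead of holding an Iterator tolerates removals made by the release callback, and always advancing the cursor keeps one unreleasable lease from being revisited forever.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical sketch of the tailSet style of traversal (illustrative, not the
// attached patch): no Iterator is held across the release callback, so the set
// may shrink underneath us, and the cursor always advances so that a lease
// which cannot be released is skipped instead of revisited forever.
class LeaseSweep {
    static List<String> sweep(TreeSet<String> leases, Predicate<String> release) {
        List<String> released = new ArrayList<>();
        String cursor = "";                                  // sorts before every holder
        while (true) {
            SortedSet<String> rest = leases.tailSet(cursor, false); // strictly after cursor
            if (rest.isEmpty()) break;
            String lease = rest.first();
            cursor = lease;                                  // advance even on failure
            if (release.test(lease)) {                       // e.g. internalReleaseLease()
                leases.remove(lease);
                released.add(lease);
            }
        }
        return released;
    }
}
```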
[jira] [Commented] (HDFS-7386) Replace check "port number < 1024" with shared isPrivilegedPort method
[ https://issues.apache.org/jira/browse/HDFS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208371#comment-14208371 ] Yongjun Zhang commented on HDFS-7386: - Hi Chris, I just saw your comments here and in HADOOP-11293; they are very helpful, thank you so much! I will work on patches for both of them a bit later today. > Replace check "port number < 1024" with shared isPrivilegedPort method > --- > > Key: HDFS-7386 > URL: https://issues.apache.org/jira/browse/HDFS-7386 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yongjun Zhang >Assignee: Yongjun Zhang > > Per discussion in HDFS-7382, I'm filing this jira as a follow-up to replace > the check "port number < 1024" with a shared isPrivilegedPort method. > Thanks [~cnauroth] for the work on HDFS-7382 and the suggestion there. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7386) Replace check "port number < 1024" with shared isPrivilegedPort method
[ https://issues.apache.org/jira/browse/HDFS-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208354#comment-14208354 ] Chris Nauroth commented on HDFS-7386: - Hi Yongjun. On further reflection, I think we should not incorporate a Windows check here. Sometimes the check for < 1024 is used on the client side to detect the behavior of the server side. If we consider the possibility of a Windows client connecting to a Linux server, then the client on Windows could assume incorrectly that there are no privileged ports, even though the server on Linux does have privileged ports. As a practical matter, I think this means that when secure mode is fully implemented for Windows, there is going to be a limitation that the DataNode can't use a port < 1024. Otherwise, it would throw off some of this detection logic. It's not a bad limitation, just something we'll need to be aware of. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
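A sketch of what the proposed shared helper might look like (hypothetical; the actual method arrives with the patch). Per the comment above, it deliberately avoids any OS check, so a Windows client probing a Linux server still treats ports below 1024 as privileged:

```java
// Hypothetical sketch of a shared helper replacing scattered "port < 1024"
// checks. Deliberately not OS-dependent, per the discussion on this jira.
final class PortUtil {
    static final int PRIVILEGED_PORT_MAX = 1023; // port 0 means "any port", not privileged

    static boolean isPrivilegedPort(int port) {
        return port > 0 && port <= PRIVILEGED_PORT_MAX;
    }

    private PortUtil() {} // static utility, no instances
}
```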
[jira] [Commented] (HDFS-7391) Renable SSLv2Hello in HttpFS
[ https://issues.apache.org/jira/browse/HDFS-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208349#comment-14208349 ] Robert Kanter commented on HDFS-7391: - We verified that an old OpenSSL client is able to do the SSLv2Hello handshake and then use TLSv1 (and that SSLv2 and SSLv3 are disabled). > Renable SSLv2Hello in HttpFS > > > Key: HDFS-7391 > URL: https://issues.apache.org/jira/browse/HDFS-7391 > Project: Hadoop HDFS > Issue Type: Bug > Components: webhdfs >Affects Versions: 2.6.0, 2.5.2 >Reporter: Robert Kanter >Assignee: Robert Kanter >Priority: Blocker > Attachments: HDFS-7391-branch-2.5.patch, HDFS-7391.patch > > > We should re-enable "SSLv2Hello", which is required for older clients (e.g. > Java 6 with openssl 0.9.8x), which can't connect without it. Just to be > clear, this does not mean SSLv2, which is insecure. > I couldn't simply do an addendum patch on HDFS-7274 because it's already been > closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7394) Log at INFO level when InvalidToken is seen in ShortCircuitCache
Kihwal Lee created HDFS-7394: Summary: Log at INFO level when InvalidToken is seen in ShortCircuitCache Key: HDFS-7394 URL: https://issues.apache.org/jira/browse/HDFS-7394 Project: Hadoop HDFS Issue Type: Bug Reporter: Kihwal Lee Priority: Minor For long running clients, getting an {{InvalidToken}} exception is expected, and the client refetches a block token when it happens. The related events are logged at INFO, except the ones in {{ShortCircuitCache}}. It would be better if these were also logged at INFO. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
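The intended behavior can be sketched as follows (hypothetical code; SecurityException stands in for the real InvalidToken type): the exception is an expected event for long-running clients, so it is noted at INFO and followed by a token refetch rather than surfaced as a warning or error.

```java
// Hypothetical sketch of the behavior this jira asks for (SecurityException
// stands in for InvalidToken): the exception is an expected event for
// long-running clients, so it is logged at INFO and followed by a refetch,
// not treated as WARN/ERROR.
class TokenRetrySketch {
    interface BlockReader { String read(); }

    static String readWithRefetch(BlockReader withOldToken, BlockReader withFreshToken) {
        try {
            return withOldToken.read();
        } catch (SecurityException expired) {
            // Expected: the block token aged out; note it at INFO and refetch.
            System.out.println("INFO: block token expired; refetching and retrying read");
            return withFreshToken.read();
        }
    }
}
```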
[jira] [Commented] (HDFS-7391) Renable SSLv2Hello in HttpFS
[ https://issues.apache.org/jira/browse/HDFS-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208188#comment-14208188 ] Dave Thompson commented on HDFS-7391: - Looking at the patch, config change appears benign -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7391) Renable SSLv2Hello in HttpFS
[ https://issues.apache.org/jira/browse/HDFS-7391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208172#comment-14208172 ] Dave Thompson commented on HDFS-7391: - For clarification: you are not suggesting turning on SSLv2, which has been deprecated for 18 years for reasons discussed in RFC 6176. Rather, you are suggesting turning on the backwards-compatible Client-Hello, introduced in 1996 as a transition mechanism for clients that didn't know whether they were connecting to an SSLv2 or SSLv3 server. I'm a bit surprised that there exist Hadoop clients that find this necessary. Java 6 with openssl 0.9.8x will, I believe, support up to SSLv3.1 (TLS 1.0), which I've used as a server... I can't speak to client configurability. My primary concern is that, in enabling acceptance of the SSLv2 Client-Hello, assurance/confirmation be made that a resulting SSLv2.0 session is not allowed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
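For context, the kind of change involved is a one-line protocol list on the HttpFS Tomcat connector. The fragment below is a sketch, not the exact patch; the attribute values are illustrative and assume Tomcat's JSSE connector, whose sslEnabledProtocols attribute lists the handshake protocols to accept:

```xml
<!-- Illustrative sketch (not the exact HDFS-7391 patch): accept the
     SSLv2-format hello for old clients while negotiating only TLSv1,
     so no SSLv2 or SSLv3 session can result. -->
<Connector port="14000" SSLEnabled="true" scheme="https" secure="true"
           sslEnabledProtocols="SSLv2Hello,TLSv1"
           keystoreFile="${httpfs.keystore.file}"
           keystorePass="${httpfs.keystore.pass}" />
```

This matches the verification reported above: the SSLv2Hello framing is accepted, but the resulting session must be TLSv1.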
[jira] [Commented] (HDFS-7375) Move FSClusterStats to o.a.h.h.hdfs.server.blockmanagement
[ https://issues.apache.org/jira/browse/HDFS-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208131#comment-14208131 ] Hudson commented on HDFS-7375: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #3 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/3/]) HDFS-7375. Move FSClusterStats to o.a.h.h.hdfs.server.blockmanagement. Contributed by Haohui Mai. (wheat9: rev 46f6f9d60d0a2c1f441a0e81a071b08c24dbd6d6) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSClusterStats.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMXBean.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/FSClusterStats.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeCapacityReport.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java * 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicy.java > Move FSClusterStats to o.a.h.h.hdfs.server.blockmanagement > -- > > Key: HDFS-7375 > URL: https://issues.apache.org/jira/browse/HDFS-7375 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: 2.7.0 > > Attachments: HDFS-7375.000.patch, HDFS-7375.001.patch > > > {{FSClusterStats}} is a private class that exports statistics for > {{BlockPlacementPolicy}}. This jira proposes moving it to {{ > o.a.h.h.hdfs.server.blockmanagement}} to simplify the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7389) Named user ACL cannot stop the user from accessing the FS entity.
[ https://issues.apache.org/jira/browse/HDFS-7389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208128#comment-14208128 ] Hudson commented on HDFS-7389: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #3 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/3/]) HDFS-7389. Named user ACL cannot stop the user from accessing the FS entity. Contributed by Vinayakumar B. (cnauroth: rev 163bb55067bde71246b4030a08256ba9a8182dc8) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSAclBaseTest.java > Named user ACL cannot stop the user from accessing the FS entity. > - > > Key: HDFS-7389 > URL: https://issues.apache.org/jira/browse/HDFS-7389 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.1 >Reporter: Chunjun Xiao >Assignee: Vinayakumar B > Fix For: 2.7.0 > > Attachments: HDFS-7389-001.patch, HDFS-7389-002.patch > > > In > http://hortonworks.com/blog/hdfs-acls-fine-grained-permissions-hdfs-files-hadoop/: > {quote} > It’s important to keep in mind the order of evaluation for ACL entries when a > user attempts to access a file system object: > 1. If the user is the file owner, then the owner permission bits are enforced. > 2. Else if the user has a named user ACL entry, then those permissions are > enforced. > 3. Else if the user is a member of the file’s group or any named group in an > ACL entry, then the union of permissions for all matching entries are > enforced. (The user may be a member of multiple groups.) > 4. If none of the above were applicable, then the other permission bits are > enforced. 
> {quote} > Assume we have a user UserA from group GroupA. If we configure a directory with the following ACL entries: > group:GroupA:rwx > user:UserA:--- > According to the design spec above, UserA should have no access permission to > the file object, while actually UserA still has rwx access to the dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
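The four-step evaluation order quoted above can be modeled directly. The sketch below is a hypothetical simplification (it ignores the ACL mask entry and encodes rwx bits as integers 0-7), not the real FSPermissionChecker; with group:GroupA:rwx and user:UserA:---, UserA must resolve at step 2 and get no access:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the documented evaluation order (not the real
// FSPermissionChecker; the ACL mask is omitted): owner bits, else the named
// user entry, else the union of matching group entries, else other bits.
// The bug was that a matching group entry could still grant access even when
// a named user entry existed for the caller.
class AclModel {
    String owner, group;
    int ownerBits, groupBits, otherBits;                 // rwx encoded as 0-7
    Map<String, Integer> namedUsers = new HashMap<>();
    Map<String, Integer> namedGroups = new HashMap<>();

    int effectiveBits(String user, Set<String> memberOf) {
        if (user.equals(owner)) return ownerBits;                    // step 1
        Integer named = namedUsers.get(user);
        if (named != null) return named;                             // step 2: stop here, even if 0
        int union = 0; boolean matched = false;                      // step 3: union of group entries
        if (memberOf.contains(group)) { union |= groupBits; matched = true; }
        for (Map.Entry<String, Integer> e : namedGroups.entrySet())
            if (memberOf.contains(e.getKey())) { union |= e.getValue(); matched = true; }
        return matched ? union : otherBits;                          // step 4
    }
}
```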
[jira] [Commented] (HDFS-7387) NFS may only do partial commit due to a race between COMMIT and write
[ https://issues.apache.org/jira/browse/HDFS-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208135#comment-14208135 ] Hudson commented on HDFS-7387: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #3 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/3/]) HDFS-7387. NFS may only do partial commit due to a race between COMMIT and write. Contributed by Brandon Li (brandonli: rev 99d9d0c2d19b9f161b765947f3fb64619ea58090) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7381) Decouple the management of block id and gen stamps from FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208130#comment-14208130 ] Hudson commented on HDFS-7381: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #3 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/3/]) HDFS-7381. Decouple the management of block id and gen stamps from FSNamesystem. Contributed by Haohui Mai. (wheat9: rev 571e9c623241106dad5521a870fb8daef3f2b00a) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SequentialBlockIdGenerator.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestSequentialBlockId.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSequentialBlockId.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockIdManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSaveNamespace.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/SequentialBlockIdGenerator.java > Decouple the management of block id and gen stamps from FSNamesystem > > > Key: HDFS-7381 > URL: https://issues.apache.org/jira/browse/HDFS-7381 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: 2.7.0 > > Attachments: HDFS-7381.000.patch > > > The block layer should be 
responsible for managing block ids and generation > stamps. Currently this functionality is misplaced in {{FSNamesystem}}. > This jira proposes to decouple them from the {{FSNamesystem}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7387) NFS may only do partial commit due to a race between COMMIT and write
[ https://issues.apache.org/jira/browse/HDFS-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208114#comment-14208114 ] Hudson commented on HDFS-7387: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1955 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1955/]) HDFS-7387. NFS may only do partial commit due to a race between COMMIT and write. Contributed by Brandon Li (brandonli: rev 99d9d0c2d19b9f161b765947f3fb64619ea58090) * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7375) Move FSClusterStats to o.a.h.h.hdfs.server.blockmanagement
[ https://issues.apache.org/jira/browse/HDFS-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208110#comment-14208110 ] Hudson commented on HDFS-7375: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1955 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1955/]) HDFS-7375. Move FSClusterStats to o.a.h.h.hdfs.server.blockmanagement. Contributed by Haohui Mai. (wheat9: rev 46f6f9d60d0a2c1f441a0e81a071b08c24dbd6d6) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMXBean.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeCapacityReport.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSClusterStats.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/FSClusterStats.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7381) Decouple the management of block id and gen stamps from FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208109#comment-14208109 ] Hudson commented on HDFS-7381: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1955 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1955/]) HDFS-7381. Decouple the management of block id and gen stamps from FSNamesystem. Contributed by Haohui Mai. (wheat9: rev 571e9c623241106dad5521a870fb8daef3f2b00a) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockIdManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSequentialBlockId.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SequentialBlockIdGenerator.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSaveNamespace.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestSequentialBlockId.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/SequentialBlockIdGenerator.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Decouple the management of block id and gen stamps from FSNamesystem > > > Key: HDFS-7381 > URL: https://issues.apache.org/jira/browse/HDFS-7381 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: 2.7.0 > > Attachments: HDFS-7381.000.patch > > > The block layer should be 
responsible for managing block ids and generation > stamps. Currently this functionality is misplaced in {{FSNamesystem}}. > This jira proposes to decouple them from the {{FSNamesystem}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7389) Named user ACL cannot stop the user from accessing the FS entity.
[ https://issues.apache.org/jira/browse/HDFS-7389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208107#comment-14208107 ] Hudson commented on HDFS-7389: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #1955 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1955/]) HDFS-7389. Named user ACL cannot stop the user from accessing the FS entity. Contributed by Vinayakumar B. (cnauroth: rev 163bb55067bde71246b4030a08256ba9a8182dc8) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSAclBaseTest.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Named user ACL cannot stop the user from accessing the FS entity. > - > > Key: HDFS-7389 > URL: https://issues.apache.org/jira/browse/HDFS-7389 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.1 >Reporter: Chunjun Xiao >Assignee: Vinayakumar B > Fix For: 2.7.0 > > Attachments: HDFS-7389-001.patch, HDFS-7389-002.patch > > > In > http://hortonworks.com/blog/hdfs-acls-fine-grained-permissions-hdfs-files-hadoop/: > {quote} > It’s important to keep in mind the order of evaluation for ACL entries when a > user attempts to access a file system object: > 1. If the user is the file owner, then the owner permission bits are enforced. > 2. Else if the user has a named user ACL entry, then those permissions are > enforced. > 3. Else if the user is a member of the file’s group or any named group in an > ACL entry, then the union of permissions for all matching entries are > enforced. (The user may be a member of multiple groups.) > 4. If none of the above were applicable, then the other permission bits are > enforced. 
> {quote} > Assume we have a user UserA from group GroupA, and we configure a directory with the > following ACL entries: > group:GroupA:rwx > user:UserA:--- > According to the design spec above, UserA should have no access permission to > the file object, while in fact UserA still has rwx access to the dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
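The four-step evaluation order quoted in HDFS-7389 can be sketched as a small decision function. This is a hypothetical, simplified model for illustration only (the class `AclSketch` and method `effective` are invented names, not the HDFS `FSPermissionChecker` API):

```java
import java.util.*;

// Minimal model of the HDFS ACL evaluation order described above.
// Names are illustrative; this is not the real FSPermissionChecker.
public class AclSketch {
    // Each permission is a 3-char "rwx"-style string; "---" denies everything.
    public static String effective(String user, Set<String> userGroups,
                                   String owner, String ownerPerm,
                                   Map<String, String> namedUserEntries,
                                   Map<String, String> namedGroupEntries,
                                   String groupPerm, String otherPerm,
                                   String fileGroup) {
        // 1. The owner permission bits are enforced for the file owner.
        if (user.equals(owner)) return ownerPerm;
        // 2. A named user entry is enforced next. This is the step HDFS-7389
        //    restores: user:UserA:--- must block UserA here, before any
        //    group entry is consulted.
        if (namedUserEntries.containsKey(user)) return namedUserEntries.get(user);
        // 3. Union of permissions from the file group and every matching
        //    named group entry (the user may be in multiple groups).
        List<String> matches = new ArrayList<>();
        if (userGroups.contains(fileGroup)) matches.add(groupPerm);
        for (Map.Entry<String, String> e : namedGroupEntries.entrySet())
            if (userGroups.contains(e.getKey())) matches.add(e.getValue());
        if (!matches.isEmpty()) {
            char[] union = {'-', '-', '-'};
            String bits = "rwx";
            for (String p : matches)
                for (int i = 0; i < 3; i++)
                    if (p.charAt(i) == bits.charAt(i)) union[i] = bits.charAt(i);
            return new String(union);
        }
        // 4. Otherwise the other permission bits apply.
        return otherPerm;
    }
}
```

With the reported configuration (group:GroupA:rwx, user:UserA:---), step 2 must return "---" for UserA; the bug was that evaluation fell through to the group union in step 3.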
[jira] [Updated] (HDFS-7353) Common Erasure Coder API
[ https://issues.apache.org/jira/browse/HDFS-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-7353: Summary: Common Erasure Coder API (was: Common Erasure Codec API and plugin support) > Common Erasure Coder API > > > Key: HDFS-7353 > URL: https://issues.apache.org/jira/browse/HDFS-7353 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Kai Zheng > Fix For: HDFS-EC > > > This is to abstract and define a common codec API across different codec > algorithms such as RS and XOR. Such an API can be implemented by utilizing > various library support, such as the Intel ISA library and the Jerasure library. It > provides a default implementation and also allows plugging in vendor-specific > ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7353) Common Erasure Coder API
[ https://issues.apache.org/jira/browse/HDFS-7353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-7353: Description: This is to abstract and define common coder API across different codec algorithms like RS, XOR and etc. Such API can be implemented by utilizing various library support, such as Intel ISA library and Jerasure library. It provides default implementation and also allows to plugin vendor specific ones. (was: This is to abstract and define common codec API across different codec algorithms like RS, XOR and etc. Such API can be implemented by utilizing various library support, such as Intel ISA library and Jerasure library. It provides default implementation and also allows to plugin vendor specific ones.) > Common Erasure Coder API > > > Key: HDFS-7353 > URL: https://issues.apache.org/jira/browse/HDFS-7353 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: Kai Zheng > Fix For: HDFS-EC > > > This is to abstract and define common coder API across different codec > algorithms like RS, XOR and etc. Such API can be implemented by utilizing > various library support, such as Intel ISA library and Jerasure library. It > provides default implementation and also allows to plugin vendor specific > ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7393) TestDFSUpgradeFromImage#testUpgradeFromCorruptRel22Image fails in trunk
Ted Yu created HDFS-7393: Summary: TestDFSUpgradeFromImage#testUpgradeFromCorruptRel22Image fails in trunk Key: HDFS-7393 URL: https://issues.apache.org/jira/browse/HDFS-7393 Project: Hadoop HDFS Issue Type: Test Reporter: Ted Yu Priority: Minor The following is reproducible: {code} Running org.apache.hadoop.hdfs.TestDFSUpgradeFromImage Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 12.017 sec <<< FAILURE! - in org.apache.hadoop.hdfs.TestDFSUpgradeFromImage testUpgradeFromCorruptRel22Image(org.apache.hadoop.hdfs.TestDFSUpgradeFromImage) Time elapsed: 1.005 sec <<< ERROR! java.lang.IllegalStateException: null at com.google.common.base.Preconditions.checkState(Preconditions.java:129) at org.apache.hadoop.hdfs.server.blockmanagement.BlockIdManager.setGenerationStampV1Limit(BlockIdManager.java:85) at org.apache.hadoop.hdfs.server.blockmanagement.BlockIdManager.clear(BlockIdManager.java:206) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.clear(FSNamesystem.java:622) at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:667) at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:376) at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:268) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:991) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:714) at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:537) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:596) at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:763) at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:747) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1443) at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:1104) at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:975) 
at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:804) at org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:465) at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:424) at org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.upgradeAndVerify(TestDFSUpgradeFromImage.java:582) at org.apache.hadoop.hdfs.TestDFSUpgradeFromImage.testUpgradeFromCorruptRel22Image(TestDFSUpgradeFromImage.java:318) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7375) Move FSClusterStats to o.a.h.h.hdfs.server.blockmanagement
[ https://issues.apache.org/jira/browse/HDFS-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208054#comment-14208054 ] Hudson commented on HDFS-7375: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1931 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1931/]) HDFS-7375. Move FSClusterStats to o.a.h.h.hdfs.server.blockmanagement. Contributed by Haohui Mai. (wheat9: rev 46f6f9d60d0a2c1f441a0e81a071b08c24dbd6d6) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSClusterStats.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMXBean.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/FSClusterStats.java * 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeCapacityReport.java > Move FSClusterStats to o.a.h.h.hdfs.server.blockmanagement > -- > > Key: HDFS-7375 > URL: https://issues.apache.org/jira/browse/HDFS-7375 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: 2.7.0 > > Attachments: HDFS-7375.000.patch, HDFS-7375.001.patch > > > {{FSClusterStats}} is a private class that exports statistics for > {{BlockPlacementPolicy}}. This jira proposes moving it to {{ > o.a.h.h.hdfs.server.blockmanagement}} to simplify the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7387) NFS may only do partial commit due to a race between COMMIT and write
[ https://issues.apache.org/jira/browse/HDFS-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208058#comment-14208058 ] Hudson commented on HDFS-7387: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1931 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1931/]) HDFS-7387. NFS may only do partial commit due to a race between COMMIT and write. Contributed by Brandon Li (brandonli: rev 99d9d0c2d19b9f161b765947f3fb64619ea58090) * hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > NFS may only do partial commit due to a race between COMMIT and write > - > > Key: HDFS-7387 > URL: https://issues.apache.org/jira/browse/HDFS-7387 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Affects Versions: 2.6.0 >Reporter: Brandon Li >Assignee: Brandon Li >Priority: Critical > Fix For: 2.7.0 > > Attachments: HDFS-7387.001.patch, HDFS-7387.002.patch > > > The requested range may not be committed when the following happens: > 1. the last pending write is removed from the queue to write to hdfs > 2. a commit request arrives, NFS sees there is not pending write, and it will > do a sync > 3. this sync request could flush only part of the last write to hdfs > 4. if a file read happens immediately after the above steps, the user may not > see all the data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7381) Decouple the management of block id and gen stamps from FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208053#comment-14208053 ] Hudson commented on HDFS-7381: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1931 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1931/]) HDFS-7381. Decouple the management of block id and gen stamps from FSNamesystem. Contributed by Haohui Mai. (wheat9: rev 571e9c623241106dad5521a870fb8daef3f2b00a) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSaveNamespace.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SequentialBlockIdGenerator.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSequentialBlockId.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/SequentialBlockIdGenerator.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockIdManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestSequentialBlockId.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java > Decouple the management of block id and gen stamps from FSNamesystem > > > Key: HDFS-7381 > URL: https://issues.apache.org/jira/browse/HDFS-7381 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: 2.7.0 > > Attachments: HDFS-7381.000.patch > > > The block layer should be responsible for 
managing block ids and generation > stamps. Currently this functionality is misplaced in {{FSNamesystem}}. > This jira proposes to decouple them from the {{FSNamesystem}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7389) Named user ACL cannot stop the user from accessing the FS entity.
[ https://issues.apache.org/jira/browse/HDFS-7389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208051#comment-14208051 ] Hudson commented on HDFS-7389: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1931 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1931/]) HDFS-7389. Named user ACL cannot stop the user from accessing the FS entity. Contributed by Vinayakumar B. (cnauroth: rev 163bb55067bde71246b4030a08256ba9a8182dc8) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSAclBaseTest.java > Named user ACL cannot stop the user from accessing the FS entity. > - > > Key: HDFS-7389 > URL: https://issues.apache.org/jira/browse/HDFS-7389 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.1 >Reporter: Chunjun Xiao >Assignee: Vinayakumar B > Fix For: 2.7.0 > > Attachments: HDFS-7389-001.patch, HDFS-7389-002.patch > > > In > http://hortonworks.com/blog/hdfs-acls-fine-grained-permissions-hdfs-files-hadoop/: > {quote} > It’s important to keep in mind the order of evaluation for ACL entries when a > user attempts to access a file system object: > 1. If the user is the file owner, then the owner permission bits are enforced. > 2. Else if the user has a named user ACL entry, then those permissions are > enforced. > 3. Else if the user is a member of the file’s group or any named group in an > ACL entry, then the union of permissions for all matching entries are > enforced. (The user may be a member of multiple groups.) > 4. If none of the above were applicable, then the other permission bits are > enforced. 
> {quote} > Assume we have a user UserA from group GroupA, and we configure a directory with the > following ACL entries: > group:GroupA:rwx > user:UserA:--- > According to the design spec above, UserA should have no access permission to > the file object, while in fact UserA still has rwx access to the dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7389) Named user ACL cannot stop the user from accessing the FS entity.
[ https://issues.apache.org/jira/browse/HDFS-7389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208042#comment-14208042 ] Hudson commented on HDFS-7389: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #3 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/3/]) HDFS-7389. Named user ACL cannot stop the user from accessing the FS entity. Contributed by Vinayakumar B. (cnauroth: rev 163bb55067bde71246b4030a08256ba9a8182dc8) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSAclBaseTest.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java > Named user ACL cannot stop the user from accessing the FS entity. > - > > Key: HDFS-7389 > URL: https://issues.apache.org/jira/browse/HDFS-7389 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.1 >Reporter: Chunjun Xiao >Assignee: Vinayakumar B > Fix For: 2.7.0 > > Attachments: HDFS-7389-001.patch, HDFS-7389-002.patch > > > In > http://hortonworks.com/blog/hdfs-acls-fine-grained-permissions-hdfs-files-hadoop/: > {quote} > It’s important to keep in mind the order of evaluation for ACL entries when a > user attempts to access a file system object: > 1. If the user is the file owner, then the owner permission bits are enforced. > 2. Else if the user has a named user ACL entry, then those permissions are > enforced. > 3. Else if the user is a member of the file’s group or any named group in an > ACL entry, then the union of permissions for all matching entries are > enforced. (The user may be a member of multiple groups.) > 4. If none of the above were applicable, then the other permission bits are > enforced. 
> {quote} > Assume we have a user UserA from group GroupA, and we configure a directory with the > following ACL entries: > group:GroupA:rwx > user:UserA:--- > According to the design spec above, UserA should have no access permission to > the file object, while in fact UserA still has rwx access to the dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7381) Decouple the management of block id and gen stamps from FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208044#comment-14208044 ] Hudson commented on HDFS-7381: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #3 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/3/]) HDFS-7381. Decouple the management of block id and gen stamps from FSNamesystem. Contributed by Haohui Mai. (wheat9: rev 571e9c623241106dad5521a870fb8daef3f2b00a) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestSequentialBlockId.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/SequentialBlockIdGenerator.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSequentialBlockId.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockIdManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSaveNamespace.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SequentialBlockIdGenerator.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Decouple the management of block id and gen stamps from FSNamesystem > > > Key: HDFS-7381 > URL: https://issues.apache.org/jira/browse/HDFS-7381 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: 2.7.0 > > Attachments: HDFS-7381.000.patch > > > The block layer should be responsible 
for managing block ids and generation > stamps. Currently this functionality is misplaced in {{FSNamesystem}}. > This jira proposes to decouple them from the {{FSNamesystem}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7387) NFS may only do partial commit due to a race between COMMIT and write
[ https://issues.apache.org/jira/browse/HDFS-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208049#comment-14208049 ] Hudson commented on HDFS-7387: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #3 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/3/]) HDFS-7387. NFS may only do partial commit due to a race between COMMIT and write. Contributed by Brandon Li (brandonli: rev 99d9d0c2d19b9f161b765947f3fb64619ea58090) * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java > NFS may only do partial commit due to a race between COMMIT and write > - > > Key: HDFS-7387 > URL: https://issues.apache.org/jira/browse/HDFS-7387 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Affects Versions: 2.6.0 >Reporter: Brandon Li >Assignee: Brandon Li >Priority: Critical > Fix For: 2.7.0 > > Attachments: HDFS-7387.001.patch, HDFS-7387.002.patch > > > The requested range may not be committed when the following happens: > 1. the last pending write is removed from the queue to write to hdfs > 2. a commit request arrives, NFS sees there is no pending write, and it will > do a sync > 3. this sync request could flush only part of the last write to hdfs > 4. if a file read happens immediately after the above steps, the user may not > see all the data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
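The four-step race in HDFS-7387 can be modeled with two counters: bytes handed to the gateway and bytes actually flushed to HDFS. This is a hypothetical sketch (the class `CommitRaceSketch` and its fields are invented for illustration, not the real `OpenFileCtx` code); it shows why "queue is empty" alone is not a safe commit condition:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative model of the COMMIT/write race described above.
public class CommitRaceSketch {
    public final Deque<byte[]> pendingWrites = new ArrayDeque<>();
    public long flushedBytes = 0;  // bytes durably written to HDFS
    public long queuedBytes = 0;   // bytes accepted from the NFS client

    public void write(byte[] data) {
        pendingWrites.add(data);
        queuedBytes += data.length;
    }

    // Step 1: the writer thread removes the last pending write from the
    // queue; from this moment the queue is empty but the data is in flight.
    public byte[] dequeueForWrite() { return pendingWrites.poll(); }

    // Steps 2-3: a COMMIT that only checks pendingWrites.isEmpty() would
    // sync now, even though the in-flight write may be partially flushed.
    // A safe check must also compare the flushed offset against the
    // requested commit offset.
    public boolean commitIsComplete(long requestedOffset) {
        return pendingWrites.isEmpty() && flushedBytes >= requestedOffset;
    }
}
```

In the bug scenario, the queue is empty when COMMIT arrives, yet only part of the last write has reached HDFS, so a read at the requested offset (step 4) can miss data.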
[jira] [Commented] (HDFS-7375) Move FSClusterStats to o.a.h.h.hdfs.server.blockmanagement
[ https://issues.apache.org/jira/browse/HDFS-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208045#comment-14208045 ] Hudson commented on HDFS-7375: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #3 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/3/]) HDFS-7375. Move FSClusterStats to o.a.h.h.hdfs.server.blockmanagement. Contributed by Haohui Mai. (wheat9: rev 46f6f9d60d0a2c1f441a0e81a071b08c24dbd6d6) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/FSClusterStats.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMXBean.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeCapacityReport.java * 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSClusterStats.java > Move FSClusterStats to o.a.h.h.hdfs.server.blockmanagement > -- > > Key: HDFS-7375 > URL: https://issues.apache.org/jira/browse/HDFS-7375 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: 2.7.0 > > Attachments: HDFS-7375.000.patch, HDFS-7375.001.patch > > > {{FSClusterStats}} is a private class that exports statistics for > {{BlockPlacementPolicy}}. This jira proposes moving it to {{ > o.a.h.h.hdfs.server.blockmanagement}} to simplify the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7392) org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever
[ https://issues.apache.org/jira/browse/HDFS-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frantisek Vacek updated HDFS-7392: -- Description: In some specific circumstances, org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never timeouts and last forever. What are specific circumstances: 1) HDFS URI (hdfs://share.merck.com:8020/someDir/someFile.txt) should point to valid IP address but without name node service running on it. 2) There should be at least 2 IP addresses for such a URI. See output below: {quote} [~/proj/quickbox]$ nslookup share.merck.com Server: 127.0.1.1 Address:127.0.1.1#53 share.merck.com canonical name = internal-gicprg-share-merck-com-1538706884.us-east-1.elb.amazonaws.com. Name: internal-gicprg-share-merck-com-1538706884.us-east-1.elb.amazonaws.com Address: 54.40.29.223 Name: internal-gicprg-share-merck-com-1538706884.us-east-1.elb.amazonaws.com Address: 54.40.29.65 {quote} In such a case the org.apache.hadoop.ipc.Client.Connection.updateAddress() returns sometimes true (even if address didn't actually changed see img. 1) and the timeoutFailures counter is set to 0 (see img. 2). The maxRetriesOnSocketTimeouts (45) is never reached and connection attempt is repeated forever. was: In some specific circumstances, org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never timeouts and last forever. What are specific circumstances: 1) HDFS URI (hdfs://share.merck.com:8020/someDir/someFile.txt) should point to valid IP address but without name node service running on it. 2) There should be at least 2 IP addresses for such a URI. See output below: [~/proj/quickbox]$ nslookup share.merck.com Server: 127.0.1.1 Address:127.0.1.1#53 share.merck.com canonical name = internal-gicprg-share-merck-com-1538706884.us-east-1.elb.amazonaws.com. 
Name: internal-gicprg-share-merck-com-1538706884.us-east-1.elb.amazonaws.com Address: 54.40.29.223 Name: internal-gicprg-share-merck-com-1538706884.us-east-1.elb.amazonaws.com Address: 54.40.29.65 In such a case the org.apache.hadoop.ipc.Client.Connection.updateAddress() returns sometimes true (even if address didn't actually changed see img. 1) and the timeoutFailures counter is set to 0 (see img. 2). The maxRetriesOnSocketTimeouts (45) is never reached and connection attempt is repeated forever. > org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever > - > > Key: HDFS-7392 > URL: https://issues.apache.org/jira/browse/HDFS-7392 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Frantisek Vacek >Priority: Critical > Attachments: 1.png, 2.png > > > In some specific circumstances, > org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never timeouts > and last forever. > What are specific circumstances: > 1) HDFS URI (hdfs://share.merck.com:8020/someDir/someFile.txt) should point > to valid IP address but without name node service running on it. > 2) There should be at least 2 IP addresses for such a URI. See output below: > {quote} > [~/proj/quickbox]$ nslookup share.merck.com > Server: 127.0.1.1 > Address:127.0.1.1#53 > share.merck.com canonical name = > internal-gicprg-share-merck-com-1538706884.us-east-1.elb.amazonaws.com. > Name: internal-gicprg-share-merck-com-1538706884.us-east-1.elb.amazonaws.com > Address: 54.40.29.223 > Name: internal-gicprg-share-merck-com-1538706884.us-east-1.elb.amazonaws.com > Address: 54.40.29.65 > {quote} > In such a case the org.apache.hadoop.ipc.Client.Connection.updateAddress() > returns sometimes true (even if address didn't actually changed see img. 1) > and the timeoutFailures counter is set to 0 (see img. 2). The > maxRetriesOnSocketTimeouts (45) is never reached and connection attempt is > repeated forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7392) org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever
[ https://issues.apache.org/jira/browse/HDFS-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frantisek Vacek updated HDFS-7392: -- Attachment: 2.png 1.png > org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever > - > > Key: HDFS-7392 > URL: https://issues.apache.org/jira/browse/HDFS-7392 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs-client >Reporter: Frantisek Vacek >Priority: Critical > Attachments: 1.png, 2.png > > > In some specific circumstances, > org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never timeouts > and last forever. > What are specific circumstances: > 1) HDFS URI (hdfs://share.merck.com:8020/someDir/someFile.txt) should point > to valid IP address but without name node service running on it. > 2) There should be at least 2 IP addresses for such a URI. See output below: > [~/proj/quickbox]$ nslookup share.merck.com > Server: 127.0.1.1 > Address:127.0.1.1#53 > share.merck.com canonical name = > internal-gicprg-share-merck-com-1538706884.us-east-1.elb.amazonaws.com. > Name: internal-gicprg-share-merck-com-1538706884.us-east-1.elb.amazonaws.com > Address: 54.40.29.223 > Name: internal-gicprg-share-merck-com-1538706884.us-east-1.elb.amazonaws.com > Address: 54.40.29.65 > In such a case the org.apache.hadoop.ipc.Client.Connection.updateAddress() > returns sometimes true (even if address didn't actually changed see img. 1) > and the timeoutFailures counter is set to 0 (see img. 2). The > maxRetriesOnSocketTimeouts (45) is never reached and connection attempt is > repeated forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7392) org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever
Frantisek Vacek created HDFS-7392: - Summary: org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever Key: HDFS-7392 URL: https://issues.apache.org/jira/browse/HDFS-7392 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Reporter: Frantisek Vacek Priority: Critical In some specific circumstances, org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never times out and lasts forever. The specific circumstances: 1) The HDFS URI (hdfs://share.merck.com:8020/someDir/someFile.txt) should point to a valid IP address but with no name node service running on it. 2) There should be at least 2 IP addresses for such a URI. See output below: [~/proj/quickbox]$ nslookup share.merck.com Server: 127.0.1.1 Address:127.0.1.1#53 share.merck.com canonical name = internal-gicprg-share-merck-com-1538706884.us-east-1.elb.amazonaws.com. Name: internal-gicprg-share-merck-com-1538706884.us-east-1.elb.amazonaws.com Address: 54.40.29.223 Name: internal-gicprg-share-merck-com-1538706884.us-east-1.elb.amazonaws.com Address: 54.40.29.65 In such a case org.apache.hadoop.ipc.Client.Connection.updateAddress() sometimes returns true (even if the address didn't actually change, see img. 1) and the timeoutFailures counter is reset to 0 (see img. 2). The maxRetriesOnSocketTimeouts limit (45) is never reached and the connection attempt is repeated forever. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
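The retry accounting described in HDFS-7392 can be reduced to a few lines. The sketch below is hypothetical (the real logic lives in org.apache.hadoop.ipc.Client.Connection; the class `RetrySketch` and method `handleTimeout` are invented names), but it shows why alternating DNS answers keep the counter from ever reaching the limit:

```java
// Illustrative model of the failure-counter reset described above.
public class RetrySketch {
    public static final int MAX_RETRIES_ON_SOCKET_TIMEOUTS = 45;
    public int timeoutFailures = 0;

    // Simulates handling one socket timeout. addressChanged models
    // updateAddress() returning true (resolution produced a new address).
    // Returns true while the client intends to keep retrying.
    public boolean handleTimeout(boolean addressChanged) {
        if (addressChanged) {
            // The reported behavior: with a round-robin ELB name, resolution
            // flips between two addresses, this branch fires repeatedly, and
            // the failure count never accumulates, so the client loops forever.
            timeoutFailures = 0;
            return true;
        }
        return ++timeoutFailures < MAX_RETRIES_ON_SOCKET_TIMEOUTS;
    }
}
```

With a single stable (but dead) address the counter reaches the limit and the client gives up; with two alternating addresses the reset keeps it at zero indefinitely.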
[jira] [Commented] (HDFS-7381) Decouple the management of block id and gen stamps from FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207961#comment-14207961 ] Hudson commented on HDFS-7381: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #741 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/741/]) HDFS-7381. Decouple the management of block id and gen stamps from FSNamesystem. Contributed by Haohui Mai. (wheat9: rev 571e9c623241106dad5521a870fb8daef3f2b00a) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SequentialBlockIdGenerator.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/SequentialBlockIdGenerator.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSaveNamespace.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestSequentialBlockId.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSequentialBlockId.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockIdManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java > Decouple the management of block id and gen stamps from FSNamesystem > > > Key: HDFS-7381 > URL: https://issues.apache.org/jira/browse/HDFS-7381 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: 2.7.0 > > Attachments: HDFS-7381.000.patch > > > The block layer should be responsible of 
managing block ids and generation > stamps. Currently this functionality is misplaced in {{FSNamesystem}}. > This jira proposes to decouple them from the {{FSNamesystem}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
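The intended separation can be sketched as a standalone object that owns the counters, so FSNamesystem merely delegates. This is an illustrative sketch only; the real class is BlockIdManager under o.a.h.h.hdfs.server.blockmanagement and its actual API may differ:

```java
// Illustrative sketch: the block layer owns the block-id and generation-stamp
// counters instead of FSNamesystem managing them inline.
class BlockIdManagerSketch {
    private long lastBlockId;
    private long generationStamp;

    BlockIdManagerSketch(long lastBlockId, long generationStamp) {
        this.lastBlockId = lastBlockId;
        this.generationStamp = generationStamp;
    }

    synchronized long nextBlockId() { return ++lastBlockId; }

    synchronized long nextGenerationStamp() { return ++generationStamp; }

    // State the namenode would persist in the FSImage on saveNamespace.
    synchronized long getLastBlockId() { return lastBlockId; }
}
```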
[jira] [Commented] (HDFS-7375) Move FSClusterStats to o.a.h.h.hdfs.server.blockmanagement
[ https://issues.apache.org/jira/browse/HDFS-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207962#comment-14207962 ] Hudson commented on HDFS-7375: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #741 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/741/]) HDFS-7375. Move FSClusterStats to o.a.h.h.hdfs.server.blockmanagement. Contributed by Haohui Mai. (wheat9: rev 46f6f9d60d0a2c1f441a0e81a071b08c24dbd6d6) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSClusterStats.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/FSClusterStats.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMXBean.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeCapacityReport.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicy.java * 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java > Move FSClusterStats to o.a.h.h.hdfs.server.blockmanagement > -- > > Key: HDFS-7375 > URL: https://issues.apache.org/jira/browse/HDFS-7375 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: 2.7.0 > > Attachments: HDFS-7375.000.patch, HDFS-7375.001.patch > > > {{FSClusterStats}} is a private class that exports statistics for > {{BlockPlacementPolicy}}. This jira proposes moving it to > {{o.a.h.h.hdfs.server.blockmanagement}} to simplify the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7389) Named user ACL cannot stop the user from accessing the FS entity.
[ https://issues.apache.org/jira/browse/HDFS-7389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207959#comment-14207959 ] Hudson commented on HDFS-7389: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #741 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/741/]) HDFS-7389. Named user ACL cannot stop the user from accessing the FS entity. Contributed by Vinayakumar B. (cnauroth: rev 163bb55067bde71246b4030a08256ba9a8182dc8) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSAclBaseTest.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java > Named user ACL cannot stop the user from accessing the FS entity. > - > > Key: HDFS-7389 > URL: https://issues.apache.org/jira/browse/HDFS-7389 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.1 >Reporter: Chunjun Xiao >Assignee: Vinayakumar B > Fix For: 2.7.0 > > Attachments: HDFS-7389-001.patch, HDFS-7389-002.patch > > > In > http://hortonworks.com/blog/hdfs-acls-fine-grained-permissions-hdfs-files-hadoop/: > {quote} > It’s important to keep in mind the order of evaluation for ACL entries when a > user attempts to access a file system object: > 1. If the user is the file owner, then the owner permission bits are enforced. > 2. Else if the user has a named user ACL entry, then those permissions are > enforced. > 3. Else if the user is a member of the file’s group or any named group in an > ACL entry, then the union of permissions for all matching entries are > enforced. (The user may be a member of multiple groups.) > 4. If none of the above were applicable, then the other permission bits are > enforced. 
> {quote} > Assume we have a user UserA in group GroupA. If we configure a directory with the > following ACL entries: > group:GroupA:rwx > user:UserA:--- > According to the design spec above, UserA should have no access permission to > the file object, yet in practice UserA still has rwx access to the dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
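The four-step evaluation order quoted above can be expressed directly. In this sketch (illustrative only, not the actual FSPermissionChecker code), the named-user entry short-circuits at step 2, so user:UserA:--- must deny UserA even though GroupA has rwx:

```java
import java.util.Map;
import java.util.Set;

class AclEvalSketch {
    // Returns the permission string enforced for 'user', following the quoted
    // evaluation order. Matching group entries are unioned bit-by-bit (step 3).
    static String enforce(String user, Set<String> memberOf,
                          String owner, String ownerPerms,
                          Map<String, String> namedUsers,
                          String fileGroup, String groupPerms,
                          Map<String, String> namedGroups,
                          String otherPerms) {
        if (user.equals(owner)) return ownerPerms;                     // step 1
        if (namedUsers.containsKey(user)) return namedUsers.get(user); // step 2
        boolean matched = false;                                       // step 3
        char[] acc = "---".toCharArray();
        if (memberOf.contains(fileGroup)) { matched = true; union(acc, groupPerms); }
        for (Map.Entry<String, String> e : namedGroups.entrySet()) {
            if (memberOf.contains(e.getKey())) { matched = true; union(acc, e.getValue()); }
        }
        if (matched) return new String(acc);
        return otherPerms;                                             // step 4
    }

    private static void union(char[] acc, String perms) {
        for (int i = 0; i < 3; i++) {
            if (perms.charAt(i) != '-') acc[i] = perms.charAt(i);
        }
    }
}
```

The reported bug is equivalent to the step-2 short-circuit being skipped, letting step 3's group union grant access anyway.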
[jira] [Commented] (HDFS-7387) NFS may only do partial commit due to a race between COMMIT and write
[ https://issues.apache.org/jira/browse/HDFS-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207966#comment-14207966 ] Hudson commented on HDFS-7387: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #741 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/741/]) HDFS-7387. NFS may only do partial commit due to a race between COMMIT and write. Contributed by Brandon Li (brandonli: rev 99d9d0c2d19b9f161b765947f3fb64619ea58090) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java > NFS may only do partial commit due to a race between COMMIT and write > - > > Key: HDFS-7387 > URL: https://issues.apache.org/jira/browse/HDFS-7387 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Affects Versions: 2.6.0 >Reporter: Brandon Li >Assignee: Brandon Li >Priority: Critical > Fix For: 2.7.0 > > Attachments: HDFS-7387.001.patch, HDFS-7387.002.patch > > > The requested range may not be committed when the following happens: > 1. the last pending write is removed from the queue to write to hdfs > 2. a commit request arrives, NFS sees there is no pending write, and it will > do a sync > 3. this sync request could flush only part of the last write to hdfs > 4. if a file read happens immediately after the above steps, the user may not > see all the data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
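The invariant the fix needs to enforce can be sketched in a few lines (illustrative; the real logic lives in OpenFileCtx and is more involved): a COMMIT may only be acknowledged once the flushed offset covers the requested range, not merely when the pending-write queue is empty, because step 1 removes the last write from the queue before it is fully on HDFS.

```java
class CommitCheckSketch {
    // An empty queue alone is not enough: the last write may already have been
    // dequeued (step 1) while only partially flushed (step 3).
    static boolean safeToAckCommit(long commitOffset, long flushedOffset,
                                   int pendingWrites) {
        return pendingWrites == 0 && flushedOffset >= commitOffset;
    }
}
```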
[jira] [Commented] (HDFS-7389) Named user ACL cannot stop the user from accessing the FS entity.
[ https://issues.apache.org/jira/browse/HDFS-7389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207947#comment-14207947 ] Hudson commented on HDFS-7389: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #3 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/3/]) HDFS-7389. Named user ACL cannot stop the user from accessing the FS entity. Contributed by Vinayakumar B. (cnauroth: rev 163bb55067bde71246b4030a08256ba9a8182dc8) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/FSAclBaseTest.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt > Named user ACL cannot stop the user from accessing the FS entity. > - > > Key: HDFS-7389 > URL: https://issues.apache.org/jira/browse/HDFS-7389 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.5.1 >Reporter: Chunjun Xiao >Assignee: Vinayakumar B > Fix For: 2.7.0 > > Attachments: HDFS-7389-001.patch, HDFS-7389-002.patch > > > In > http://hortonworks.com/blog/hdfs-acls-fine-grained-permissions-hdfs-files-hadoop/: > {quote} > It’s important to keep in mind the order of evaluation for ACL entries when a > user attempts to access a file system object: > 1. If the user is the file owner, then the owner permission bits are enforced. > 2. Else if the user has a named user ACL entry, then those permissions are > enforced. > 3. Else if the user is a member of the file’s group or any named group in an > ACL entry, then the union of permissions for all matching entries are > enforced. (The user may be a member of multiple groups.) > 4. If none of the above were applicable, then the other permission bits are > enforced. 
> {quote} > Assume we have a user UserA in group GroupA. If we configure a directory with the > following ACL entries: > group:GroupA:rwx > user:UserA:--- > According to the design spec above, UserA should have no access permission to > the file object, yet in practice UserA still has rwx access to the dir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7387) NFS may only do partial commit due to a race between COMMIT and write
[ https://issues.apache.org/jira/browse/HDFS-7387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207954#comment-14207954 ] Hudson commented on HDFS-7387: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #3 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/3/]) HDFS-7387. NFS may only do partial commit due to a race between COMMIT and write. Contributed by Brandon Li (brandonli: rev 99d9d0c2d19b9f161b765947f3fb64619ea58090) * hadoop-hdfs-project/hadoop-hdfs-nfs/src/test/java/org/apache/hadoop/hdfs/nfs/nfs3/TestWrites.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-nfs/src/main/java/org/apache/hadoop/hdfs/nfs/nfs3/OpenFileCtx.java > NFS may only do partial commit due to a race between COMMIT and write > - > > Key: HDFS-7387 > URL: https://issues.apache.org/jira/browse/HDFS-7387 > Project: Hadoop HDFS > Issue Type: Bug > Components: nfs >Affects Versions: 2.6.0 >Reporter: Brandon Li >Assignee: Brandon Li >Priority: Critical > Fix For: 2.7.0 > > Attachments: HDFS-7387.001.patch, HDFS-7387.002.patch > > > The requested range may not be committed when the following happens: > 1. the last pending write is removed from the queue to write to hdfs > 2. a commit request arrives, NFS sees there is no pending write, and it will > do a sync > 3. this sync request could flush only part of the last write to hdfs > 4. if a file read happens immediately after the above steps, the user may not > see all the data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7381) Decouple the management of block id and gen stamps from FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-7381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207949#comment-14207949 ] Hudson commented on HDFS-7381: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #3 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/3/]) HDFS-7381. Decouple the management of block id and gen stamps from FSNamesystem. Contributed by Haohui Mai. (wheat9: rev 571e9c623241106dad5521a870fb8daef3f2b00a) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSequentialBlockId.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestSequentialBlockId.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormat.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/SequentialBlockIdGenerator.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockIdManager.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestSaveNamespace.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/SequentialBlockIdGenerator.java > Decouple the management of block id and gen stamps from FSNamesystem > > > Key: HDFS-7381 > URL: https://issues.apache.org/jira/browse/HDFS-7381 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: 2.7.0 > > Attachments: HDFS-7381.000.patch > > > The block layer should be responsible 
for managing block ids and generation > stamps. Currently this functionality is misplaced in {{FSNamesystem}}. > This jira proposes to decouple them from the {{FSNamesystem}} class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7375) Move FSClusterStats to o.a.h.h.hdfs.server.blockmanagement
[ https://issues.apache.org/jira/browse/HDFS-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207950#comment-14207950 ] Hudson commented on HDFS-7375: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #3 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/3/]) HDFS-7375. Move FSClusterStats to o.a.h.h.hdfs.server.blockmanagement. Contributed by Haohui Mai. (wheat9: rev 46f6f9d60d0a2c1f441a0e81a071b08c24dbd6d6) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockManager.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyWithNodeGroup.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeManager.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicyConsiderLoad.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNamenodeCapacityReport.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/FSClusterStats.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeMXBean.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestReplicationPolicy.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockPlacementPolicyDefault.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSClusterStats.java * 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java > Move FSClusterStats to o.a.h.h.hdfs.server.blockmanagement > -- > > Key: HDFS-7375 > URL: https://issues.apache.org/jira/browse/HDFS-7375 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haohui Mai >Assignee: Haohui Mai > Fix For: 2.7.0 > > Attachments: HDFS-7375.000.patch, HDFS-7375.001.patch > > > {{FSClusterStats}} is a private class that exports statistics for > {{BlockPlacementPolicy}}. This jira proposes moving it to > {{o.a.h.h.hdfs.server.blockmanagement}} to simplify the code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7017) Implement OutputStream for libhdfs3
[ https://issues.apache.org/jira/browse/HDFS-7017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207937#comment-14207937 ] Zhanwei Wang commented on HDFS-7017: Sorry for coming back to this jira late. I added the LeaseManager interface so that LeaseManagerImpl can be mocked for unit testing. I use google mock to implement the mock object; using an interface class is the approach recommended by google mock. https://code.google.com/p/googlemock/wiki/V1_7_ForDummies google mock is a good mock framework and works well with google test; hand writing another mock framework would duplicate work and waste time. Fault injection is another matter: I use a tool that performs fault injection tests without modifying the source code by hooking the functions at runtime. Since it is an internal tool, I cannot open source the related code. I agree that this indirection makes the code hard to follow. Colin, would you please recommend a better way to do such unit tests? Making LeaseManager owned by the hdfsFS instance is better; although it may introduce more threads if the client connects to many file system instances, I think that is ok. I will make this change and move the packet memory pool into a separate jira. > Implement OutputStream for libhdfs3 > --- > > Key: HDFS-7017 > URL: https://issues.apache.org/jira/browse/HDFS-7017 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client >Reporter: Zhanwei Wang >Assignee: Zhanwei Wang > Attachments: HDFS-7017-pnative.002.patch, > HDFS-7017-pnative.003.patch, HDFS-7017-pnative.004.patch, HDFS-7017.patch > > > Implement pipeline and OutputStream C++ interface -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HDFS-7363) Pluggable algorithms to form block groups in erasure coding
[ https://issues.apache.org/jira/browse/HDFS-7363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng reopened HDFS-7363: - Assignee: Kai Zheng Let me reuse this item as a sub task of HDFS-7337, where BlockGrouper is defined for this purpose as part of a codec. Its role: given the desired data blocks, BlockGrouper calculates and arranges a BlockGroup for encoding. Different codecs can have different layouts for a BlockGroup. In LRC(6, 2, 2), we have 3 child block groups: 2 local groups plus 1 global group; in RS, we have 1 block group. Given a BlockGroup with some blocks missing, BlockGrouper also calculates and determines how to recover it if recoverable, e.g. which blocks to use to recover a missing block. With this information the corresponding ErasureCoder can perform the concrete decoding work. > Pluggable algorithms to form block groups in erasure coding > --- > > Key: HDFS-7363 > URL: https://issues.apache.org/jira/browse/HDFS-7363 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: Kai Zheng > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
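The layout difference described in the comment above can be sketched roughly as follows. The grouping and names here are assumptions for illustration, not the HDFS-7337 API: for LRC(6, 2, 2), six data blocks split into two local groups of three data blocks plus one local parity each, with one global group of two parities over all six; plain RS produces a single flat group.

```java
import java.util.ArrayList;
import java.util.List;

class BlockGrouperSketch {
    // Returns one int[] per child group: { dataBlocks, parityBlocks }.
    static List<int[]> lrcLayout(int data, int localGroups, int globalParity) {
        List<int[]> groups = new ArrayList<>();
        int perGroup = data / localGroups;
        for (int i = 0; i < localGroups; i++) {
            groups.add(new int[] { perGroup, 1 });    // local group + local parity
        }
        groups.add(new int[] { data, globalParity }); // global group over all data
        return groups;
    }

    static List<int[]> rsLayout(int data, int parity) {
        List<int[]> groups = new ArrayList<>();
        groups.add(new int[] { data, parity });       // a single flat group
        return groups;
    }
}
```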
[jira] [Updated] (HDFS-7337) Configurable and pluggable Erasure Codec and schema
[ https://issues.apache.org/jira/browse/HDFS-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Zheng updated HDFS-7337: Attachment: PluggableErasureCodec.pdf To save time, I composed this doc to illustrate the erasure codec framework for review. Your feedback is welcome. > Configurable and pluggable Erasure Codec and schema > --- > > Key: HDFS-7337 > URL: https://issues.apache.org/jira/browse/HDFS-7337 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Zhe Zhang >Assignee: Kai Zheng > Attachments: HDFS-7337-prototype-v1.patch, PluggableErasureCodec.pdf > > > According to HDFS-7285 and the design, this considers supporting multiple > Erasure Codecs via a pluggable approach. It allows defining and configuring > multiple codec schemas with different coding algorithms and parameters. The > resultant codec schemas can be utilized and specified via a command tool for > different file folders. While designing and implementing such a pluggable framework, > we also implement a concrete codec by default (Reed-Solomon) to prove > the framework is useful and workable. A separate JIRA could be opened for the > RS codec implementation. > Note HDFS-7353 will focus on the very low level codec API and implementation > to make concrete vendor libraries transparent to the upper layer. This JIRA > focuses on high level aspects that interact with configuration, schemas, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7344) Erasure Coding worker and support in DataNode
[ https://issues.apache.org/jira/browse/HDFS-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Bo updated HDFS-7344: Attachment: HDFS ECWorker Design.pdf This is the first draft of the low level design doc based on HDFS-7285, and we're happy to incorporate feedback under this JIRA. > Erasure Coding worker and support in DataNode > - > > Key: HDFS-7344 > URL: https://issues.apache.org/jira/browse/HDFS-7344 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode >Reporter: Kai Zheng >Assignee: Li Bo > Attachments: HDFS ECWorker Design.pdf > > > According to HDFS-7285 and the design, this handles the DataNode side extension > and related support for Erasure Coding, and implements ECWorker. It mainly > covers the following aspects, and separate tasks may be opened to handle each > of them. > * Process encoding work, calculating parity blocks as specified in block > groups and the codec schema; > * Process decoding work, recovering data blocks according to block groups and > the codec schema; > * Handle client requests for passively recovered block data, serving data on > demand while reconstructing; > * Write parity blocks according to storage policy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
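The encoding and decoding duties listed above can be illustrated with the simplest possible codec, single-block XOR parity. This is purely a toy to show the shape of the work; the actual ECWorker targets Reed-Solomon and the codec API from HDFS-7353:

```java
class XorCodecSketch {
    // Encoding: the parity block is the XOR of all data blocks in the group.
    static byte[] encodeParity(byte[][] dataBlocks) {
        byte[] parity = new byte[dataBlocks[0].length];
        for (byte[] block : dataBlocks) {
            for (int i = 0; i < parity.length; i++) parity[i] ^= block[i];
        }
        return parity;
    }

    // Decoding: a single missing data block is recovered as the XOR of the
    // parity block with all surviving data blocks.
    static byte[] recoverMissing(byte[][] survivors, byte[] parity) {
        byte[] missing = parity.clone();
        for (byte[] block : survivors) {
            for (int i = 0; i < missing.length; i++) missing[i] ^= block[i];
        }
        return missing;
    }
}
```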