[jira] [Updated] (HDFS-7834) Allow HDFS to bind to ipv6 conditionally
[ https://issues.apache.org/jira/browse/HDFS-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7834: --- Component/s: scripts Allow HDFS to bind to ipv6 conditionally Key: HDFS-7834 URL: https://issues.apache.org/jira/browse/HDFS-7834 Project: Hadoop HDFS Issue Type: Improvement Components: scripts Affects Versions: 2.6.0 Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HDFS-7834-branch-2-0.patch Currently the bash scripts unconditionally add -Djava.net.preferIPv4Stack=true. While this was needed a while ago, IPv6 on Java works much better now, and there should be a way to allow it to bind dual stack if needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HDFS-7834) Allow HDFS to bind to ipv6 conditionally
[ https://issues.apache.org/jira/browse/HDFS-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335423#comment-14335423 ] Allen Wittenauer edited comment on HDFS-7834 at 2/24/15 9:00 PM: - In trunk, you can set HADOOP_OPTS to something (blank, for example) and set HADOOP_ALLOW_IPV6 to yes. was (Author: aw): In trunk, you can set HADOOP_ALLOW_IPV6. Allow HDFS to bind to ipv6 conditionally Key: HDFS-7834 URL: https://issues.apache.org/jira/browse/HDFS-7834 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 2.7.0 Currently the bash scripts unconditionally add -Djava.net.preferIPv4Stack=true. While this was needed a while ago, IPv6 on Java works much better now, and there should be a way to allow it to bind dual stack if needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
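In practice, the trunk workaround described in the comment above comes down to two lines of environment configuration. A minimal sketch, assuming the usual hadoop-env.sh mechanism and the HADOOP_ALLOW_IPV6 handling Allen mentions; the exact variable handling may differ between trunk and branch-2:
{code}
# hadoop-env.sh -- illustrative sketch only
export HADOOP_OPTS=""            # clear any inherited -Djava.net.preferIPv4Stack=true
export HADOOP_ALLOW_IPV6="yes"   # ask the trunk scripts to skip the IPv4-only flag
{code}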
[jira] [Updated] (HDFS-7834) Allow HDFS to bind to ipv6 conditionally
[ https://issues.apache.org/jira/browse/HDFS-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-7834: --- Fix Version/s: (was: 2.7.0) Allow HDFS to bind to ipv6 conditionally Key: HDFS-7834 URL: https://issues.apache.org/jira/browse/HDFS-7834 Project: Hadoop HDFS Issue Type: Improvement Components: scripts Affects Versions: 2.6.0 Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HDFS-7834-branch-2-0.patch Currently the bash scripts unconditionally add -Djava.net.preferIPv4Stack=true. While this was needed a while ago, IPv6 on Java works much better now, and there should be a way to allow it to bind dual stack if needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7834) Allow HDFS to bind to ipv6 conditionally
[ https://issues.apache.org/jira/browse/HDFS-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-7834: Attachment: HDFS-7834-branch-2-0.patch Here's a patch for branch-2. Allow HDFS to bind to ipv6 conditionally Key: HDFS-7834 URL: https://issues.apache.org/jira/browse/HDFS-7834 Project: Hadoop HDFS Issue Type: Improvement Components: scripts Affects Versions: 2.6.0 Reporter: Elliott Clark Assignee: Elliott Clark Attachments: HDFS-7834-branch-2-0.patch Currently the bash scripts unconditionally add -Djava.net.preferIPv4Stack=true. While this was needed a while ago, IPv6 on Java works much better now, and there should be a way to allow it to bind dual stack if needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7818) DataNode throws NPE if the WebHdfs URL does not contain the offset parameter
[ https://issues.apache.org/jira/browse/HDFS-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335483#comment-14335483 ] Eric Payne commented on HDFS-7818: -- Now that I look at it, the patch in HDFS-7818.v3.txt is not exactly correct either. I think that if we want to keep the NULL check in a constructor, it should be done in {{OffsetParam(final Long value)}} instead of {{OffsetParam(final String str)}}, since the latter invokes the former. DataNode throws NPE if the WebHdfs URL does not contain the offset parameter Key: HDFS-7818 URL: https://issues.apache.org/jira/browse/HDFS-7818 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.7.0 Reporter: Eric Payne Assignee: Eric Payne Attachments: HDFS-7818.v1.txt, HDFS-7818.v2.txt, HDFS-7818.v3.txt This is a regression in 2.7 and later. {{hadoop fs -cat}} over webhdfs works, but {{hadoop fs -text}} does not:
{code}
$ hadoop fs -cat webhdfs://myhost.com/tmp/test.1
... output ...
$ hadoop fs -text webhdfs://myhost.com/tmp/test.1
text: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null
    at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:165)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:615)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:463)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:492)
    ...
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
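To make the constructor suggestion concrete, here is a standalone sketch; this is illustrative only, not the actual Hadoop source (the real OffsetParam extends LongParam), but it shows why a single null check in the Long constructor covers both paths:
{code}
// Sketch: funnel the null check through the Long constructor, since
// OffsetParam(String) delegates to OffsetParam(Long).
class OffsetParam {
  private final long value;

  OffsetParam(Long value) {
    // One null check here covers every construction path.
    this.value = (value == null) ? 0L : value;
  }

  OffsetParam(String str) {
    // A WebHdfs URL without ?offset= yields str == null; the Long
    // constructor above turns that into the default offset of 0.
    this((str == null) ? null : Long.valueOf(str));
  }

  long getValue() {
    return value;
  }
}
{code}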
[jira] [Commented] (HDFS-7834) Allow HDFS to bind to ipv6 conditionally
[ https://issues.apache.org/jira/browse/HDFS-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335423#comment-14335423 ] Allen Wittenauer commented on HDFS-7834: In trunk, you can set HADOOP_ALLOW_IPV6. Allow HDFS to bind to ipv6 conditionally Key: HDFS-7834 URL: https://issues.apache.org/jira/browse/HDFS-7834 Project: Hadoop HDFS Issue Type: Improvement Reporter: Elliott Clark Assignee: Elliott Clark Currently the bash scripts unconditionally add -Djava.net.preferIPv4Stack=true. While this was needed a while ago, IPv6 on Java works much better now, and there should be a way to allow it to bind dual stack if needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
[ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-7411: -- Attachment: hdfs-7411.011.patch Sorry for the delay on this everyone, I was on vacation last week. I've implemented Chris D's suggestion (with a unit test) of a per-node limit. If the deprecated config key is set, it is used preferentially over the default for the new config key. Nicholas, does this satisfy your criteria? Refactor and improve decommissioning logic into DecommissionManager --- Key: HDFS-7411 URL: https://issues.apache.org/jira/browse/HDFS-7411 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch Would be nice to split out decommission logic from DatanodeManager to DecommissionManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
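For readers following along, the key-precedence rule Andrew describes looks roughly like this. A sketch only: the key names below are placeholders for the constants in DFSConfigKeys, not necessarily the ones in the patch:
{code}
import org.apache.hadoop.conf.Configuration;

class DecommissionLimits {
  // Placeholder key names -- the real constants live in DFSConfigKeys.
  static final String DEPRECATED_KEY = "dfs.namenode.decommission.nodes.per.interval";
  static final String NEW_KEY = "dfs.namenode.decommission.blocks.per.interval";
  static final int NEW_DEFAULT = 500000;

  static int resolveLimit(Configuration conf) {
    // An explicitly set deprecated key wins over the new key's default.
    if (conf.get(DEPRECATED_KEY) != null) {
      return conf.getInt(DEPRECATED_KEY, NEW_DEFAULT);
    }
    return conf.getInt(NEW_KEY, NEW_DEFAULT);
  }
}
{code}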
[jira] [Updated] (HDFS-6962) ACLs inheritance conflict with umaskmode
[ https://issues.apache.org/jira/browse/HDFS-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-6962: Target Version/s: 3.0.0 (was: 2.7.0) Hello, [~usrikanth]. Thank you for posting a prototype patch and providing a great written summary. I'm now certain that it's impossible to make this change in a backwards-compatible way in the 2.x line. The biggest challenge is what happens if someone upgrades the client ahead of the NameNode. In that case, neither the client nor the NameNode would apply the umask. Effectively, that means the upgraded client would start creating directories with 777 and files with 666, which of course would compromise security. Another potential issue is that existing users may be accustomed to the behavior of the current implementation, despite this deviation from the POSIX ACL spec. The effect of the proposed change would be to widen access, because it would stop applying umask in certain cases. Users might find it surprising if their default ACLs stopped restricting access after an upgrade, and some would argue that this is a form of incompatibility with existing persistent data (metadata). This is always a fine line, but I do suspect some would see it as an incompatibility. I'm retargeting this to 3.0.0. That means we'll also have the option of creating a much simpler patch, because we'll have freedom to make backwards-incompatible changes. Here are a few notes on the prototype patch, although I suspect it will go in a very different direction for 3.0.0 anyway. # {{CommandWithDestination}}: This change also probably would have constituted a backwards incompatibility. Prior versions create files as 666 filtered by {{fs.permissions.umask-mode}}, not based on the permissions from the source file system. I see from your notes that you were aiming to replicate the behavior you saw on Linux. It might be worthwhile for us to consider doing that for consistency with other file systems, but it would be backwards-incompatible in 2.x. # {{FSDirectory}}: Here, the NameNode is applying umask based on its configured value for {{fs.permissions.umask-mode}}. Unfortunately, this won't work in the general case, because it's not guaranteed that the client and the NameNode are running with the same set of configuration files. They might have different values configured for {{fs.permissions.umask-mode}}, or the client might have overridden it with a -D option on the command line. ACLs inheritance conflict with umaskmode Key: HDFS-6962 URL: https://issues.apache.org/jira/browse/HDFS-6962 Project: Hadoop HDFS Issue Type: Bug Components: security Affects Versions: 2.4.1 Environment: CentOS release 6.5 (Final) Reporter: LINTE Assignee: Srikanth Upputuri Labels: hadoop, security Attachments: HDFS-6962.1.patch In hdfs-site.xml:
{code}
<property>
  <name>dfs.umaskmode</name>
  <value>027</value>
</property>
{code}
1/ Create a directory as superuser
bash# hdfs dfs -mkdir /tmp/ACLS
2/ Set default ACLs on this directory: rwx access for group readwrite and user toto
bash# hdfs dfs -setfacl -m default:group:readwrite:rwx /tmp/ACLS
bash# hdfs dfs -setfacl -m default:user:toto:rwx /tmp/ACLS
3/ Check ACLs on /tmp/ACLS/
bash# hdfs dfs -getfacl /tmp/ACLS/
{code}
# file: /tmp/ACLS
# owner: hdfs
# group: hadoop
user::rwx
group::r-x
other::---
default:user::rwx
default:user:toto:rwx
default:group::r-x
default:group:readwrite:rwx
default:mask::rwx
default:other::---
{code}
user::rwx | group::r-x | other::--- matches the umaskmode defined in hdfs-site.xml; everything is OK! default:group:readwrite:rwx allows the readwrite group rwx access for inheritance. default:user:toto:rwx allows the toto user rwx access for inheritance. default:mask::rwx means the inheritance mask is rwx, so no masking.
4/ Create a subdir to test inheritance of ACLs
bash# hdfs dfs -mkdir /tmp/ACLS/hdfs
5/ Check ACLs on /tmp/ACLS/hdfs
bash# hdfs dfs -getfacl /tmp/ACLS/hdfs
{code}
# file: /tmp/ACLS/hdfs
# owner: hdfs
# group: hadoop
user::rwx
user:toto:rwx #effective:r-x
group::r-x
group:readwrite:rwx #effective:r-x
mask::r-x
other::---
default:user::rwx
default:user:toto:rwx
default:group::r-x
default:group:readwrite:rwx
default:mask::rwx
default:other::---
{code}
Here we can see that the readwrite group has an rwx ACL but only r-x is effective, because the mask is r-x (mask::r-x), even though the default mask for inheritance is set to default:mask::rwx on /tmp/ACLS/.
6/ Modify hdfs-site.xml and restart the namenode:
{code}
<property>
  <name>dfs.umaskmode</name>
  <value>010</value>
</property>
{code}
7/ Create a subdir to test inheritance of ACLs with the new umaskmode parameter
bash# hdfs dfs -mkdir /tmp/ACLS/hdfs2
8/ Check ACLs on /tmp/ACLS/hdfs2
bash# hdfs dfs -getfacl
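As background for the umask filtering discussed in this report (e.g. a requested mode of 666 filtered by 027), the operation is a bitwise AND with the complement of the mask; a quick self-contained illustration, not Hadoop code:
{code}
// Self-contained illustration of umask filtering.
public class UmaskDemo {
  public static void main(String[] args) {
    int requested = 0666; // default mode for a new file
    int umask = 0027;     // the dfs.umaskmode value from step 1 above
    int effective = requested & ~umask;
    // Prints 640: the umask strips group write and all "other" access.
    System.out.println(Integer.toOctalString(effective));
  }
}
{code}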
[jira] [Commented] (HDFS-7818) DataNode throws NPE if the WebHdfs URL does not contain the offset parameter
[ https://issues.apache.org/jira/browse/HDFS-7818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335438#comment-14335438 ] Eric Payne commented on HDFS-7818: -- Thank you for your review, [~wheat9] bq. Maybe it might make more sense to introduce a new method {{getOffset()}} in {{OffsetParam}}. If a {{getOffset()}} method is created instead of handling the NULL case in the constructor as is done in the HDFS-7818.V3.txt patch, won't I also have to change all of the {{offset.getValue()}} calls to {{offset.getOffset()}} in the {{NamenodeWebHdfsMethods}} class? The change in the current patch seems less risky because it catches the NULL case during construction of the object and has less code change. DataNode throws NPE if the WebHdfs URL does not contain the offset parameter Key: HDFS-7818 URL: https://issues.apache.org/jira/browse/HDFS-7818 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Affects Versions: 2.7.0 Reporter: Eric Payne Assignee: Eric Payne Attachments: HDFS-7818.v1.txt, HDFS-7818.v2.txt, HDFS-7818.v3.txt This is a regression in 2.7 and later. {{hadoop fs -cat}} over webhdfs works, but {{hadoop fs -text}} does not:
{code}
$ hadoop fs -cat webhdfs://myhost.com/tmp/test.1
... output ...
$ hadoop fs -text webhdfs://myhost.com/tmp/test.1
text: org.apache.hadoop.ipc.RemoteException(java.lang.NullPointerException): null
    at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:165)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:358)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:615)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:463)
    at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:492)
    ...
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
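For comparison with the constructor approach, a sketch of the {{getOffset()}} alternative being weighed here; again illustrative only, not the actual source (the real class extends LongParam):
{code}
// Sketch: keep getValue() as-is and add a null-safe accessor. Callers in
// NamenodeWebHdfsMethods would then switch from getValue() to getOffset(),
// which is the extra churn Eric is weighing against the constructor fix.
class OffsetParam {
  private final Long value; // null when the URL omits ?offset=

  OffsetParam(Long value) {
    this.value = value;
  }

  Long getValue() {
    return value; // unchanged: may still be null
  }

  Long getOffset() {
    return (value == null) ? Long.valueOf(0) : value;
  }
}
{code}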
[jira] [Updated] (HDFS-778) DistributedFileSystem.getFileBlockLocations() may occasionally return numeric ips as hostnames.
[ https://issues.apache.org/jira/browse/HDFS-778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-778: -- Labels: ipv6 (was: ) DistributedFileSystem.getFileBlockLocations() may occasionally return numeric ips as hostnames. --- Key: HDFS-778 URL: https://issues.apache.org/jira/browse/HDFS-778 Project: Hadoop HDFS Issue Type: Bug Reporter: Hong Tang Labels: ipv6 DistributedFileSystem.getFileBlockLocations() may occasionally return numeric ips as hostnames. This seems to be a breach of the FileSystem.getFileBlockLocation() contract:
{noformat}
  /**
   * Return an array containing hostnames, offset and size of
   * portions of the given file. For a nonexistent
   * file or regions, null will be returned.
   *
   * This call is most helpful with DFS, where it returns
   * hostnames of machines that contain the given file.
   *
   * The FileSystem will simply return an elt containing 'localhost'.
   */
  public BlockLocation[] getFileBlockLocations(FileStatus file,
      long start, long len) throws IOException
{noformat}
One (maybe minor) consequence of this issue is: When a job includes such numeric ips in its splits' locations, the JobTracker would not be able to assign the job's map tasks local to the file blocks. We should either fix the implementation or change the contract. In the latter case, the JobTracker needs to be fixed to maintain both the hostnames and ips of the TaskTrackers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
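A small driver for reproducing the report: it calls the API in question and flags any returned "hostname" that parses as a dotted-quad IP. Illustrative only; the path argument and the IP heuristic are mine, not from the issue:
{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockHostCheck {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus stat = fs.getFileStatus(new Path(args[0]));
    BlockLocation[] locs = fs.getFileBlockLocations(stat, 0, stat.getLen());
    for (BlockLocation loc : locs) {
      for (String host : loc.getHosts()) {
        // Crude heuristic: a "hostname" that is a dotted-quad is suspect.
        if (host.matches("\\d+(\\.\\d+){3}")) {
          System.out.println("numeric ip returned as hostname: " + host);
        }
      }
    }
  }
}
{code}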
[jira] [Created] (HDFS-7835) make initial sleeptime in locateFollowingBlock configurable for DFSClient.
zhihai xu created HDFS-7835: --- Summary: make initial sleeptime in locateFollowingBlock configurable for DFSClient. Key: HDFS-7835 URL: https://issues.apache.org/jira/browse/HDFS-7835 Project: Hadoop HDFS Issue Type: Improvement Components: dfsclient Reporter: zhihai xu Assignee: zhihai xu Make initial sleeptime in locateFollowingBlock configurable for DFSClient. Currently the sleeptime/localTimeout in locateFollowingBlock/completeFile from DFSOutputStream is hard-coded as 400 ms, but retries can be configured by dfs.client.block.write.locateFollowingBlock.retries. We should also make the initial sleeptime configurable to give users more flexibility to control both retry and delay. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
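The loop in question is a simple exponential backoff; a sketch of the proposed shape follows. Note the initial-delay configuration key is hypothetical here; only the retries key named above already exists:
{code}
// Illustrative sketch, not the actual DFSOutputStream code.
class LocateFollowingBlockRetry {
  // dfs.client.block.write.locateFollowingBlock.retries (existing key)
  int retries = 5;
  // Hypothetical new key, e.g. ...locateFollowingBlock.initial.delay.ms;
  // today this is the hard-coded 400 ms.
  long initialDelayMs = 400;

  void runWithRetry(Runnable attempt) throws InterruptedException {
    long sleeptime = initialDelayMs;
    for (int retry = retries; retry >= 0; retry--) {
      try {
        attempt.run();
        return;
      } catch (RuntimeException notYetReplicated) {
        if (retry == 0) {
          throw notYetReplicated; // out of retries
        }
        Thread.sleep(sleeptime);
        sleeptime *= 2; // back off between attempts
      }
    }
  }
}
{code}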
[jira] [Updated] (HDFS-7835) make initial sleeptime in locateFollowingBlock configurable for DFSClient.
[ https://issues.apache.org/jira/browse/HDFS-7835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated HDFS-7835: Status: Patch Available (was: Open) make initial sleeptime in locateFollowingBlock configurable for DFSClient. -- Key: HDFS-7835 URL: https://issues.apache.org/jira/browse/HDFS-7835 Project: Hadoop HDFS Issue Type: Improvement Components: dfsclient Reporter: zhihai xu Assignee: zhihai xu Attachments: HDFS-7835.000.patch Make initial sleeptime in locateFollowingBlock configurable for DFSClient. Currently the sleeptime/localTimeout in locateFollowingBlock/completeFile from DFSOutputStream is hard-coded as 400 ms, but retries can be configured by dfs.client.block.write.locateFollowingBlock.retries. We should also make the initial sleeptime configurable to give users more flexibility to control both retry and delay. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7835) make initial sleeptime in locateFollowingBlock configurable for DFSClient.
[ https://issues.apache.org/jira/browse/HDFS-7835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhihai xu updated HDFS-7835: Attachment: HDFS-7835.000.patch make initial sleeptime in locateFollowingBlock configurable for DFSClient. -- Key: HDFS-7835 URL: https://issues.apache.org/jira/browse/HDFS-7835 Project: Hadoop HDFS Issue Type: Improvement Components: dfsclient Reporter: zhihai xu Assignee: zhihai xu Attachments: HDFS-7835.000.patch Make initial sleeptime in locateFollowingBlock configurable for DFSClient. Currently the sleeptime/localTimeout in locateFollowingBlock/completeFile from DFSOutputStream is hard-coded as 400 ms, but retries can be configured by dfs.client.block.write.locateFollowingBlock.retries. We should also make the initial sleeptime configurable to give users more flexibility to control both retry and delay. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7537) fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas && NN restart
[ https://issues.apache.org/jira/browse/HDFS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335523#comment-14335523 ] Allen Wittenauer commented on HDFS-7537: bq. When numUnderMinimalRelicatedBlocks > 0 and there is no missing/corrupted block, all under minimal replicated blocks have at least one good replica so that they can be replicated and there is no data loss. It makes sense to consider the file system as healthy. Exactly this. I made a prototype to play with. One of the things I did was put the number of blocks that didn't meet the replication minimum inside the same asterisks that the corrupted output uses. This made it absolutely crystal clear why the NN wasn't coming out of safemode. fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas && NN restart - Key: HDFS-7537 URL: https://issues.apache.org/jira/browse/HDFS-7537 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Allen Wittenauer Assignee: GAO Rui Attachments: HDFS-7537.1.patch, dfs-min-2-fsck.png, dfs-min-2.png If minimum replication is set to 2 or higher and some of those replicas are missing and the namenode restarts, it isn't always obvious that the missing replicas are the reason why the namenode isn't leaving safemode. We should improve the output of fsck and the web UI to make it obvious that the missing blocks are from unmet replicas vs. completely/totally missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7280) Use netty 4 in WebImageViewer
[ https://issues.apache.org/jira/browse/HDFS-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335365#comment-14335365 ] Yongjun Zhang commented on HDFS-7280: - Hi [~wheat9], thanks for your work on this jira. I have some questions: in general, what impact, if any, is there on the user side when we change from netty 3 to netty 4? Is there anything special users need to do? Any compatibility issues with tools that interface with Hadoop? Thanks. Use netty 4 in WebImageViewer - Key: HDFS-7280 URL: https://issues.apache.org/jira/browse/HDFS-7280 Project: Hadoop HDFS Issue Type: Improvement Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.7.0 Attachments: HDFS-7280.000.patch, HDFS-7280.001.patch, HDFS-7280.002.patch, HDFS-7280.003.patch, HDFS-7280.004.patch This jira changes WebImageViewer to use netty 4 instead of netty 3. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7834) Allow HDFS to bind to ipv6 conditionally
Elliott Clark created HDFS-7834: --- Summary: Allow HDFS to bind to ipv6 conditionally Key: HDFS-7834 URL: https://issues.apache.org/jira/browse/HDFS-7834 Project: Hadoop HDFS Issue Type: Improvement Reporter: Elliott Clark Assignee: Elliott Clark Currently the bash scripts unconditionally add -Djava.net.preferIPv4Stack=true. While this was needed a while ago, IPv6 on Java works much better now, and there should be a way to allow it to bind dual stack if needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7834) Allow HDFS to bind to ipv6 conditionally
[ https://issues.apache.org/jira/browse/HDFS-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elliott Clark updated HDFS-7834: Affects Version/s: 2.6.0 Fix Version/s: 2.7.0 Allow HDFS to bind to ipv6 conditionally Key: HDFS-7834 URL: https://issues.apache.org/jira/browse/HDFS-7834 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.6.0 Reporter: Elliott Clark Assignee: Elliott Clark Fix For: 2.7.0 Currently the bash scripts unconditionally add -Djava.net.preferIPv4Stack=true. While this was needed a while ago, IPv6 on Java works much better now, and there should be a way to allow it to bind dual stack if needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7302) namenode -rollingUpgrade downgrade may finalize a rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334594#comment-14334594 ] Kai Sasaki commented on HDFS-7302: -- [~szetszwo] I may have some misunderstanding. I found there were some dependencies on FSImage, FSNamesystem, and so on. Can I remove all of these dependencies? Is this downgrade option also unnecessary for these internal classes? namenode -rollingUpgrade downgrade may finalize a rolling upgrade - Key: HDFS-7302 URL: https://issues.apache.org/jira/browse/HDFS-7302 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Kai Sasaki Labels: document, hdfs Attachments: HADOOP-7302.1.patch The namenode startup option -rollingUpgrade downgrade is originally designed for downgrading a cluster. However, running namenode -rollingUpgrade downgrade with the new software could result in finalizing the ongoing rolling upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7537) fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas && NN restart
[ https://issues.apache.org/jira/browse/HDFS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334686#comment-14334686 ] Hadoop QA commented on HDFS-7537: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700346/HDFS-7537.1.patch against trunk revision 1dba572. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9653//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9653//console This message is automatically generated. fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas && NN restart - Key: HDFS-7537 URL: https://issues.apache.org/jira/browse/HDFS-7537 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Allen Wittenauer Assignee: GAO Rui Attachments: HDFS-7537.1.patch, dfs-min-2-fsck.png, dfs-min-2.png If minimum replication is set to 2 or higher and some of those replicas are missing and the namenode restarts, it isn't always obvious that the missing replicas are the reason why the namenode isn't leaving safemode. We should improve the output of fsck and the web UI to make it obvious that the missing blocks are from unmet replicas vs. completely/totally missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7439) Add BlockOpResponseProto's message to DFSClient's exception message
[ https://issues.apache.org/jira/browse/HDFS-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334703#comment-14334703 ] Takanobu Asanuma commented on HDFS-7439: Excuse me, how can I trigger a rebuild for this patch? I can't log in to the Jenkins WebUI. Add BlockOpResponseProto's message to DFSClient's exception message --- Key: HDFS-7439 URL: https://issues.apache.org/jira/browse/HDFS-7439 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Takanobu Asanuma Priority: Minor Attachments: HDFS-7439.1.patch When (BlockOpResponseProto#getStatus() != SUCCESS), it helps with debugging if DFSClient can add BlockOpResponseProto's message to the exception message applications will get. For example, instead of
{noformat}
throw new IOException("Got error for OP_READ_BLOCK, self="
    + peer.getLocalAddressString() + ", remote="
    + peer.getRemoteAddressString() + ", for file " + file
    + ", for pool " + block.getBlockPoolId() + " block "
    + block.getBlockId() + "_" + block.getGenerationStamp());
{noformat}
It could be,
{noformat}
throw new IOException("Got error for OP_READ_BLOCK, self="
    + peer.getLocalAddressString() + ", remote="
    + peer.getRemoteAddressString() + ", for file " + file
    + ", for pool " + block.getBlockPoolId() + " block "
    + block.getBlockId() + "_" + block.getGenerationStamp()
    + ", status message " + status.getMessage());
{noformat}
We might want to check out all the references to BlockOpResponseProto in DFSClient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7537) fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas && NN restart
[ https://issues.apache.org/jira/browse/HDFS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334742#comment-14334742 ] GAO Rui commented on HDFS-7537: --- Thank you very much for your review and comment. 1. I think minReplication may get its value from DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY in the first place. I'll try to figure this out and add it to the output. 2. In Allen's comment, the mock-up output shows the status as HEALTHY when numUnderMinimalRelicatedBlocks > 0. Is that a careless mistake, or does he have a reason to keep the status as HEALTHY while showing numUnderMinimalRelicatedBlocks at the same time? 3. I haven't added a unit test before, but I'll try to do that. 4. Sorry, I'll fix it and avoid this kind of mistake in future code. fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas && NN restart - Key: HDFS-7537 URL: https://issues.apache.org/jira/browse/HDFS-7537 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Allen Wittenauer Assignee: GAO Rui Attachments: HDFS-7537.1.patch, dfs-min-2-fsck.png, dfs-min-2.png If minimum replication is set to 2 or higher and some of those replicas are missing and the namenode restarts, it isn't always obvious that the missing replicas are the reason why the namenode isn't leaving safemode. We should improve the output of fsck and the web UI to make it obvious that the missing blocks are from unmet replicas vs. completely/totally missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7056) Snapshot support for truncate
[ https://issues.apache.org/jira/browse/HDFS-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334562#comment-14334562 ] Konstantin Shvachko commented on HDFS-7056: --- The patch is up for review in HDFS-7831. Snapshot support for truncate - Key: HDFS-7056 URL: https://issues.apache.org/jira/browse/HDFS-7056 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Plamen Jeliazkov Fix For: 2.7.0 Attachments: HDFS-3107-HDFS-7056-combined-13.patch, HDFS-3107-HDFS-7056-combined-15.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-3107-HDFS-7056-combined.patch, HDFS-7056-13.patch, HDFS-7056-15.patch, HDFS-7056.15_branch2.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFS-7056.patch, HDFSSnapshotWithTruncateDesign.docx, HDFSSnapshotWithTruncateDesign.docx, editsStored, editsStored.xml Implementation of truncate in HDFS-3107 does not allow truncating files which are in a snapshot. It is desirable to be able to truncate and still keep the old file state of the file in the snapshot. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7831) Fix the starting index of the loop in FileDiffList.findEarlierSnapshotBlocks().
[ https://issues.apache.org/jira/browse/HDFS-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-7831: -- Assignee: Konstantin Shvachko Status: Patch Available (was: Open) Fix the starting index of the loop in FileDiffList.findEarlierSnapshotBlocks(). --- Key: HDFS-7831 URL: https://issues.apache.org/jira/browse/HDFS-7831 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Attachments: HDFS-7831-01.patch Currently the loop in {{FileDiffList.findEarlierSnapshotBlocks()}} starts from {{insertPoint + 1}}. It should start from {{insertPoint - 1}}. As noted in [Jing's comment|https://issues.apache.org/jira/browse/HDFS-7056?focusedCommentId=14333864page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14333864] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7308) DFSClient write packet size may exceed 64kB
[ https://issues.apache.org/jira/browse/HDFS-7308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334569#comment-14334569 ] Tsz Wo Nicholas Sze commented on HDFS-7308: --- Patch looks good to me. [~stack], I wonder if you could repeat the test you have done for HDFS-7276 with the patch here to see if the packet size can go over 65536? DFSClient write packet size may exceed 64kB -- Key: HDFS-7308 URL: https://issues.apache.org/jira/browse/HDFS-7308 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Reporter: Tsz Wo Nicholas Sze Assignee: Takuya Fukudome Priority: Minor Attachments: HDFS-7308.1.patch In DFSOutputStream.computePacketChunkSize(..),
{code}
private void computePacketChunkSize(int psize, int csize) {
  final int chunkSize = csize + getChecksumSize();
  chunksPerPacket = Math.max(psize/chunkSize, 1);
  packetSize = chunkSize*chunksPerPacket;
  if (DFSClient.LOG.isDebugEnabled()) {
    ...
  }
}
{code}
We have the following
|| variables || usual values ||
| psize | dfsClient.getConf().writePacketSize = 64kB |
| csize | bytesPerChecksum = 512B |
| getChecksumSize(), i.e. CRC size | 32B |
| chunkSize = csize + getChecksumSize() | 544B (not a power of two) |
| psize/chunkSize | 120.47 |
| chunksPerPacket = max(psize/chunkSize, 1) | 120 |
| packetSize = chunkSize*chunksPerPacket (not including header) | 65280B |
| PacketHeader.PKT_MAX_HEADER_LEN | 33B |
| actual packet size | 65280 + 33 = *65313* < 65536 = 64k |
It is fortunate that the usual packet size = 65313 < 64k, although the calculation above does not guarantee it always happens (e.g. if PKT_MAX_HEADER_LEN=257, then actual packet size=65537 > 64k.) We should fix the computation in order to guarantee actual packet size <= 64k. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
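One way to guarantee the bound is to budget for the maximum header length before dividing; a sketch of that shape within the method quoted above (illustrative — the committed fix may differ in details):
{code}
// Sketch: reserve PKT_MAX_HEADER_LEN up front so header + chunks can
// never exceed the configured 64k write-packet size.
private void computePacketChunkSize(int psize, int csize) {
  final int chunkSize = csize + getChecksumSize();
  final int bodySize = Math.max(psize - PacketHeader.PKT_MAX_HEADER_LEN, 0);
  chunksPerPacket = Math.max(bodySize / chunkSize, 1);
  packetSize = chunkSize * chunksPerPacket;
  // With the usual values: (65536 - 33) / 544 = 120 chunks,
  // and 120 * 544 + 33 = 65313 <= 65536.
}
{code}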
[jira] [Updated] (HDFS-7831) Fix the starting index of the loop in FileDiffList.findEarlierSnapshotBlocks().
[ https://issues.apache.org/jira/browse/HDFS-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko updated HDFS-7831: -- Attachment: HDFS-7831-01.patch Fixed the starting index for the loop. Also we do not need to check that {{i < diffs.size()}}, because now it always holds. This should be treated as an optimization, so there are no additional test cases. Fix the starting index of the loop in FileDiffList.findEarlierSnapshotBlocks(). --- Key: HDFS-7831 URL: https://issues.apache.org/jira/browse/HDFS-7831 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Konstantin Shvachko Attachments: HDFS-7831-01.patch Currently the loop in {{FileDiffList.findEarlierSnapshotBlocks()}} starts from {{insertPoint + 1}}. It should start from {{insertPoint - 1}}. As noted in [Jing's comment|https://issues.apache.org/jira/browse/HDFS-7056?focusedCommentId=14333864page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14333864] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
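For reference, the corrected loop shape, reconstructed from the description and this comment rather than copied from the patch (FileDiff and BlockInfoContiguous are types from the HDFS snapshot package):
{code}
// Sketch: search backwards from the diff just before the insertion point.
static BlockInfoContiguous[] findEarlierSnapshotBlocks(
    java.util.List<FileDiff> diffs, int insertPoint) {
  // Starting at insertPoint - 1 means i < diffs.size() holds by
  // construction, so the old upper-bound check is unnecessary.
  for (int i = insertPoint - 1; i >= 0; i--) {
    BlockInfoContiguous[] blocks = diffs.get(i).getBlocks();
    if (blocks != null) {
      return blocks; // nearest earlier snapshot that recorded blocks
    }
  }
  return null; // no earlier snapshot recorded blocks for this file
}
{code}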
[jira] [Updated] (HDFS-7832) Show 'Last Modified' in Namenode's 'Browse Filesystem'
[ https://issues.apache.org/jira/browse/HDFS-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7832: Attachment: HDFS-7832-001.patch Attaching the changes. Please review. Show 'Last Modified' in Namenode's 'Browse Filesystem' -- Key: HDFS-7832 URL: https://issues.apache.org/jira/browse/HDFS-7832 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-7832-001.patch The new UI no longer shows the last modified time for a path while browsing. This could be added to make browsing the file system more useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7807) libhdfs htable.c: fix htable resizing, add unit test
[ https://issues.apache.org/jira/browse/HDFS-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334808#comment-14334808 ] Hudson commented on HDFS-7807: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #848 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/848/]) HDFS-7807. libhdfs htable.c: fix htable resizing, add unit test (cmccabe) (cmccabe: rev 585768667e443f56c2f97068276ec8768dc49cf8) * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test/test_htable.c * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/common/htable.c * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt libhdfs htable.c: fix htable resizing, add unit test Key: HDFS-7807 URL: https://issues.apache.org/jira/browse/HDFS-7807 Project: Hadoop HDFS Issue Type: Bug Components: native Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.7.0 Attachments: HDFS-7807.001.patch, HDFS-7807.002.patch libhdfs htable.c: fix htable resizing, add unit test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7009) Active NN and standby NN have different live nodes
[ https://issues.apache.org/jira/browse/HDFS-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334810#comment-14334810 ] Hudson commented on HDFS-7009: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #848 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/848/]) HDFS-7009. Active NN and standby NN have different live nodes. Contributed by Ming Ma. (cnauroth: rev 769507bd7a501929d9a2fd56c72c3f50673488a4) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Active NN and standby NN have different live nodes -- Key: HDFS-7009 URL: https://issues.apache.org/jira/browse/HDFS-7009 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Ming Ma Assignee: Ming Ma Fix For: 2.7.0 Attachments: HDFS-7009-2.patch, HDFS-7009-3.patch, HDFS-7009-4.patch, HDFS-7009.patch To follow up on https://issues.apache.org/jira/browse/HDFS-6478, in most cases, given DN sends HB and BR to NN regularly, if a specific RPC call fails, it isn't a big deal. However, there are cases where DN fails to register with NN during initial handshake due to exceptions not covered by RPC client's connection retry. When this happens, the DN won't talk to that NN until the DN restarts.
{noformat}
BPServiceActor

  public void run() {
    LOG.info(this + " starting to offer service");
    try {
      // init stuff
      try {
        // setup storage
        connectToNNAndHandshake();
      } catch (IOException ioe) {
        // Initial handshake, storage recovery or registration failed
        // End BPOfferService thread
        LOG.fatal("Initialization failed for block pool " + this, ioe);
        return;
      }
      initialized = true; // bp is initialized;
      while (shouldRun()) {
        try {
          offerService();
        } catch (Exception ex) {
          LOG.error("Exception in BPOfferService for " + this, ex);
          sleepAndLogInterrupts(5000, "offering service");
        }
      }
...
{noformat}
Here is an example of the call stack.
{noformat}
java.io.IOException: Failed on local exception: java.io.IOException: Response is null.; Host Details : local host is: xxx; destination host is: yyy:8030;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761)
    at org.apache.hadoop.ipc.Client.call(Client.java:1239)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
    at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Response is null.
    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:949)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844)
{noformat}
This will create a discrepancy between the active NN and the standby NN in terms of live nodes. Here is a possible scenario of missing blocks after failover.
1. DN A, B set up handshakes with active NN, but not with standby NN.
2. A block is replicated to DN A, B and C.
3. From standby NN's point of view, given A and B are dead nodes, the block is under replicated.
4. DN C is down.
5. Before active NN detects DN C is down, it fails over.
6. The new active NN considers the block missing, even though there are two replicas on DN A and B.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7805) NameNode recovery prompt should be printed on console
[ https://issues.apache.org/jira/browse/HDFS-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334811#comment-14334811 ] Hudson commented on HDFS-7805: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #848 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/848/]) HDFS-7805. NameNode recovery prompt should be printed on console (Surendra Singh Lilhore via Colin P. McCabe) (cmccabe: rev faaddb6ecb44cdc9ef82a2ab392f64fc2561e938) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/MetaRecoveryContext.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt NameNode recovery prompt should be printed on console - Key: HDFS-7805 URL: https://issues.apache.org/jira/browse/HDFS-7805 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: surendra singh lilhore Assignee: surendra singh lilhore Fix For: 2.7.0 Attachments: HDFS-7805.patch, HDFS-7805_1.patch In my cluster the root logger is not the console, so when I run the namenode recovery tool, MetaRecoveryContext.java's prompt message is logged to the log file. Actually it should be displayed on the console. Currently it is like this {code} LOG.info(prompt); {code} It should be {code} System.err.print(prompt); {code} NameNode recovery prompt should be printed on console -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7831) Fix the starting index of the loop in FileDiffList.findEarlierSnapshotBlocks().
[ https://issues.apache.org/jira/browse/HDFS-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334785#comment-14334785 ] Hadoop QA commented on HDFS-7831: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700372/HDFS-7831-01.patch against trunk revision b610c68. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestFailureToReadEdits Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9656//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9656//console This message is automatically generated. Fix the starting index of the loop in FileDiffList.findEarlierSnapshotBlocks(). --- Key: HDFS-7831 URL: https://issues.apache.org/jira/browse/HDFS-7831 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Attachments: HDFS-7831-01.patch Currently the loop in {{FileDiffList.findEarlierSnapshotBlocks()}} starts from {{insertPoint + 1}}. It should start from {{insertPoint - 1}}. As noted in [Jing's comment|https://issues.apache.org/jira/browse/HDFS-7056?focusedCommentId=14333864page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14333864] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7832) Show 'Last Modified' in Namenode's 'Browse Filesystem'
[ https://issues.apache.org/jira/browse/HDFS-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-7832: Status: Patch Available (was: Open) Show 'Last Modified' in Namenode's 'Browse Filesystem' -- Key: HDFS-7832 URL: https://issues.apache.org/jira/browse/HDFS-7832 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-7832-001.patch The new UI no longer shows the last modified time for a path while browsing. This could be added to make browsing the file system more useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7824) GetContentSummary API and its namenode implementation for Storage Type Quota/Usage
[ https://issues.apache.org/jira/browse/HDFS-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334769#comment-14334769 ] Hadoop QA commented on HDFS-7824: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700363/HDFS-7824.02.patch against trunk revision b610c68. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1156 javac compiler warnings (more than the trunk's current 1155 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-httpfs: org.apache.hadoop.hdfs.server.namenode.TestHDFSConcat org.apache.hadoop.fs.viewfs.TestViewFsDefaultValue org.apache.hadoop.hdfs.TestClientProtocolForPipelineRecovery org.apache.hadoop.hdfs.TestEncryptedTransfer org.apache.hadoop.hdfs.TestPersistBlocks org.apache.hadoop.fs.permission.TestStickyBit org.apache.hadoop.hdfs.server.namenode.TestFSImageWithSnapshot org.apache.hadoop.hdfs.TestPipelines org.apache.hadoop.fs.TestHDFSFileContextMainOperations org.apache.hadoop.cli.TestHDFSCLI org.apache.hadoop.hdfs.TestReplaceDatanodeOnFailure The following test timeouts occurred in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs hadoop-hdfs-project/hadoop-hdfs-httpfs: org.apache.hadoop.fs.viewfs.TestViewFsWithAuthorityLocalFs The test build failed in hadoop-hdfs-project/hadoop-hdfs-httpfs Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9654//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9654//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9654//console This message is automatically generated. GetContentSummary API and its namenode implementation for Storage Type Quota/Usage - Key: HDFS-7824 URL: https://issues.apache.org/jira/browse/HDFS-7824 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, namenode Reporter: Xiaoyu Yao Assignee: Xiaoyu Yao Fix For: 2.7.0 Attachments: HDFS-7824.00.patch, HDFS-7824.01.patch, HDFS-7824.02.patch This JIRA is opened to provide API support of GetContentSummary with storage type quota and usage information. It includes namenode implementation, client namenode RPC protocol and Content.Counts refactoring. It is required by HDFS-7701 (CLI to display storage type quota and usage). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7439) Add BlockOpResponseProto's message to DFSClient's exception message
[ https://issues.apache.org/jira/browse/HDFS-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334763#comment-14334763 ] Hadoop QA commented on HDFS-7439: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700317/HDFS-7439.1.patch against trunk revision b610c68. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestFileTruncate Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9655//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9655//console This message is automatically generated. Add BlockOpResponseProto's message to DFSClient's exception message --- Key: HDFS-7439 URL: https://issues.apache.org/jira/browse/HDFS-7439 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Takanobu Asanuma Priority: Minor Attachments: HDFS-7439.1.patch When (BlockOpResponseProto#getStatus() != SUCCESS), it helps with debugging if DFSClient can add BlockOpResponseProto's message to the exception message applications will get. For example, instead of
{noformat}
throw new IOException("Got error for OP_READ_BLOCK, self="
    + peer.getLocalAddressString() + ", remote="
    + peer.getRemoteAddressString() + ", for file " + file
    + ", for pool " + block.getBlockPoolId() + " block "
    + block.getBlockId() + "_" + block.getGenerationStamp());
{noformat}
It could be,
{noformat}
throw new IOException("Got error for OP_READ_BLOCK, self="
    + peer.getLocalAddressString() + ", remote="
    + peer.getRemoteAddressString() + ", for file " + file
    + ", for pool " + block.getBlockPoolId() + " block "
    + block.getBlockId() + "_" + block.getGenerationStamp()
    + ", status message " + status.getMessage());
{noformat}
We might want to check out all the references to BlockOpResponseProto in DFSClient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7009) Active NN and standby NN have different live nodes
[ https://issues.apache.org/jira/browse/HDFS-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334792#comment-14334792 ] Hudson commented on HDFS-7009: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #114 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/114/]) HDFS-7009. Active NN and standby NN have different live nodes. Contributed by Ming Ma. (cnauroth: rev 769507bd7a501929d9a2fd56c72c3f50673488a4) * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java Active NN and standby NN have different live nodes -- Key: HDFS-7009 URL: https://issues.apache.org/jira/browse/HDFS-7009 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Ming Ma Assignee: Ming Ma Fix For: 2.7.0 Attachments: HDFS-7009-2.patch, HDFS-7009-3.patch, HDFS-7009-4.patch, HDFS-7009.patch To follow up on https://issues.apache.org/jira/browse/HDFS-6478, in most cases, given DN sends HB and BR to NN regularly, if a specific RPC call fails, it isn't a big deal. However, there are cases where DN fails to register with NN during initial handshake due to exceptions not covered by RPC client's connection retry. When this happens, the DN won't talk to that NN until the DN restarts.
{noformat}
BPServiceActor

  public void run() {
    LOG.info(this + " starting to offer service");
    try {
      // init stuff
      try {
        // setup storage
        connectToNNAndHandshake();
      } catch (IOException ioe) {
        // Initial handshake, storage recovery or registration failed
        // End BPOfferService thread
        LOG.fatal("Initialization failed for block pool " + this, ioe);
        return;
      }
      initialized = true; // bp is initialized;
      while (shouldRun()) {
        try {
          offerService();
        } catch (Exception ex) {
          LOG.error("Exception in BPOfferService for " + this, ex);
          sleepAndLogInterrupts(5000, "offering service");
        }
      }
...
{noformat}
Here is an example of the call stack.
{noformat}
java.io.IOException: Failed on local exception: java.io.IOException: Response is null.; Host Details : local host is: xxx; destination host is: yyy:8030;
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761)
    at org.apache.hadoop.ipc.Client.call(Client.java:1239)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
    at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Response is null.
    at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:949)
    at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844)
{noformat}
This will create a discrepancy between the active NN and the standby NN in terms of live nodes. Here is a possible scenario of missing blocks after failover.
1. DN A, B set up handshakes with active NN, but not with standby NN.
2. A block is replicated to DN A, B and C.
3. From standby NN's point of view, given A and B are dead nodes, the block is under replicated.
4. DN C is down.
5. Before active NN detects DN C is down, it fails over.
6. The new active NN considers the block missing, even though there are two replicas on DN A and B.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7832) Show 'Last Modified' in Namenode's 'Browse Filesystem'
Vinayakumar B created HDFS-7832: --- Summary: Show 'Last Modified' in Namenode's 'Browse Filesystem' Key: HDFS-7832 URL: https://issues.apache.org/jira/browse/HDFS-7832 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Vinayakumar B Assignee: Vinayakumar B The new UI no longer shows the last modified time for a path while browsing. This could be added to make browsing the file system more useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7805) NameNode recovery prompt should be printed on console
[ https://issues.apache.org/jira/browse/HDFS-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334793#comment-14334793 ] Hudson commented on HDFS-7805: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #114 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/114/]) HDFS-7805. NameNode recovery prompt should be printed on console (Surendra Singh Lilhore via Colin P. McCabe) (cmccabe: rev faaddb6ecb44cdc9ef82a2ab392f64fc2561e938) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/MetaRecoveryContext.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt NameNode recovery prompt should be printed on console - Key: HDFS-7805 URL: https://issues.apache.org/jira/browse/HDFS-7805 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: surendra singh lilhore Assignee: surendra singh lilhore Fix For: 2.7.0 Attachments: HDFS-7805.patch, HDFS-7805_1.patch In my cluster the root logger is not the console, so when I run the namenode recovery tool, the MetaRecoveryContext.java prompt message is logged to the log file. Actually it should be displayed on the console. Currently it is like this {code} LOG.info(prompt); {code} It should be {code} System.err.print(prompt); {code} NameNode recovery prompt should be printed on console -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7807) libhdfs htable.c: fix htable resizing, add unit test
[ https://issues.apache.org/jira/browse/HDFS-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334790#comment-14334790 ] Hudson commented on HDFS-7807: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #114 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/114/]) HDFS-7807. libhdfs htable.c: fix htable resizing, add unit test (cmccabe) (cmccabe: rev 585768667e443f56c2f97068276ec8768dc49cf8) * hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test/test_htable.c * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/common/htable.c libhdfs htable.c: fix htable resizing, add unit test Key: HDFS-7807 URL: https://issues.apache.org/jira/browse/HDFS-7807 Project: Hadoop HDFS Issue Type: Bug Components: native Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.7.0 Attachments: HDFS-7807.001.patch, HDFS-7807.002.patch libhdfs htable.c: fix htable resizing, add unit test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7807) libhdfs htable.c: fix htable resizing, add unit test
[ https://issues.apache.org/jira/browse/HDFS-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334976#comment-14334976 ] Hudson commented on HDFS-7807: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #114 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/114/]) HDFS-7807. libhdfs htable.c: fix htable resizing, add unit test (cmccabe) (cmccabe: rev 585768667e443f56c2f97068276ec8768dc49cf8) * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/common/htable.c * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test/test_htable.c * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt libhdfs htable.c: fix htable resizing, add unit test Key: HDFS-7807 URL: https://issues.apache.org/jira/browse/HDFS-7807 Project: Hadoop HDFS Issue Type: Bug Components: native Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.7.0 Attachments: HDFS-7807.001.patch, HDFS-7807.002.patch libhdfs htable.c: fix htable resizing, add unit test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7805) NameNode recovery prompt should be printed on console
[ https://issues.apache.org/jira/browse/HDFS-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335000#comment-14335000 ] Hudson commented on HDFS-7805: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2064 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2064/]) HDFS-7805. NameNode recovery prompt should be printed on console (Surendra Singh Lilhore via Colin P. McCabe) (cmccabe: rev faaddb6ecb44cdc9ef82a2ab392f64fc2561e938) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/MetaRecoveryContext.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt NameNode recovery prompt should be printed on console - Key: HDFS-7805 URL: https://issues.apache.org/jira/browse/HDFS-7805 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: surendra singh lilhore Assignee: surendra singh lilhore Fix For: 2.7.0 Attachments: HDFS-7805.patch, HDFS-7805_1.patch In my cluster the root logger is not the console, so when I run the namenode recovery tool, the MetaRecoveryContext.java prompt message is logged to the log file. Actually it should be displayed on the console. Currently it is like this {code} LOG.info(prompt); {code} It should be {code} System.err.print(prompt); {code} NameNode recovery prompt should be printed on console -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7008) xlator should be closed upon exit from DFSAdmin#genericRefresh()
[ https://issues.apache.org/jira/browse/HDFS-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsuyoshi OZAWA updated HDFS-7008: - Resolution: Fixed Fix Version/s: 2.7.0 Target Version/s: 2.7.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed this to trunk and branch-2. Thanks [~ted_yu] for your report and review. xlator should be closed upon exit from DFSAdmin#genericRefresh() Key: HDFS-7008 URL: https://issues.apache.org/jira/browse/HDFS-7008 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Tsuyoshi OZAWA Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7008.1.patch, HDFS-7008.2.patch {code} GenericRefreshProtocol xlator = new GenericRefreshProtocolClientSideTranslatorPB(proxy); // Refresh Collection<RefreshResponse> responses = xlator.refresh(identifier, args); {code} GenericRefreshProtocolClientSideTranslatorPB#close() should be called on xlator before return. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
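For concreteness, the fix being described could take roughly the following shape: close the translator on every exit path with try/finally. This is a hedged sketch, not the committed patch; it assumes {{GenericRefreshProtocolClientSideTranslatorPB}} exposes {{close()}} and reuses the variable names from the snippet above ({{printRefreshResponses}} is a hypothetical helper standing in for DFSAdmin's output handling).
{code}
GenericRefreshProtocolClientSideTranslatorPB xlator =
    new GenericRefreshProtocolClientSideTranslatorPB(proxy);
try {
  // Refresh
  Collection<RefreshResponse> responses = xlator.refresh(identifier, args);
  return printRefreshResponses(responses);  // hypothetical output helper
} finally {
  // Release the underlying RPC proxy even if refresh() throws.
  xlator.close();
}
{code}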
[jira] [Commented] (HDFS-7805) NameNode recovery prompt should be printed on console
[ https://issues.apache.org/jira/browse/HDFS-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334979#comment-14334979 ] Hudson commented on HDFS-7805: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #114 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/114/]) HDFS-7805. NameNode recovery prompt should be printed on console (Surendra Singh Lilhore via Colin P. McCabe) (cmccabe: rev faaddb6ecb44cdc9ef82a2ab392f64fc2561e938) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/MetaRecoveryContext.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt NameNode recovery prompt should be printed on console - Key: HDFS-7805 URL: https://issues.apache.org/jira/browse/HDFS-7805 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: surendra singh lilhore Assignee: surendra singh lilhore Fix For: 2.7.0 Attachments: HDFS-7805.patch, HDFS-7805_1.patch In my cluster the root logger is not the console, so when I run the namenode recovery tool, the MetaRecoveryContext.java prompt message is logged to the log file. Actually it should be displayed on the console. Currently it is like this {code} LOG.info(prompt); {code} It should be {code} System.err.print(prompt); {code} NameNode recovery prompt should be printed on console -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7009) Active NN and standby NN have different live nodes
[ https://issues.apache.org/jira/browse/HDFS-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334978#comment-14334978 ] Hudson commented on HDFS-7009: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #114 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/114/]) HDFS-7009. Active NN and standby NN have different live nodes. Contributed by Ming Ma. (cnauroth: rev 769507bd7a501929d9a2fd56c72c3f50673488a4) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Active NN and standby NN have different live nodes -- Key: HDFS-7009 URL: https://issues.apache.org/jira/browse/HDFS-7009 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Ming Ma Assignee: Ming Ma Fix For: 2.7.0 Attachments: HDFS-7009-2.patch, HDFS-7009-3.patch, HDFS-7009-4.patch, HDFS-7009.patch To follow up on https://issues.apache.org/jira/browse/HDFS-6478: in most cases, given that the DN sends heartbeats (HB) and block reports (BR) to the NN regularly, a failure of a specific RPC call isn't a big deal. However, there are cases where the DN fails to register with the NN during the initial handshake due to exceptions not covered by the RPC client's connection retry. When this happens, the DN won't talk to that NN until the DN restarts. {noformat} BPServiceActor public void run() { LOG.info(this + " starting to offer service"); try { // init stuff try { // setup storage connectToNNAndHandshake(); } catch (IOException ioe) { // Initial handshake, storage recovery or registration failed // End BPOfferService thread LOG.fatal("Initialization failed for block pool " + this, ioe); return; } initialized = true; // bp is initialized; while (shouldRun()) { try { offerService(); } catch (Exception ex) { LOG.error("Exception in BPOfferService for " + this, ex); sleepAndLogInterrupts(5000, "offering service"); } } ... {noformat} Here is an example of the call stack. {noformat} java.io.IOException: Failed on local exception: java.io.IOException: Response is null.; Host Details : local host is: xxx; destination host is: yyy:8030; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761) at org.apache.hadoop.ipc.Client.call(Client.java:1239) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Response is null. at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:949) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844) {noformat} This creates a discrepancy between the active NN and the standby NN in terms of live nodes. Here is a possible scenario of missing blocks after failover. 1. DN A and B set up handshakes with the active NN, but not with the standby NN. 2. A block is replicated to DN A, B and C. 3. From the standby NN's point of view, given that A and B are dead nodes, the block is under-replicated. 4. DN C goes down. 5. Before the active NN detects that DN C is down, it fails over. 6. The new active NN considers the block missing, even though there are two replicas, on DN A and B. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7008) xlator should be closed upon exit from DFSAdmin#genericRefresh()
[ https://issues.apache.org/jira/browse/HDFS-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334984#comment-14334984 ] Hudson commented on HDFS-7008: -- FAILURE: Integrated in Hadoop-trunk-Commit #7186 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7186/]) HDFS-7008. xlator should be closed upon exit from DFSAdmin#genericRefresh(). (ozawa) (ozawa: rev b53fd7163bc3a4eef4632afb55e5513c7c592fcf) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt xlator should be closed upon exit from DFSAdmin#genericRefresh() Key: HDFS-7008 URL: https://issues.apache.org/jira/browse/HDFS-7008 Project: Hadoop HDFS Issue Type: Bug Reporter: Ted Yu Assignee: Tsuyoshi OZAWA Priority: Minor Fix For: 2.7.0 Attachments: HDFS-7008.1.patch, HDFS-7008.2.patch {code} GenericRefreshProtocol xlator = new GenericRefreshProtocolClientSideTranslatorPB(proxy); // Refresh Collection<RefreshResponse> responses = xlator.refresh(identifier, args); {code} GenericRefreshProtocolClientSideTranslatorPB#close() should be called on xlator before return. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7009) Active NN and standby NN have different live nodes
[ https://issues.apache.org/jira/browse/HDFS-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334999#comment-14334999 ] Hudson commented on HDFS-7009: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2064 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2064/]) HDFS-7009. Active NN and standby NN have different live nodes. Contributed by Ming Ma. (cnauroth: rev 769507bd7a501929d9a2fd56c72c3f50673488a4) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Active NN and standby NN have different live nodes -- Key: HDFS-7009 URL: https://issues.apache.org/jira/browse/HDFS-7009 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Ming Ma Assignee: Ming Ma Fix For: 2.7.0 Attachments: HDFS-7009-2.patch, HDFS-7009-3.patch, HDFS-7009-4.patch, HDFS-7009.patch To follow up on https://issues.apache.org/jira/browse/HDFS-6478: in most cases, given that the DN sends heartbeats (HB) and block reports (BR) to the NN regularly, a failure of a specific RPC call isn't a big deal. However, there are cases where the DN fails to register with the NN during the initial handshake due to exceptions not covered by the RPC client's connection retry. When this happens, the DN won't talk to that NN until the DN restarts. {noformat} BPServiceActor public void run() { LOG.info(this + " starting to offer service"); try { // init stuff try { // setup storage connectToNNAndHandshake(); } catch (IOException ioe) { // Initial handshake, storage recovery or registration failed // End BPOfferService thread LOG.fatal("Initialization failed for block pool " + this, ioe); return; } initialized = true; // bp is initialized; while (shouldRun()) { try { offerService(); } catch (Exception ex) { LOG.error("Exception in BPOfferService for " + this, ex); sleepAndLogInterrupts(5000, "offering service"); } } ... {noformat} Here is an example of the call stack. {noformat} java.io.IOException: Failed on local exception: java.io.IOException: Response is null.; Host Details : local host is: xxx; destination host is: yyy:8030; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761) at org.apache.hadoop.ipc.Client.call(Client.java:1239) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Response is null. at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:949) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844) {noformat} This creates a discrepancy between the active NN and the standby NN in terms of live nodes. Here is a possible scenario of missing blocks after failover. 1. DN A and B set up handshakes with the active NN, but not with the standby NN. 2. A block is replicated to DN A, B and C. 3. From the standby NN's point of view, given that A and B are dead nodes, the block is under-replicated. 4. DN C goes down. 5. Before the active NN detects that DN C is down, it fails over. 6. The new active NN considers the block missing, even though there are two replicas, on DN A and B. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7832) Show 'Last Modified' in Namenode's 'Browse Filesystem'
[ https://issues.apache.org/jira/browse/HDFS-7832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334954#comment-14334954 ] Hadoop QA commented on HDFS-7832: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700483/HDFS-7832-001.patch against trunk revision b610c68. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.snapshot.TestOpenFilesWithSnapshot Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9657//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9657//console This message is automatically generated. Show 'Last Modified' in Namenode's 'Browse Filesystem' -- Key: HDFS-7832 URL: https://issues.apache.org/jira/browse/HDFS-7832 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Vinayakumar B Assignee: Vinayakumar B Attachments: HDFS-7832-001.patch new UI no longer shows the last modified time for a path while browsing. This could be added to make browse file system more useful. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7807) libhdfs htable.c: fix htable resizing, add unit test
[ https://issues.apache.org/jira/browse/HDFS-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334997#comment-14334997 ] Hudson commented on HDFS-7807: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2064 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2064/]) HDFS-7807. libhdfs htable.c: fix htable resizing, add unit test (cmccabe) (cmccabe: rev 585768667e443f56c2f97068276ec8768dc49cf8) * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test/test_htable.c * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/common/htable.c * hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt libhdfs htable.c: fix htable resizing, add unit test Key: HDFS-7807 URL: https://issues.apache.org/jira/browse/HDFS-7807 Project: Hadoop HDFS Issue Type: Bug Components: native Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.7.0 Attachments: HDFS-7807.001.patch, HDFS-7807.002.patch libhdfs htable.c: fix htable resizing, add unit test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7807) libhdfs htable.c: fix htable resizing, add unit test
[ https://issues.apache.org/jira/browse/HDFS-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334904#comment-14334904 ] Hudson commented on HDFS-7807: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #105 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/105/]) HDFS-7807. libhdfs htable.c: fix htable resizing, add unit test (cmccabe) (cmccabe: rev 585768667e443f56c2f97068276ec8768dc49cf8) * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/common/htable.c * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test/test_htable.c libhdfs htable.c: fix htable resizing, add unit test Key: HDFS-7807 URL: https://issues.apache.org/jira/browse/HDFS-7807 Project: Hadoop HDFS Issue Type: Bug Components: native Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.7.0 Attachments: HDFS-7807.001.patch, HDFS-7807.002.patch libhdfs htable.c: fix htable resizing, add unit test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7009) Active NN and standby NN have different live nodes
[ https://issues.apache.org/jira/browse/HDFS-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334906#comment-14334906 ] Hudson commented on HDFS-7009: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #105 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/105/]) HDFS-7009. Active NN and standby NN have different live nodes. Contributed by Ming Ma. (cnauroth: rev 769507bd7a501929d9a2fd56c72c3f50673488a4) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Active NN and standby NN have different live nodes -- Key: HDFS-7009 URL: https://issues.apache.org/jira/browse/HDFS-7009 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Ming Ma Assignee: Ming Ma Fix For: 2.7.0 Attachments: HDFS-7009-2.patch, HDFS-7009-3.patch, HDFS-7009-4.patch, HDFS-7009.patch To follow up on https://issues.apache.org/jira/browse/HDFS-6478: in most cases, given that the DN sends heartbeats (HB) and block reports (BR) to the NN regularly, a failure of a specific RPC call isn't a big deal. However, there are cases where the DN fails to register with the NN during the initial handshake due to exceptions not covered by the RPC client's connection retry. When this happens, the DN won't talk to that NN until the DN restarts. {noformat} BPServiceActor public void run() { LOG.info(this + " starting to offer service"); try { // init stuff try { // setup storage connectToNNAndHandshake(); } catch (IOException ioe) { // Initial handshake, storage recovery or registration failed // End BPOfferService thread LOG.fatal("Initialization failed for block pool " + this, ioe); return; } initialized = true; // bp is initialized; while (shouldRun()) { try { offerService(); } catch (Exception ex) { LOG.error("Exception in BPOfferService for " + this, ex); sleepAndLogInterrupts(5000, "offering service"); } } ... {noformat} Here is an example of the call stack. {noformat} java.io.IOException: Failed on local exception: java.io.IOException: Response is null.; Host Details : local host is: xxx; destination host is: yyy:8030; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761) at org.apache.hadoop.ipc.Client.call(Client.java:1239) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Response is null. at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:949) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844) {noformat} This creates a discrepancy between the active NN and the standby NN in terms of live nodes. Here is a possible scenario of missing blocks after failover. 1. DN A and B set up handshakes with the active NN, but not with the standby NN. 2. A block is replicated to DN A, B and C. 3. From the standby NN's point of view, given that A and B are dead nodes, the block is under-replicated. 4. DN C goes down. 5. Before the active NN detects that DN C is down, it fails over. 6. The new active NN considers the block missing, even though there are two replicas, on DN A and B. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7805) NameNode recovery prompt should be printed on console
[ https://issues.apache.org/jira/browse/HDFS-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334907#comment-14334907 ] Hudson commented on HDFS-7805: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #105 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/105/]) HDFS-7805. NameNode recovery prompt should be printed on console (Surendra Singh Lilhore via Colin P. McCabe) (cmccabe: rev faaddb6ecb44cdc9ef82a2ab392f64fc2561e938) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/MetaRecoveryContext.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt NameNode recovery prompt should be printed on console - Key: HDFS-7805 URL: https://issues.apache.org/jira/browse/HDFS-7805 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: surendra singh lilhore Assignee: surendra singh lilhore Fix For: 2.7.0 Attachments: HDFS-7805.patch, HDFS-7805_1.patch In my cluster the root logger is not the console, so when I run the namenode recovery tool, the MetaRecoveryContext.java prompt message is logged to the log file. Actually it should be displayed on the console. Currently it is like this {code} LOG.info(prompt); {code} It should be {code} System.err.print(prompt); {code} NameNode recovery prompt should be printed on console -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7807) libhdfs htable.c: fix htable resizing, add unit test
[ https://issues.apache.org/jira/browse/HDFS-7807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334926#comment-14334926 ] Hudson commented on HDFS-7807: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2046 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2046/]) HDFS-7807. libhdfs htable.c: fix htable resizing, add unit test (cmccabe) (cmccabe: rev 585768667e443f56c2f97068276ec8768dc49cf8) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/CMakeLists.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/common/htable.c * hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/test/test_htable.c libhdfs htable.c: fix htable resizing, add unit test Key: HDFS-7807 URL: https://issues.apache.org/jira/browse/HDFS-7807 Project: Hadoop HDFS Issue Type: Bug Components: native Affects Versions: 2.7.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.7.0 Attachments: HDFS-7807.001.patch, HDFS-7807.002.patch libhdfs htable.c: fix htable resizing, add unit test -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7805) NameNode recovery prompt should be printed on console
[ https://issues.apache.org/jira/browse/HDFS-7805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334929#comment-14334929 ] Hudson commented on HDFS-7805: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2046 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2046/]) HDFS-7805. NameNode recovery prompt should be printed on console (Surendra Singh Lilhore via Colin P. McCabe) (cmccabe: rev faaddb6ecb44cdc9ef82a2ab392f64fc2561e938) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/MetaRecoveryContext.java NameNode recovery prompt should be printed on console - Key: HDFS-7805 URL: https://issues.apache.org/jira/browse/HDFS-7805 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.6.0 Reporter: surendra singh lilhore Assignee: surendra singh lilhore Fix For: 2.7.0 Attachments: HDFS-7805.patch, HDFS-7805_1.patch In my cluster the root logger is not the console, so when I run the namenode recovery tool, the MetaRecoveryContext.java prompt message is logged to the log file. Actually it should be displayed on the console. Currently it is like this {code} LOG.info(prompt); {code} It should be {code} System.err.print(prompt); {code} NameNode recovery prompt should be printed on console -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7009) Active NN and standby NN have different live nodes
[ https://issues.apache.org/jira/browse/HDFS-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334928#comment-14334928 ] Hudson commented on HDFS-7009: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #2046 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2046/]) HDFS-7009. Active NN and standby NN have different live nodes. Contributed by Ming Ma. (cnauroth: rev 769507bd7a501929d9a2fd56c72c3f50673488a4) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDatanodeProtocolRetryPolicy.java * hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Active NN and standby NN have different live nodes -- Key: HDFS-7009 URL: https://issues.apache.org/jira/browse/HDFS-7009 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Ming Ma Assignee: Ming Ma Fix For: 2.7.0 Attachments: HDFS-7009-2.patch, HDFS-7009-3.patch, HDFS-7009-4.patch, HDFS-7009.patch To follow up on https://issues.apache.org/jira/browse/HDFS-6478: in most cases, given that the DN sends heartbeats (HB) and block reports (BR) to the NN regularly, a failure of a specific RPC call isn't a big deal. However, there are cases where the DN fails to register with the NN during the initial handshake due to exceptions not covered by the RPC client's connection retry. When this happens, the DN won't talk to that NN until the DN restarts. {noformat} BPServiceActor public void run() { LOG.info(this + " starting to offer service"); try { // init stuff try { // setup storage connectToNNAndHandshake(); } catch (IOException ioe) { // Initial handshake, storage recovery or registration failed // End BPOfferService thread LOG.fatal("Initialization failed for block pool " + this, ioe); return; } initialized = true; // bp is initialized; while (shouldRun()) { try { offerService(); } catch (Exception ex) { LOG.error("Exception in BPOfferService for " + this, ex); sleepAndLogInterrupts(5000, "offering service"); } } ... {noformat} Here is an example of the call stack. {noformat} java.io.IOException: Failed on local exception: java.io.IOException: Response is null.; Host Details : local host is: xxx; destination host is: yyy:8030; at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:761) at org.apache.hadoop.ipc.Client.call(Client.java:1239) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202) at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83) at com.sun.proxy.$Proxy9.registerDatanode(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.registerDatanode(DatanodeProtocolClientSideTranslatorPB.java:146) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.register(BPServiceActor.java:623) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:225) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Response is null. at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:949) at org.apache.hadoop.ipc.Client$Connection.run(Client.java:844) {noformat} This creates a discrepancy between the active NN and the standby NN in terms of live nodes. Here is a possible scenario of missing blocks after failover. 1. DN A and B set up handshakes with the active NN, but not with the standby NN. 2. A block is replicated to DN A, B and C. 3. From the standby NN's point of view, given that A and B are dead nodes, the block is under-replicated. 4. DN C goes down. 5. Before the active NN detects that DN C is down, it fails over. 6. The new active NN considers the block missing, even though there are two replicas, on DN A and B. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7817) libhdfs3: fix strerror_r detection
[ https://issues.apache.org/jira/browse/HDFS-7817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335048#comment-14335048 ] Thanh Do commented on HDFS-7817: Hi [~cmccabe]. Thanks for pointing out the code. I was grepping the {{hadoop-hdfs}} folder but not {{hadoop-common}}. So this Jira is about using {{sys_errlist}} instead of {{strerror_r}} for libhdfs3 right? libhdfs3: fix strerror_r detection -- Key: HDFS-7817 URL: https://issues.apache.org/jira/browse/HDFS-7817 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Reporter: Colin Patrick McCabe The signature of strerror_r is not quite detected correctly in libhdfs3. The code assumes that {{int foo = strerror_r}} will fail to compile with the GNU type signature, but this is not the case (C\+\+ will coerce the char* to an int in this case). Instead, we should do what the libhdfs {{terror}} (threaded error) function does here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7302) namenode -rollingUpgrade downgrade may finalize a rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335036#comment-14335036 ] Tsz Wo Nicholas Sze commented on HDFS-7302: --- "... Can I remove all dependencies? ..." Yes, we should remove all dependencies since -rollingUpgrade downgrade is no longer a valid option. namenode -rollingUpgrade downgrade may finalize a rolling upgrade - Key: HDFS-7302 URL: https://issues.apache.org/jira/browse/HDFS-7302 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Kai Sasaki Labels: document, hdfs Attachments: HADOOP-7302.1.patch The namenode startup option -rollingUpgrade downgrade was originally designed for downgrading the cluster. However, running namenode -rollingUpgrade downgrade with the new software could result in finalizing the ongoing rolling upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7645) Rolling upgrade is restoring blocks from trash multiple times
[ https://issues.apache.org/jira/browse/HDFS-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335069#comment-14335069 ] Arpit Agarwal commented on HDFS-7645: - Hi [~ogikei], thank you for posting a patch. This fix looks incomplete. # The trash must be restored on rollback. Fairly easy to fix this in the same function. If the rollback option was passed and previous exists, we call {{doRollback}}. If previous does not exist, restore trash. # On finalize, the trash directories must be deleted. I think this will be handled by {{signalRollingUpgrade}}, but I'd have to check it to make sure. TestDataNodeRollingUpgrade should flag both these issues. (A sketch of the restructured path follows this message.) Rolling upgrade is restoring blocks from trash multiple times - Key: HDFS-7645 URL: https://issues.apache.org/jira/browse/HDFS-7645 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Nathan Roberts Assignee: Keisuke Ogiwara Attachments: HDFS-7645.01.patch When performing an HDFS rolling upgrade, the trash directory is getting restored twice when under normal circumstances it shouldn't need to be restored at all. iiuc, the only time these blocks should be restored is if we need to rollback a rolling upgrade. On a busy cluster, this can cause significant and unnecessary block churn both on the datanodes, and more importantly in the namenode. The two times this happens are: 1) restart of DN onto new software {code} private void doTransition(DataNode datanode, StorageDirectory sd, NamespaceInfo nsInfo, StartupOption startOpt) throws IOException { if (startOpt == StartupOption.ROLLBACK && sd.getPreviousDir().exists()) { Preconditions.checkState(!getTrashRootDir(sd).exists(), sd.getPreviousDir() + " and " + getTrashRootDir(sd) + " should not " + "both be present."); doRollback(sd, nsInfo); // rollback if applicable } else { // Restore all the files in the trash. The restored files are retained // during rolling upgrade rollback. They are deleted during rolling // upgrade downgrade. int restored = restoreBlockFilesFromTrash(getTrashRootDir(sd)); LOG.info("Restored " + restored + " block files from trash."); } {code} 2) When the heartbeat response no longer indicates a rolling upgrade is in progress {code} /** * Signal the current rolling upgrade status as indicated by the NN. * @param inProgress true if a rolling upgrade is in progress */ void signalRollingUpgrade(boolean inProgress) throws IOException { String bpid = getBlockPoolId(); if (inProgress) { dn.getFSDataset().enableTrash(bpid); dn.getFSDataset().setRollingUpgradeMarker(bpid); } else { dn.getFSDataset().restoreTrash(bpid); dn.getFSDataset().clearRollingUpgradeMarker(bpid); } } {code} HDFS-6800 and HDFS-6981 modified this behavior, making it not completely clear whether this is somehow intentional. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
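To make the two review points concrete, the restructured path could look roughly like the sketch below. This is illustrative only, not the committed patch; it reuses the method names quoted in the description, and trash deletion on finalize is assumed to happen elsewhere (e.g., in the finalize handling of signalRollingUpgrade).
{code}
private void doTransition(DataNode datanode, StorageDirectory sd,
    NamespaceInfo nsInfo, StartupOption startOpt) throws IOException {
  if (startOpt == StartupOption.ROLLBACK) {
    if (sd.getPreviousDir().exists()) {
      // Roll back from the previous/ directory as before.
      doRollback(sd, nsInfo);
    } else {
      // No previous/ directory: recover any block files moved to trash.
      int restored = restoreBlockFilesFromTrash(getTrashRootDir(sd));
      LOG.info("Restored " + restored + " block files from trash.");
    }
  }
  // A normal (non-rollback) restart restores nothing; trash directories
  // are simply deleted when the rolling upgrade is finalized.
}
{code}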
[jira] [Commented] (HDFS-7537) fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas && NN restart
[ https://issues.apache.org/jira/browse/HDFS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335054#comment-14335054 ] Tsz Wo Nicholas Sze commented on HDFS-7537: --- "In Allen's comment, the Mock-up output shows status as HEALTHY when numUnderMinimalRelicatedBlocks > 0. ..." I see. Let's keep showing HEALTHY for the moment. When numUnderMinimalRelicatedBlocks > 0 and there is no missing/corrupted block, all under-minimal-replicated blocks have at least one good replica so that they can be replicated, and there is no data loss. It makes sense to consider the file system as healthy. Currently, we only have two statuses, HEALTHY and CORRUPT. In the future, we may want to add one more status for this case. BTW, there is a typo: numUnderMinimalRelicatedBlocks should be numUnderMinimalReplicatedBlocks fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas && NN restart - Key: HDFS-7537 URL: https://issues.apache.org/jira/browse/HDFS-7537 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Allen Wittenauer Assignee: GAO Rui Attachments: HDFS-7537.1.patch, dfs-min-2-fsck.png, dfs-min-2.png If minimum replication is set to 2 or higher and some of those replicas are missing and the namenode restarts, it isn't always obvious that the missing replicas are the reason why the namenode isn't leaving safemode. We should improve the output of fsck and the web UI to make it obvious that the missing blocks are from unmet replicas vs. completely/totally missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7645) Rolling upgrade is restoring blocks from trash multiple times
[ https://issues.apache.org/jira/browse/HDFS-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335072#comment-14335072 ] Arpit Agarwal commented on HDFS-7645: - Also, the restore from signalRollingUpgrade pointed out by Nathan can probably be deleted. Rolling upgrade is restoring blocks from trash multiple times - Key: HDFS-7645 URL: https://issues.apache.org/jira/browse/HDFS-7645 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Nathan Roberts Assignee: Keisuke Ogiwara Attachments: HDFS-7645.01.patch When performing an HDFS rolling upgrade, the trash directory is getting restored twice when under normal circumstances it shouldn't need to be restored at all. iiuc, the only time these blocks should be restored is if we need to rollback a rolling upgrade. On a busy cluster, this can cause significant and unnecessary block churn both on the datanodes, and more importantly in the namenode. The two times this happens are: 1) restart of DN onto new software {code} private void doTransition(DataNode datanode, StorageDirectory sd, NamespaceInfo nsInfo, StartupOption startOpt) throws IOException { if (startOpt == StartupOption.ROLLBACK && sd.getPreviousDir().exists()) { Preconditions.checkState(!getTrashRootDir(sd).exists(), sd.getPreviousDir() + " and " + getTrashRootDir(sd) + " should not " + "both be present."); doRollback(sd, nsInfo); // rollback if applicable } else { // Restore all the files in the trash. The restored files are retained // during rolling upgrade rollback. They are deleted during rolling // upgrade downgrade. int restored = restoreBlockFilesFromTrash(getTrashRootDir(sd)); LOG.info("Restored " + restored + " block files from trash."); } {code} 2) When the heartbeat response no longer indicates a rolling upgrade is in progress {code} /** * Signal the current rolling upgrade status as indicated by the NN. * @param inProgress true if a rolling upgrade is in progress */ void signalRollingUpgrade(boolean inProgress) throws IOException { String bpid = getBlockPoolId(); if (inProgress) { dn.getFSDataset().enableTrash(bpid); dn.getFSDataset().setRollingUpgradeMarker(bpid); } else { dn.getFSDataset().restoreTrash(bpid); dn.getFSDataset().clearRollingUpgradeMarker(bpid); } } {code} HDFS-6800 and HDFS-6981 modified this behavior, making it not completely clear whether this is somehow intentional. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7831) Fix the starting index and end condition of the loop in FileDiffList.findEarlierSnapshotBlocks()
[ https://issues.apache.org/jira/browse/HDFS-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335213#comment-14335213 ] Hudson commented on HDFS-7831: -- FAILURE: Integrated in Hadoop-trunk-Commit #7189 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7189/]) HDFS-7831. Fix the starting index and end condition of the loop in FileDiffList.findEarlierSnapshotBlocks(). Contributed by Konstantin Shvachko. (jing9: rev 73bcfa99af61e5202f030510db8954c17cba43cc) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/FileDiffList.java Fix the starting index and end condition of the loop in FileDiffList.findEarlierSnapshotBlocks() Key: HDFS-7831 URL: https://issues.apache.org/jira/browse/HDFS-7831 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.7.0 Attachments: HDFS-7831-01.patch Currently the loop in {{FileDiffList.findEarlierSnapshotBlocks()}} starts from {{insertPoint + 1}}. It should start from {{insertPoint - 1}}. As noted in [Jing's comment|https://issues.apache.org/jira/browse/HDFS-7056?focusedCommentId=14333864page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14333864] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
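For readers following along, the loop-direction fix can be sketched as below (simplified and illustrative; the names follow the JIRA discussion, not the exact committed code). Earlier snapshot diffs sit at lower indices in the sorted diff list, so the scan must walk backwards from the insertion point.
{code}
static BlockInfoContiguous[] findEarlierSnapshotBlocks(
    List<FileDiff> diffs, int insertPoint) {
  // Was: for (int j = insertPoint + 1; ...), which scanned later snapshots.
  for (int j = insertPoint - 1; j >= 0; j--) {
    BlockInfoContiguous[] blocks = diffs.get(j).getBlocks();
    if (blocks != null) {
      return blocks;  // nearest earlier diff that recorded the blocks
    }
  }
  return null;  // no earlier snapshot recorded blocks for this file
}
{code}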
[jira] [Commented] (HDFS-7830) DataNode does not release the volume lock when adding a volume fails.
[ https://issues.apache.org/jira/browse/HDFS-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335216#comment-14335216 ] Lei (Eddy) Xu commented on HDFS-7830: - [~cnauroth] Would you mind filing a separate JIRA and assigning it to me? Thanks! DataNode does not release the volume lock when adding a volume fails. - Key: HDFS-7830 URL: https://issues.apache.org/jira/browse/HDFS-7830 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu When there is a failure in the add-volume process, the {{in_use.lock}} is not released. Also, doing another {{-reconfig}} to remove the new dir in order to clean up doesn't remove the lock; lsof still shows the datanode holding on to the lock file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
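The cleanup the report implies could take roughly the following shape (a hedged sketch with hypothetical names; {{loadStorageDirectory}} and {{activateVolume}} stand in for whatever the add-volume path actually uses to acquire the directory and wire it into the dataset):
{code}
// Acquiring the storage directory implicitly takes in_use.lock.
StorageDirectory sd = loadStorageDirectory(datanode, nsInfo, newVolumeDir);
try {
  activateVolume(sd);  // may fail part-way through the add
} catch (IOException e) {
  // Release in_use.lock so a later add/remove of this dir can proceed.
  sd.unlock();
  throw e;
}
{code}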
[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient
[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335212#comment-14335212 ] Jing Zhao commented on HDFS-7435: - Thanks for sharing the thoughts, [~daryn]. Why not post your current patch first so that we can better understand why bumping the DN's min NN version is necessary? PB encoding of block reports is very inefficient Key: HDFS-7435 URL: https://issues.apache.org/jira/browse/HDFS-7435 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.patch Block reports are encoded as a PB repeating long. Repeating fields use an {{ArrayList}} with default capacity of 10. A block report containing tens or hundreds of thousands of longs (3 for each replica) is extremely expensive since the {{ArrayList}} must realloc many times. Also, decoding repeating fields will box the primitive longs, which must then be unboxed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7833) DataNode reconfiguration does not recalculate valid volumes required, based on configured failed volumes tolerated.
Chris Nauroth created HDFS-7833: --- Summary: DataNode reconfiguration does not recalculate valid volumes required, based on configured failed volumes tolerated. Key: HDFS-7833 URL: https://issues.apache.org/jira/browse/HDFS-7833 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Chris Nauroth Assignee: Lei (Eddy) Xu DataNode reconfiguration never recalculates {{FsDatasetImpl#validVolsRequired}}. This may cause incorrect behavior of the {{dfs.datanode.failed.volumes.tolerated}} property if reconfiguration causes the DataNode to run with a different total number of volumes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7830) DataNode does not release the volume lock when adding a volume fails.
[ https://issues.apache.org/jira/browse/HDFS-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335230#comment-14335230 ] Chris Nauroth commented on HDFS-7830: - Thank you, Eddy. I filed HDFS-7833. DataNode does not release the volume lock when adding a volume fails. - Key: HDFS-7830 URL: https://issues.apache.org/jira/browse/HDFS-7830 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu When there is a failure in the add-volume process, the {{in_use.lock}} is not released. Also, doing another {{-reconfig}} to remove the new dir in order to clean up doesn't remove the lock; lsof still shows the datanode holding on to the lock file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7833) DataNode reconfiguration does not recalculate valid volumes required, based on configured failed volumes tolerated.
[ https://issues.apache.org/jira/browse/HDFS-7833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335228#comment-14335228 ] Chris Nauroth commented on HDFS-7833: - This is a repeat of the comment I mentioned on HDFS-7830. Thank you to [~eddyxu] for volunteering to take assignment of the issue. Another potential problem that I've noticed in the DataNode reconfiguration code is that it never recalculates {{FsDatasetImpl#validVolsRequired}}. This is a final variable calculated as (# volumes configured) - (# volume failures tolerated): {code} this.validVolsRequired = volsConfigured - volFailuresTolerated; {code} If this variable is not updated for DataNode reconfigurations, then it could lead to some unexpected situations. For example: # DataNode starts running with 6 volumes (all healthy) and {{dfs.datanode.failed.volumes.tolerated}} set to 2. # {{FsDatasetImpl#validVolsRequired}} is set to 6 - 2 = 4. # DataNode is reconfigured to run with 8 volumes (all still healthy). # Now 3 volumes fail. The admin would expect the DataNode to abort, but there are 8 - 3 = 5 good volumes left, and {{FsDatasetImpl#validVolsRequired}} is still 4, so {{FsDatasetImpl#hasEnoughResource}} returns true. DataNode reconfiguration does not recalculate valid volumes required, based on configured failed volumes tolerated. --- Key: HDFS-7833 URL: https://issues.apache.org/jira/browse/HDFS-7833 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Chris Nauroth Assignee: Lei (Eddy) Xu DataNode reconfiguration never recalculates {{FsDatasetImpl#validVolsRequired}}. This may cause incorrect behavior of the {{dfs.datanode.failed.volumes.tolerated}} property if reconfiguration causes the DataNode to run with a different total number of volumes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
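A minimal sketch of the recalculation this report asks for (illustrative only, not a committed patch; {{getVolumes()}} is assumed to return the currently active volumes):
{code}
// Drop the final modifier so the threshold can track reconfiguration.
private volatile int validVolsRequired;

// Recompute after every add/remove of volumes via reconfiguration.
private void recomputeValidVolsRequired(int volsConfigured) {
  this.validVolsRequired = volsConfigured - volFailuresTolerated;
}

@Override
public boolean hasEnoughResource() {
  // Compare the surviving volumes against the up-to-date requirement.
  return getVolumes().size() >= validVolsRequired;
}
{code}
With this in place, the 8-volume example above would recompute validVolsRequired to 8 - 2 = 6, so a third volume failure (5 < 6) would make hasEnoughResource() return false, as the admin expects.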
[jira] [Commented] (HDFS-7439) Add BlockOpResponseProto's message to DFSClient's exception message
[ https://issues.apache.org/jira/browse/HDFS-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335080#comment-14335080 ] Tsz Wo Nicholas Sze commented on HDFS-7439: --- "... how can I rebuild for this patch? I can't log in Jenkins WebUI." I mean I have already started another Jenkins build for the patch. For rebuilding, you may click Cancel Patch and then Submit Patch. It will trigger a new build. Add BlockOpResponseProto's message to DFSClient's exception message --- Key: HDFS-7439 URL: https://issues.apache.org/jira/browse/HDFS-7439 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Takanobu Asanuma Priority: Minor Attachments: HDFS-7439.1.patch When (BlockOpResponseProto#getStatus() != SUCCESS), it helps with debugging if DFSClient can add BlockOpResponseProto's message to the exception message applications will get. For example, instead of {noformat} throw new IOException("Got error for OP_READ_BLOCK, self=" + peer.getLocalAddressString() + ", remote=" + peer.getRemoteAddressString() + ", for file " + file + ", for pool " + block.getBlockPoolId() + " block " + block.getBlockId() + "_" + block.getGenerationStamp()); {noformat} It could be, {noformat} throw new IOException("Got error for OP_READ_BLOCK, self=" + peer.getLocalAddressString() + ", remote=" + peer.getRemoteAddressString() + ", for file " + file + ", for pool " + block.getBlockPoolId() + " block " + block.getBlockId() + "_" + block.getGenerationStamp() + ", status message " + status.getMessage()); {noformat} We might want to check out all the references to BlockOpResponseProto in DFSClient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7439) Add BlockOpResponseProto's message to DFSClient's exception message
[ https://issues.apache.org/jira/browse/HDFS-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335088#comment-14335088 ] Tsz Wo Nicholas Sze commented on HDFS-7439: --- There are other places having a similar problem: - DFSOutputStream.DataStreamer.createBlockOutputStream(..) - DFSOutputStream.DataStreamer.transfer(..) - RemoteBlockReader2.checkSuccess(..) - Dispatcher.PendingMove.receiveResponse(..) - DataXceiver.replaceBlock(..) The code has a similar format {code} if (status != SUCCESS) { if (status == Status.ERROR_ACCESS_TOKEN) { throw new InvalidBlockTokenException(..); } else { throw new IOException(..); } } {code} How about we add a utility method? Add BlockOpResponseProto's message to DFSClient's exception message --- Key: HDFS-7439 URL: https://issues.apache.org/jira/browse/HDFS-7439 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Takanobu Asanuma Priority: Minor Attachments: HDFS-7439.1.patch When (BlockOpResponseProto#getStatus() != SUCCESS), it helps with debugging if DFSClient can add BlockOpResponseProto's message to the exception message applications will get. For example, instead of {noformat} throw new IOException("Got error for OP_READ_BLOCK, self=" + peer.getLocalAddressString() + ", remote=" + peer.getRemoteAddressString() + ", for file " + file + ", for pool " + block.getBlockPoolId() + " block " + block.getBlockId() + "_" + block.getGenerationStamp()); {noformat} It could be, {noformat} throw new IOException("Got error for OP_READ_BLOCK, self=" + peer.getLocalAddressString() + ", remote=" + peer.getRemoteAddressString() + ", for file " + file + ", for pool " + block.getBlockPoolId() + " block " + block.getBlockId() + "_" + block.getGenerationStamp() + ", status message " + status.getMessage()); {noformat} We might want to check out all the references to BlockOpResponseProto in DFSClient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
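Such a utility could look roughly like this (a hedged sketch with illustrative names; the real helper, if added, may differ): one method that converts a non-SUCCESS {{BlockOpResponseProto}} into the appropriate exception and always appends the server-side status message.
{code}
static void checkBlockOpStatus(BlockOpResponseProto response, String logInfo)
    throws IOException {
  if (response.getStatus() != Status.SUCCESS) {
    // Append the server-provided message so callers see the root cause.
    String detail = ", status message " + response.getMessage()
        + ", " + logInfo;
    if (response.getStatus() == Status.ERROR_ACCESS_TOKEN) {
      throw new InvalidBlockTokenException("Got access token error" + detail);
    } else {
      throw new IOException("Got error" + detail);
    }
  }
}
{code}
Each call site listed above would then pass its own context string (self/remote addresses, file, block) as {{logInfo}}.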
[jira] [Commented] (HDFS-7435) PB encoding of block reports is very inefficient
[ https://issues.apache.org/jira/browse/HDFS-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335128#comment-14335128 ] Daryn Sharp commented on HDFS-7435: --- Do I have the luxury of bumping the DN's minimum NN version? That would greatly simplify the implementation. It's easy for the NN to use the presence of the protobuf fields to determine if a report is old/new. However, the prior patches illustrate that it's not so easy for the DN to auto-detect. I believe the standard upgrade procedure is: upgrade the NN, then rolling-upgrade the DNs. Per the above, an upgraded NN supports both old and new reports from DNs. The only scenario in which a problem can occur is when the cluster is fully or partially upgraded and the NN is downgraded. The new DNs won't be able to communicate with the old NN, which is why I'd like to bump the minimum version so the DN doesn't continue to send block reports that appear to be empty to the old NN. I'd argue that if the NN is downgraded, there's going to be downtime, so you might as well roll back the DNs too. Thoughts? PB encoding of block reports is very inefficient Key: HDFS-7435 URL: https://issues.apache.org/jira/browse/HDFS-7435 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, namenode Affects Versions: 2.0.0-alpha, 3.0.0 Reporter: Daryn Sharp Assignee: Daryn Sharp Priority: Critical Attachments: HDFS-7435.000.patch, HDFS-7435.001.patch, HDFS-7435.002.patch, HDFS-7435.patch Block reports are encoded as a PB repeating long. Repeating fields use an {{ArrayList}} with default capacity of 10. A block report containing tens or hundreds of thousands of longs (3 for each replica) is extremely expensive since the {{ArrayList}} must realloc many times. Also, decoding repeating fields will box the primitive longs, which must then be unboxed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
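To illustrate the cost being discussed with a toy decode loop (not Hadoop's actual code; the replica count is assumed to be known up front): a protobuf repeated long decodes into a boxed, growing {{List<Long>}}, whereas a primitive buffer sized from the replica count avoids both the reallocations and the boxing.
{code}
import com.google.protobuf.CodedInputStream;
import java.io.IOException;

static long[] decodeBlockReportLongs(CodedInputStream in, int numReplicas)
    throws IOException {
  long[] longs = new long[numReplicas * 3];  // 3 longs per replica, presized
  for (int i = 0; i < longs.length; i++) {
    longs[i] = in.readInt64();               // primitive read: no boxing
  }
  return longs;
}
{code}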
[jira] [Commented] (HDFS-7831) Fix the starting index of the loop in FileDiffList.findEarlierSnapshotBlocks().
[ https://issues.apache.org/jira/browse/HDFS-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335192#comment-14335192 ] Jing Zhao commented on HDFS-7831: - Thanks for the fix, [~shv]. +1. I will commit it shortly. Fix the starting index of the loop in FileDiffList.findEarlierSnapshotBlocks(). --- Key: HDFS-7831 URL: https://issues.apache.org/jira/browse/HDFS-7831 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Attachments: HDFS-7831-01.patch Currently the loop in {{FileDiffList.findEarlierSnapshotBlocks()}} starts from {{insertPoint + 1}}. It should start from {{insertPoint - 1}}. As noted in [Jing's comment|https://issues.apache.org/jira/browse/HDFS-7056?focusedCommentId=14333864page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14333864] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
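A minimal sketch of the corrected scan (the diff-list accessors are illustrative; the committed patch also tightens the loop's end condition, per the retitled summary below): to find blocks recorded in an earlier snapshot, the loop must walk backwards from the diff just before the insertion point.
{code}
// Walking from insertPoint - 1 down to 0 visits earlier snapshots;
// starting at insertPoint + 1 would scan later diffs and miss them.
for (int i = insertPoint - 1; i >= 0; i--) {
  BlockInfo[] blocks = diffs.get(i).getBlocks();  // illustrative accessor
  if (blocks != null) {
    return blocks;  // nearest earlier diff that recorded blocks
  }
}
return null;
{code}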
[jira] [Commented] (HDFS-7789) DFsck should resolve the path to support cross-FS symlinks
[ https://issues.apache.org/jira/browse/HDFS-7789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335182#comment-14335182 ] Gera Shegalov commented on HDFS-7789: - [~lohit], can you review this patch? DFsck should resolve the path to support cross-FS symlinks -- Key: HDFS-7789 URL: https://issues.apache.org/jira/browse/HDFS-7789 Project: Hadoop HDFS Issue Type: Improvement Components: tools Affects Versions: 2.6.0 Reporter: Gera Shegalov Assignee: Gera Shegalov Attachments: HDFS-7789.001.patch DFsck should resolve the specified path such that it can be used with viewfs and other cross-filesystem symlinks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7830) DataNode does not release the volume lock when adding a volume fails.
[ https://issues.apache.org/jira/browse/HDFS-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335185#comment-14335185 ] Chris Nauroth commented on HDFS-7830: - Hi [~eddyxu]. Another potential problem that I've noticed in the DataNode reconfiguration code is that it never recalculates {{FsDatasetImpl#validVolsRequired}}. This is a {{final}} variable calculated as (# volumes configured) - (# volume failures tolerated):
{code}
this.validVolsRequired = volsConfigured - volFailuresTolerated;
{code}
If this variable is not updated for DataNode reconfigurations, then it could lead to some unexpected situations. For example: # DataNode starts running with 6 volumes (all healthy) and {{dfs.datanode.failed.volumes.tolerated}} set to 2. # {{FsDatasetImpl#validVolsRequired}} is set to 6 - 2 = 4. # DataNode is reconfigured to run with 8 volumes (all still healthy). # Now 3 volumes fail. The admin would expect the DataNode to abort, but there are 8 - 3 = 5 good volumes left, and {{FsDatasetImpl#validVolsRequired}} is still 4, so {{FsDatasetImpl#hasEnoughResource}} returns {{true}}. Is this something that makes sense for you to address as part of the patch you're working on now, or would you prefer I file a separate jira to track this? Thanks! DataNode does not release the volume lock when adding a volume fails. - Key: HDFS-7830 URL: https://issues.apache.org/jira/browse/HDFS-7830 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu When there is a failure in the add-volume process, the {{in_use.lock}} is not released. Also, doing another {{-reconfig}} to remove the new dir in order to clean up doesn't remove the lock; lsof still shows the datanode holding on to the lock file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
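A hedged sketch of the fix Chris describes, with illustrative method names: drop the {{final}} modifier and recompute the threshold whenever the configured volume set changes, instead of freezing it at startup.
{code}
// Recomputed on every add/remove-volume reconfiguration; the field
// name mirrors the one quoted above, the method is illustrative.
private volatile int validVolsRequired;

private void recalcValidVolsRequired(int volsConfigured,
    int volFailuresTolerated) {
  this.validVolsRequired = volsConfigured - volFailuresTolerated;
}
{code}
With the recalculation in place, the example above works out correctly: after reconfiguring to 8 volumes the threshold becomes 8 - 2 = 6, so 3 failures (5 good volumes left) make {{hasEnoughResource}} return {{false}} and the DataNode aborts as the admin expects.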
[jira] [Updated] (HDFS-7831) Fix the starting index and end condition of the loop in FileDiffList.findEarlierSnapshotBlocks()
[ https://issues.apache.org/jira/browse/HDFS-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7831: Summary: Fix the starting index and end condition of the loop in FileDiffList.findEarlierSnapshotBlocks() (was: Fix the starting index of the loop in FileDiffList.findEarlierSnapshotBlocks().) Fix the starting index and end condition of the loop in FileDiffList.findEarlierSnapshotBlocks() Key: HDFS-7831 URL: https://issues.apache.org/jira/browse/HDFS-7831 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Attachments: HDFS-7831-01.patch Currently the loop in {{FileDiffList.findEarlierSnapshotBlocks()}} starts from {{insertPoint + 1}}. It should start from {{insertPoint - 1}}. As noted in [Jing's comment|https://issues.apache.org/jira/browse/HDFS-7056?focusedCommentId=14333864page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14333864] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7831) Fix the starting index and end condition of the loop in FileDiffList.findEarlierSnapshotBlocks()
[ https://issues.apache.org/jira/browse/HDFS-7831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7831: Resolution: Fixed Fix Version/s: 2.7.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I've committed this to trunk and branch-2. Fix the starting index and end condition of the loop in FileDiffList.findEarlierSnapshotBlocks() Key: HDFS-7831 URL: https://issues.apache.org/jira/browse/HDFS-7831 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Fix For: 2.7.0 Attachments: HDFS-7831-01.patch Currently the loop in {{FileDiffList.findEarlierSnapshotBlocks()}} starts from {{insertPoint + 1}}. It should start from {{insertPoint - 1}}. As noted in [Jing's comment|https://issues.apache.org/jira/browse/HDFS-7056?focusedCommentId=14333864page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14333864] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-6133) Make Balancer support exclude specified path
[ https://issues.apache.org/jira/browse/HDFS-6133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335287#comment-14335287 ] Yongjun Zhang commented on HDFS-6133: - Hi [~szetszwo], Thanks for your explanation, and sorry for the late reply. I agree with your assessment. I wonder if we can update the config property description to say that enabling it is not recommended before a rolling upgrade is finished? Thanks. Make Balancer support exclude specified path Key: HDFS-6133 URL: https://issues.apache.org/jira/browse/HDFS-6133 Project: Hadoop HDFS Issue Type: Improvement Components: balancer & mover, datanode Reporter: zhaoyunjiong Assignee: zhaoyunjiong Fix For: 2.7.0 Attachments: HDFS-6133-1.patch, HDFS-6133-10.patch, HDFS-6133-11.patch, HDFS-6133-2.patch, HDFS-6133-3.patch, HDFS-6133-4.patch, HDFS-6133-5.patch, HDFS-6133-6.patch, HDFS-6133-7.patch, HDFS-6133-8.patch, HDFS-6133-9.patch, HDFS-6133.patch Currently, running the Balancer will destroy the RegionServer's data locality. If getBlocks could exclude blocks belonging to files which have a specific path prefix, like /hbase, then we could run the Balancer without destroying the RegionServer's data locality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7467) Provide storage tier information for a directory via fsck
[ https://issues.apache.org/jira/browse/HDFS-7467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335606#comment-14335606 ] Benoy Antony commented on HDFS-7467: Thanks for the review [~szetszwo]. If there are no further comments, I'll commit the patch tomorrow. Provide storage tier information for a directory via fsck - Key: HDFS-7467 URL: https://issues.apache.org/jira/browse/HDFS-7467 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer & mover Affects Versions: 2.6.0 Reporter: Benoy Antony Assignee: Benoy Antony Attachments: HDFS-7467-002.patch, HDFS-7467-003.patch, HDFS-7467-004.patch, HDFS-7467.patch, storagepolicydisplay.pdf Currently _fsck_ provides information regarding blocks for a directory. It should be augmented to provide storage tier information (optionally). The sample report could be as follows:
{code}
Storage Tier Combination    # of blocks    % of blocks
DISK:1,ARCHIVE:2            340730         97.7393%
ARCHIVE:3                   3928           1.1268%
DISK:2,ARCHIVE:2            3122           0.8956%
DISK:2,ARCHIVE:1            748            0.2146%
DISK:1,ARCHIVE:3            44             0.0126%
DISK:3,ARCHIVE:2            30             0.0086%
DISK:3,ARCHIVE:1            9              0.0026%
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7668) Convert site documentation from apt to markdown
[ https://issues.apache.org/jira/browse/HDFS-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-7668: --- Target Version/s: 2.7.0 (was: 3.0.0) Affects Version/s: (was: 3.0.0) 2.7.0 Fix Version/s: (was: 3.0.0) 2.7.0 Convert site documentation from apt to markdown --- Key: HDFS-7668 URL: https://issues.apache.org/jira/browse/HDFS-7668 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Affects Versions: 2.7.0 Reporter: Allen Wittenauer Assignee: Masatake Iwasaki Fix For: 2.7.0 Attachments: HDFS-7668-00.patch, HDFS-7668-01.patch, HDFS-7668-b2.001.patch HDFS analog to HADOOP-11495 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7537) fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas && NN restart
[ https://issues.apache.org/jira/browse/HDFS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336111#comment-14336111 ] GAO Rui commented on HDFS-7537: --- I have attached a new patch which adds DFSConfigKeys.DFS_NAMENODE_REPLICATION_MIN_KEY to the output of fsck, along with a unit test to confirm this change. Please review it when you have time, thanks a lot. fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas && NN restart - Key: HDFS-7537 URL: https://issues.apache.org/jira/browse/HDFS-7537 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Allen Wittenauer Assignee: GAO Rui Attachments: HDFS-7537.1.patch, HDFS-7537.2.patch, dfs-min-2-fsck.png, dfs-min-2.png If minimum replication is set to 2 or higher and some of those replicas are missing and the namenode restarts, it isn't always obvious that the missing replicas are the reason why the namenode isn't leaving safemode. We should improve the output of fsck and the web UI to make it obvious that the missing blocks are from unmet replicas vs. completely/totally missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7827) Erasure Coding: support striped blocks in non-protobuf fsimage
[ https://issues.apache.org/jira/browse/HDFS-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336134#comment-14336134 ] Jing Zhao commented on HDFS-7827: - Sure. Assigned the jira to you. Thanks for working on this, Hui! Erasure Coding: support striped blocks in non-protobuf fsimage -- Key: HDFS-7827 URL: https://issues.apache.org/jira/browse/HDFS-7827 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Hui Zheng HDFS-7749 only adds code to persist striped blocks to protobuf-based fsimage. We should also add this support to the non-protobuf fsimage since it is still used for use cases like offline image processing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7302) namenode -rollingUpgrade downgrade may finalize a rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-7302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kai Sasaki updated HDFS-7302: - Attachment: HDFS-7302.2.patch namenode -rollingUpgrade downgrade may finalize a rolling upgrade - Key: HDFS-7302 URL: https://issues.apache.org/jira/browse/HDFS-7302 Project: Hadoop HDFS Issue Type: Bug Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Kai Sasaki Labels: document, hdfs Attachments: HADOOP-7302.1.patch, HDFS-7302.2.patch The namenode startup option "-rollingUpgrade downgrade" is originally designed for downgrading the cluster. However, running "namenode -rollingUpgrade downgrade" with the new software could result in finalizing the ongoing rolling upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7749) Erasure Coding: Add striped block support in INodeFile
[ https://issues.apache.org/jira/browse/HDFS-7749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hui Zheng reassigned HDFS-7749: --- Assignee: Hui Zheng (was: Jing Zhao) Erasure Coding: Add striped block support in INodeFile -- Key: HDFS-7749 URL: https://issues.apache.org/jira/browse/HDFS-7749 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Hui Zheng Attachments: HDFS-7749.000.patch This jira plans to add a new INodeFile feature to store the striped block information in case the INodeFile is erasure coded. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7537) fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas && NN restart
[ https://issues.apache.org/jira/browse/HDFS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] GAO Rui updated HDFS-7537: -- Attachment: HDFS-7537.2.patch fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas && NN restart - Key: HDFS-7537 URL: https://issues.apache.org/jira/browse/HDFS-7537 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Allen Wittenauer Assignee: GAO Rui Attachments: HDFS-7537.1.patch, HDFS-7537.2.patch, dfs-min-2-fsck.png, dfs-min-2.png If minimum replication is set to 2 or higher and some of those replicas are missing and the namenode restarts, it isn't always obvious that the missing replicas are the reason why the namenode isn't leaving safemode. We should improve the output of fsck and the web UI to make it obvious that the missing blocks are from unmet replicas vs. completely/totally missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7827) Erasure Coding: support striped blocks in non-protobuf fsimage
[ https://issues.apache.org/jira/browse/HDFS-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14336117#comment-14336117 ] Hui Zheng commented on HDFS-7827: - Hi Jing, I would like to work on this jira. Could you assign it to me? Erasure Coding: support striped blocks in non-protobuf fsimage -- Key: HDFS-7827 URL: https://issues.apache.org/jira/browse/HDFS-7827 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao HDFS-7749 only adds code to persist striped blocks to protobuf-based fsimage. We should also add this support to the non-protobuf fsimage since it is still used for use cases like offline image processing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7827) Erasure Coding: support striped blocks in non-protobuf fsimage
[ https://issues.apache.org/jira/browse/HDFS-7827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7827: Assignee: Hui Zheng (was: Jing Zhao) Erasure Coding: support striped blocks in non-protobuf fsimage -- Key: HDFS-7827 URL: https://issues.apache.org/jira/browse/HDFS-7827 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Hui Zheng HDFS-7749 only adds code to persist striped blocks to protobuf-based fsimage. We should also add this support to the non-protobuf fsimage since it is still used for use cases like offline image processing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HDFS-7749) Erasure Coding: Add striped block support in INodeFile
[ https://issues.apache.org/jira/browse/HDFS-7749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao reassigned HDFS-7749: --- Assignee: Jing Zhao (was: Hui Zheng) Erasure Coding: Add striped block support in INodeFile -- Key: HDFS-7749 URL: https://issues.apache.org/jira/browse/HDFS-7749 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-7749.000.patch This jira plans to add a new INodeFile feature to store the striped block information in case the INodeFile is erasure coded. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7439) Add BlockOpResponseProto's message to DFSClient's exception message
[ https://issues.apache.org/jira/browse/HDFS-7439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335824#comment-14335824 ] Takanobu Asanuma commented on HDFS-7439: Sorry for my misunderstanding. I understand how to rebuild now. But the test failed again. Is my patch the cause? I will also try to add a utility method. Thank you! Add BlockOpResponseProto's message to DFSClient's exception message --- Key: HDFS-7439 URL: https://issues.apache.org/jira/browse/HDFS-7439 Project: Hadoop HDFS Issue Type: Improvement Reporter: Ming Ma Assignee: Takanobu Asanuma Priority: Minor Attachments: HDFS-7439.1.patch When (BlockOpResponseProto#getStatus() != SUCCESS), it helps with debugging if DFSClient can add BlockOpResponseProto's message to the exception message applications will get. For example, instead of
{noformat}
throw new IOException("Got error for OP_READ_BLOCK, self="
    + peer.getLocalAddressString() + ", remote="
    + peer.getRemoteAddressString() + ", for file " + file
    + ", for pool " + block.getBlockPoolId() + " block "
    + block.getBlockId() + "_" + block.getGenerationStamp());
{noformat}
It could be,
{noformat}
throw new IOException("Got error for OP_READ_BLOCK, self="
    + peer.getLocalAddressString() + ", remote="
    + peer.getRemoteAddressString() + ", for file " + file
    + ", for pool " + block.getBlockPoolId() + " block "
    + block.getBlockId() + "_" + block.getGenerationStamp()
    + ", status message " + status.getMessage());
{noformat}
We might want to check out all the references to BlockOpResponseProto in DFSClient. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7839) Erasure coding: move EC policies from file header to XAttr
Zhe Zhang created HDFS-7839: --- Summary: Erasure coding: move EC policies from file header to XAttr Key: HDFS-7839 URL: https://issues.apache.org/jira/browse/HDFS-7839 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Zhe Zhang Assignee: Zhe Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7537) fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas && NN restart
[ https://issues.apache.org/jira/browse/HDFS-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335528#comment-14335528 ] Allen Wittenauer commented on HDFS-7537: Also: I'm not sure what to do about the web UI component. It may not be necessary; the better practice is to run fsck in situations like these. fsck is confusing when dfs.namenode.replication.min > 1 && missing replicas && NN restart - Key: HDFS-7537 URL: https://issues.apache.org/jira/browse/HDFS-7537 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Allen Wittenauer Assignee: GAO Rui Attachments: HDFS-7537.1.patch, dfs-min-2-fsck.png, dfs-min-2.png If minimum replication is set to 2 or higher and some of those replicas are missing and the namenode restarts, it isn't always obvious that the missing replicas are the reason why the namenode isn't leaving safemode. We should improve the output of fsck and the web UI to make it obvious that the missing blocks are from unmet replicas vs. completely/totally missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7836) BlockManager Scalability Improvements
[ https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7836: Issue Type: Improvement (was: Bug) BlockManager Scalability Improvements - Key: HDFS-7836 URL: https://issues.apache.org/jira/browse/HDFS-7836 Project: Hadoop HDFS Issue Type: Improvement Reporter: Charles Lamb Assignee: Charles Lamb Improvements to BlockManager scalability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7722) DataNode#checkDiskError should also remove Storage when error is found.
[ https://issues.apache.org/jira/browse/HDFS-7722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335752#comment-14335752 ] Lei (Eddy) Xu commented on HDFS-7722: - {{TestDataNodeVolumeFailureReporting}} is relevant. I will work on fixing it. DataNode#checkDiskError should also remove Storage when error is found. --- Key: HDFS-7722 URL: https://issues.apache.org/jira/browse/HDFS-7722 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7722.000.patch, HDFS-7722.001.patch When {{DataNode#checkDiskError}} finds disk errors, it removes all block metadata from {{FsDatasetImpl}}. However, it does not remove the corresponding {{DataStorage}} and {{BlockPoolSliceStorage}}. The result is that we cannot directly run {{reconfig}} to hot-swap the failed disks without changing the configuration file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
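A hedged sketch of the missing cleanup the description points at; the method names on the storage classes are illustrative, not necessarily the API the patch uses. When a volume is dropped for disk errors, its {{DataStorage}} and {{BlockPoolSliceStorage}} entries need to be retired as well so that a later {{reconfig}} can re-add the directory.
{code}
// On a detected disk error, remove the volume from the dataset *and*
// from the storage bookkeeping, releasing in_use.lock for that dir.
private void handleDiskError(File failedVolume) throws IOException {
  data.removeVolumes(Collections.singleton(failedVolume)); // FsDatasetImpl side
  storage.removeVolume(failedVolume); // hypothetical DataStorage call that
                                      // unlocks and drops the entry, including
                                      // the BlockPoolSliceStorage records
}
{code}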
[jira] [Commented] (HDFS-7668) Convert site documentation from apt to markdown
[ https://issues.apache.org/jira/browse/HDFS-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335753#comment-14335753 ] Masatake Iwasaki commented on HDFS-7668: Thanks, [~cmccabe]! Convert site documentation from apt to markdown --- Key: HDFS-7668 URL: https://issues.apache.org/jira/browse/HDFS-7668 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Affects Versions: 2.7.0 Reporter: Allen Wittenauer Assignee: Masatake Iwasaki Fix For: 2.7.0 Attachments: HDFS-7668-00.patch, HDFS-7668-01.patch, HDFS-7668-b2.001.patch HDFS analog to HADOOP-11495 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-7838) Expose truncate API for libhdfs
Yi Liu created HDFS-7838: Summary: Expose truncate API for libhdfs Key: HDFS-7838 URL: https://issues.apache.org/jira/browse/HDFS-7838 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.7.0 Reporter: Yi Liu Assignee: Yi Liu It's good to expose truncate in libhdfs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7495) Remove updatePosition argument from DFSInputStream#getBlockAt()
[ https://issues.apache.org/jira/browse/HDFS-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335809#comment-14335809 ] Yi Liu commented on HDFS-7495: -- Good catch. +1 for the latest patch. Thanks Ted and Colin. Remove updatePosition argument from DFSInputStream#getBlockAt() --- Key: HDFS-7495 URL: https://issues.apache.org/jira/browse/HDFS-7495 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.7.0 Reporter: Ted Yu Assignee: Colin Patrick McCabe Priority: Minor Attachments: HDFS-7495.002.patch, hdfs-7495-001.patch There are two locks: one on DFSInputStream.this and one on DFSInputStream.infoLock. Normally the lock on DFSInputStream.this is obtained first, then the lock on DFSInputStream.infoLock. However, this order is not observed in DFSInputStream#getBlockAt():
{code}
synchronized(infoLock) {
  ...
  if (updatePosition) {
    // synchronized not strictly needed, since we only get here
    // from synchronized caller methods
    synchronized(this) {
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
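A hedged sketch of the direction the issue title suggests (simplified, with illustrative names, not the committed code): keep {{getBlockAt()}} entirely under {{infoLock}} and let callers that already hold the lock on {{this}} update the position themselves, so locks are never acquired in the inverted infoLock -> this order.
{code}
// The caller already holds the lock on "this"; getBlockAt() only takes
// infoLock internally, keeping the acquisition order this -> infoLock.
private synchronized void seekTo(long targetPos) throws IOException {
  LocatedBlock block = getBlockAt(targetPos); // no updatePosition argument
  pos = targetPos;                            // position updated by caller
  currentLocatedBlock = block;                // illustrative field name
}
{code}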
[jira] [Commented] (HDFS-7835) make initial sleeptime in locateFollowingBlock configurable for DFSClient.
[ https://issues.apache.org/jira/browse/HDFS-7835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335590#comment-14335590 ] Hadoop QA commented on HDFS-7835: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700586/HDFS-7835.000.patch against trunk revision 9a37247. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 1155 javac compiler warnings (more than the trunk's current 185 warnings). {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 47 warning messages. See https://builds.apache.org/job/PreCommit-HDFS-Build/9659//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9659//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/9659//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9659//console This message is automatically generated. make initial sleeptime in locateFollowingBlock configurable for DFSClient. -- Key: HDFS-7835 URL: https://issues.apache.org/jira/browse/HDFS-7835 Project: Hadoop HDFS Issue Type: Improvement Components: dfsclient Reporter: zhihai xu Assignee: zhihai xu Attachments: HDFS-7835.000.patch Make the initial sleeptime in locateFollowingBlock configurable for DFSClient. Currently the sleeptime/localTimeout in locateFollowingBlock/completeFile from DFSOutputStream is hard-coded as 400 ms, but the number of retries can be configured via dfs.client.block.write.locateFollowingBlock.retries. We should also make the initial sleeptime configurable to give users more flexibility to control both retry and delay. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
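A minimal sketch of what the proposal amounts to; the retry loop is simplified, and the second configuration key is hypothetical (only the retries key is quoted in the description).
{code}
// locateFollowingBlock() stands in for the actual namenode call that
// can throw NotReplicatedYetException while the pipeline catches up.
private LocatedBlock locateFollowingBlockWithRetry(Configuration conf)
    throws IOException, InterruptedException {
  int retries = conf.getInt(
      "dfs.client.block.write.locateFollowingBlock.retries", 5);
  // Hypothetical key replacing the hard-coded 400 ms initial delay.
  long sleeptime = conf.getLong(
      "dfs.client.block.write.locateFollowingBlock.initial.delay.ms", 400);
  while (true) {
    try {
      return locateFollowingBlock();
    } catch (NotReplicatedYetException e) {
      if (retries-- == 0) {
        throw e;
      }
      Thread.sleep(sleeptime);
      sleeptime *= 2;  // the delay doubles on each retry
    }
  }
}
{code}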
[jira] [Updated] (HDFS-1447) Make getGenerationStampFromFile() more efficient, so it doesn't reprocess full directory listing for every block
[ https://issues.apache.org/jira/browse/HDFS-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allen Wittenauer updated HDFS-1447: --- Status: Patch Available (was: Open) Make getGenerationStampFromFile() more efficient, so it doesn't reprocess full directory listing for every block Key: HDFS-1447 URL: https://issues.apache.org/jira/browse/HDFS-1447 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: 0.20.2 Reporter: Matt Foley Assignee: Matt Foley Attachments: HDFS-1447.patch, Test_HDFS_1447_NotForCommitt.java.patch Make getGenerationStampFromFile() more efficient. Currently this routine is called by addToReplicasMap() for every blockfile in the directory tree, and it walks each file's containing directory on every call. There is a simple refactoring that should make it more efficient. This work item is one of four sub-tasks for HDFS-1443, Improve Datanode startup time. The fix will probably be folded into sibling task HDFS-1446, which is already refactoring the method that calls getGenerationStampFromFile(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
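A hedged sketch of the refactoring idea (names are illustrative): list each containing directory once and reuse that listing for every block file in it, rather than re-walking the directory on every call.
{code}
// Given a cached directory listing, recover the generation stamp from
// the matching metadata file, which is named <blockFile>_<genstamp>.meta.
static long getGenerationStampFromListing(File[] listdir, File blockFile) {
  String blockName = blockFile.getName();
  for (File f : listdir) {
    String name = f.getName();
    if (name.startsWith(blockName + "_") && name.endsWith(".meta")) {
      return Long.parseLong(
          name.substring(blockName.length() + 1, name.length() - 5));
    }
  }
  return 0;  // illustrative sentinel for "no metadata file found"
}
{code}
The caller (addToReplicasMap() in the description) would compute the listing once per directory, e.g. {{File[] listdir = dir.listFiles()}}, and pass it down for each block file in that directory.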
[jira] [Commented] (HDFS-7668) Convert site documentation from apt to markdown
[ https://issues.apache.org/jira/browse/HDFS-7668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335714#comment-14335714 ] Colin Patrick McCabe commented on HDFS-7668: +1 for the backport. Thanks, [~iwasakims]. Convert site documentation from apt to markdown --- Key: HDFS-7668 URL: https://issues.apache.org/jira/browse/HDFS-7668 Project: Hadoop HDFS Issue Type: Improvement Components: documentation Affects Versions: 3.0.0 Reporter: Allen Wittenauer Assignee: Masatake Iwasaki Fix For: 3.0.0 Attachments: HDFS-7668-00.patch, HDFS-7668-01.patch, HDFS-7668-b2.001.patch HDFS analog to HADOOP-11495 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7837) Erasure Coding: allocate and persist striped blocks in FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-7837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7837: Summary: Erasure Coding: allocate and persist striped blocks in FSNamesystem (was: Allocate and persist striped blocks in FSNamesystem) Erasure Coding: allocate and persist striped blocks in FSNamesystem --- Key: HDFS-7837 URL: https://issues.apache.org/jira/browse/HDFS-7837 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Try to finish the remaining work from HDFS-7339 (except the ClientProtocol/DFSClient part): # Allow FSNamesystem#getAdditionalBlock to create striped blocks and persist striped blocks to editlog # Update FSImage for max allocated striped block ID # Update the block commit/complete logic in BlockManager -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7411) Refactor and improve decommissioning logic into DecommissionManager
[ https://issues.apache.org/jira/browse/HDFS-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335738#comment-14335738 ] Hadoop QA commented on HDFS-7411: - {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700583/hdfs-7411.011.patch against trunk revision 9a37247. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9658//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9658//console This message is automatically generated. Refactor and improve decommissioning logic into DecommissionManager --- Key: HDFS-7411 URL: https://issues.apache.org/jira/browse/HDFS-7411 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.5.1 Reporter: Andrew Wang Assignee: Andrew Wang Attachments: hdfs-7411.001.patch, hdfs-7411.002.patch, hdfs-7411.003.patch, hdfs-7411.004.patch, hdfs-7411.005.patch, hdfs-7411.006.patch, hdfs-7411.007.patch, hdfs-7411.008.patch, hdfs-7411.009.patch, hdfs-7411.010.patch, hdfs-7411.011.patch Would be nice to split out decommission logic from DatanodeManager to DecommissionManager. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7837) Erasure Coding: allocate and persist striped blocks in FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-7837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-7837: Attachment: HDFS-7837.000.patch Patch depending on HDFS-7749. The patch also includes a unit test to make sure the striped blocks are correctly written to and loaded from the editlog and fsimage. Erasure Coding: allocate and persist striped blocks in FSNamesystem --- Key: HDFS-7837 URL: https://issues.apache.org/jira/browse/HDFS-7837 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-7837.000.patch Try to finish the remaining work from HDFS-7339 (except the ClientProtocol/DFSClient part): # Allow FSNamesystem#getAdditionalBlock to create striped blocks and persist striped blocks to editlog # Update FSImage for max allocated striped block ID # Update the block commit/complete logic in BlockManager -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7763) fix zkfc hung issue due to not catching exception in a corner case
[ https://issues.apache.org/jira/browse/HDFS-7763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335675#comment-14335675 ] Hudson commented on HDFS-7763: -- FAILURE: Integrated in Hadoop-trunk-Commit #7193 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/7193/]) HDFS-7763. fix zkfc hung issue due to not catching exception in a corner case. Contributed by Liang Xie. (wang: rev 7105ebaa9f370db04962a1e19a67073dc080433b) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSZKFailoverController.java fix zkfc hung issue due to not catching exception in a corner case -- Key: HDFS-7763 URL: https://issues.apache.org/jira/browse/HDFS-7763 Project: Hadoop HDFS Issue Type: Bug Components: ha Affects Versions: 2.6.0 Reporter: Liang Xie Assignee: Liang Xie Fix For: 2.7.0 Attachments: HDFS-7763-001.txt, HDFS-7763-002.txt, jstack.4936 In our production cluster, both zkfc processes hung after a ZK network outage. The zkfc log said:
{code}
2015-02-07,17:40:11,875 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 3334ms for sessionid 0x4a61bacdd9dfb2, closing socket connection and attempting reconnect
2015-02-07,17:40:11,977 FATAL org.apache.hadoop.ha.ActiveStandbyElector: Received stat error from Zookeeper. code:CONNECTIONLOSS. Not retrying further znode monitoring connection errors.
2015-02-07,17:40:12,425 INFO org.apache.zookeeper.ZooKeeper: Session: 0x4a61bacdd9dfb2 closed
2015-02-07,17:40:12,425 FATAL org.apache.hadoop.ha.ZKFailoverController: Fatal error occurred:Received stat error from Zookeeper. code:CONNECTIONLOSS. Not retrying further znode monitoring connection errors.
2015-02-07,17:40:12,425 INFO org.apache.hadoop.ipc.Server: Stopping server on 11300
2015-02-07,17:40:12,425 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
2015-02-07,17:40:12,426 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
2015-02-07,17:40:12,426 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
2015-02-07,17:40:12,426 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
2015-02-07,17:40:12,426 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
2015-02-07,17:40:12,426 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
2015-02-07,17:40:12,426 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
2015-02-07,17:40:12,426 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
2015-02-07,17:40:12,426 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
2015-02-07,17:40:12,426 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
2015-02-07,17:40:12,426 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
2015-02-07,17:40:12,426 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x4a61bacdd9dfb2
2015-02-07,17:40:12,426 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2015-02-07,17:40:12,426 INFO org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
2015-02-07,17:40:12,426 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2015-02-07,17:40:12,426 INFO org.apache.hadoop.ha.HealthMonitor: Stopping HealthMonitor thread
2015-02-07,17:40:12,426 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 11300
{code}
The thread dump has also been uploaded as an attachment. From the dump, we can see that because of the unknown non-daemon threads (pool-*-thread-*), the process did not exit, but the critical threads, like the health monitor and RPC threads, had been stopped, so our watchdog (supervisord) did not observe that the zkfc process was down or abnormal, and the subsequent namenode failover could not be done as expected. There are two possible fixes here: 1) figure out where the unnamed threads, like pool-7-thread-1, came from and close them or set their daemon property; I tried to search but found nothing so far. 2) catch the exception from ZKFailoverController.run() so we
[jira] [Commented] (HDFS-7836) BlockManager Scalability Improvements
[ https://issues.apache.org/jira/browse/HDFS-7836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335747#comment-14335747 ] Charles Lamb commented on HDFS-7836: Problem Statement The number of blocks stored by the largest HDFS clusters continues to increase. This increase adds pressure to the BlockManager, that part of the NameNode which handles block data from across the cluster. Full block reports are problematic. The more blocks each DataNode has, the longer it takes to process a full block report from that DataNode. Storage densities have roughly doubled each year for the past few years. Meanwhile, increases in CPU power have come mostly in the form of additional cores rather than faster clock speeds. Currently, the NameNode cannot use these additional cores because full block reports are processed while holding the namesystem lock. The BlockManager stores all blocks in memory and this contributes to a large heap size. As the NameNode Java heap size has grown, full garbage collection events have started to take several minutes. Although it is often possible to avoid full GCs by re-using Java objects, they remain an operational concern for administrators. They also contribute to a long NameNode startup time, sometimes measured in tens of minutes for the biggest clusters. Goals We need to improve the BlockManager to handle the challenges of the next few years. Our specific goals for this project are to: * Reduce lock contention for the FSNamesystem lock * Enable concurrent processing of block reports * Reduce the Java heap size of the NameNode * Optimize the use of network resources [~cmccabe] and I will be working on this Jira. We propose doing this work on a separate branch. If there is interest in a community meeting to discuss these changes, then perhaps Tuesday 3/10/15 at Cloudera in Palo Alto, CA would work? I suggest that date because I will be in the bay area that day and would like to meet with other interested community members in person. I'll also be around 3/11 and 3/12 if we need an alternate date. BlockManager Scalability Improvements - Key: HDFS-7836 URL: https://issues.apache.org/jira/browse/HDFS-7836 Project: Hadoop HDFS Issue Type: Improvement Reporter: Charles Lamb Assignee: Charles Lamb Improvements to BlockManager scalability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7722) DataNode#checkDiskError should also remove Storage when error is found.
[ https://issues.apache.org/jira/browse/HDFS-7722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14335745#comment-14335745 ] Hadoop QA commented on HDFS-7722: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700593/HDFS-7722.001.patch against trunk revision 9a37247. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.datanode.TestFsDatasetCache org.apache.hadoop.hdfs.server.datanode.TestDataNodeVolumeFailureReporting The following test timeouts occurred in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/9660//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/9660//console This message is automatically generated. DataNode#checkDiskError should also remove Storage when error is found. --- Key: HDFS-7722 URL: https://issues.apache.org/jira/browse/HDFS-7722 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.6.0 Reporter: Lei (Eddy) Xu Assignee: Lei (Eddy) Xu Attachments: HDFS-7722.000.patch, HDFS-7722.001.patch When {{DataNode#checkDiskError}} finds disk errors, it removes all block metadata from {{FsDatasetImpl}}. However, it does not remove the corresponding {{DataStorage}} and {{BlockPoolSliceStorage}}. The result is that we cannot directly run {{reconfig}} to hot-swap the failed disks without changing the configuration file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)