[jira] [Resolved] (HDFS-5820) TestHDFSCLI does not work for user names with '-'
[ https://issues.apache.org/jira/browse/HDFS-5820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov resolved HDFS-5820. - Resolution: Duplicate TestHDFSCLI does not work for user names with '-' Key: HDFS-5820 URL: https://issues.apache.org/jira/browse/HDFS-5820 Project: Hadoop HDFS Issue Type: Bug Reporter: Gera Shegalov -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5879) TestHftpFileSystem: testFileNameEncoding and testSeek leak open fsdis
[ https://issues.apache.org/jira/browse/HDFS-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated HDFS-5879: Fix Version/s: 2.3.0 Status: Patch Available (was: Open) TestHftpFileSystem: testFileNameEncoding and testSeek leak open fsdis - Key: HDFS-5879 URL: https://issues.apache.org/jira/browse/HDFS-5879 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Gera Shegalov Fix For: 2.3.0 Attachments: HDFS-5879.v01.patch FSDataInputStream should be closed once no longer needed for reading in testFileNameEncoding and testSeek. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5879) TestHftpFileSystem: testFileNameEncoding and testSeek leak open fsdis
[ https://issues.apache.org/jira/browse/HDFS-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov updated HDFS-5879: Status: Open (was: Patch Available) TestHftpFileSystem: testFileNameEncoding and testSeek leak open fsdis - Key: HDFS-5879 URL: https://issues.apache.org/jira/browse/HDFS-5879 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Gera Shegalov Attachments: HDFS-5879.v01.patch FSDataInputStream should be closed once no longer needed for reading in testFileNameEncoding and testSeek. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
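The fix pattern described above (closing an FSDataInputStream once it is no longer needed for reading) can be sketched with try-with-resources. This is a minimal illustration using plain java.io stand-ins, not the actual TestHftpFileSystem code; `readFirstByte` is a hypothetical helper.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class StreamCloseSketch {
    // Hypothetical stand-in for the test helper: reads one byte and closes
    // the stream via try-with-resources, so it cannot leak even if an
    // assertion throws mid-read.
    static int readFirstByte(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.read();
        } // in.close() runs here on every exit path
    }

    public static void main(String[] args) throws IOException {
        int b = readFirstByte(new byte[] {42, 7});
        if (b != 42) throw new AssertionError("expected 42, got " + b);
        System.out.println("ok");
    }
}
```

The same pattern applies to any test that opens a stream only to inspect a few bytes: scoping it to a try-with-resources block guarantees the close even when the test fails.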
[jira] [Updated] (HDFS-3405) Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages
[ https://issues.apache.org/jira/browse/HDFS-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinayakumar B updated HDFS-3405: Attachment: HDFS-3405.patch Attaching the updated patch as per Andrew's comments. 1. {{GetImageServlet}} renamed to {{ImageServlet}}, *but patch shows complete deletion and addition of files.* 2. url /getimage changed to /imagetransfer All other comments also addressed except below. bq. We also should update the hdfs-default.xml description of dfs.image.transfer.timeout, since it is in fact a socket timeout now that the GETs aren't nested. I think we should also lower it back down to 60s, a more normal value for a socket timeout. Javadoc updated. But changing the default value back may need some more testing to settle on the optimal value. bq. In writeFileToPutRequest, we can use IOUtils.copy instead of doing our own buffering. The existing for loop syntax is also messy, it'd be better as a while loop. Instead of this, re-used existing code to use the throttler. bq. Have you tried this with a large fsimage, e.g. 2GB+? Last time I looked into this, we could sometimes run into issues with HttpURLConnection. This is not done yet. I will do it and post results. bq. as well as unit tests for secure HA/non-HA environments if we don't already have that, and the SPNEGO stuff. I am not sure whether these tests already exist in secure mode or not. But I have tested manually. Works fine. I will try to find out how to add secure tests. Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages Key: HDFS-3405 URL: https://issues.apache.org/jira/browse/HDFS-3405 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 1.0.0, 3.0.0, 2.0.5-alpha Reporter: Aaron T. 
Myers Assignee: Vinayakumar B Attachments: HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch As Todd points out in [this comment|https://issues.apache.org/jira/browse/HDFS-3404?focusedCommentId=13272986&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13272986], the current scheme for a checkpointing daemon to upload a merged fsimage file to an NN is to issue an HTTP GET request to tell the target NN to issue another GET request back to the checkpointing daemon to retrieve the merged fsimage file. There's no fundamental reason the checkpointing daemon can't just use an HTTP POST or PUT to send back the merged fsimage file, rather than the double-GET scheme. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
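The replacement for the double-GET handshake, a single direct upload, can be sketched with a plain HttpURLConnection. This is a hedged illustration, not the patch's actual transfer code: the `/imagetransfer` path follows the rename mentioned in the comments, and the streaming-mode call is one standard way to stop HttpURLConnection from buffering a multi-gigabyte body in memory, which is the usual failure mode behind the 2GB+ concern raised above.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class ImageUploadSketch {
    // Hypothetical sketch: configure a single PUT to the renamed
    // /imagetransfer endpoint instead of the old nested GET-GET handshake.
    static HttpURLConnection preparePut(URL url, long imageLength) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        // Stream the body instead of buffering it: without this,
        // HttpURLConnection holds the whole entity in memory before
        // sending, which breaks for very large fsimages.
        conn.setFixedLengthStreamingMode(imageLength);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        // No network I/O happens here; openConnection() does not connect.
        HttpURLConnection c = preparePut(
            new URL("http://nn.example:50070/imagetransfer"), 1L << 31);
        if (!"PUT".equals(c.getRequestMethod())) throw new AssertionError();
        System.out.println("ok");
    }
}
```

The caller would then copy the fsimage into `conn.getOutputStream()` (with the existing throttler, per the comment above) and check the response code.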
[jira] [Commented] (HDFS-5929) Add Block pool % usage to HDFS federated nn page
[ https://issues.apache.org/jira/browse/HDFS-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899005#comment-13899005 ] Hudson commented on HDFS-5929: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #479 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/479/]) HDFS-5929. Add blockpool % usage to HDFS federated nn page. Contributed by Siqi Li. (suresh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567411) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ClusterJspHelper.java Add Block pool % usage to HDFS federated nn page Key: HDFS-5929 URL: https://issues.apache.org/jira/browse/HDFS-5929 Project: Hadoop HDFS Issue Type: Improvement Components: federation Affects Versions: 2.0.0-alpha Reporter: Siqi Li Assignee: Siqi Li Fix For: 2.4.0 Attachments: HDFS-5929.v1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout
[ https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899006#comment-13899006 ] Hudson commented on HDFS-4858: -- SUCCESS: Integrated in Hadoop-Yarn-trunk #479 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/479/]) HDFS-4858. HDFS DataNode to NameNode RPC should timeout. Contributed by Henry Wang. (shv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567535) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java HDFS DataNode to NameNode RPC should timeout Key: HDFS-4858 URL: https://issues.apache.org/jira/browse/HDFS-4858 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha Environment: Redhat/CentOS 6.4 64 bit Linux Reporter: Jagane Sundar Assignee: Henry Wang Priority: Minor Fix For: 2.4.0 Attachments: HDFS-4858.patch, HDFS-4858.patch, HDFS-4858.patch The DataNode is configured with ipc.client.ping false and ipc.ping.interval 14000. This configuration means that the IPC Client (the DataNode, in this case) should time out in 14000 milliseconds (14 seconds) if the Standby NameNode does not respond to a sendHeartbeat. What we observe is this: If the Standby NameNode happens to reboot for any reason, the DataNodes that are heartbeating to this Standby get stuck forever while trying to sendHeartbeat. See the stack trace included below. When the Standby NameNode comes back up, we find that the DataNode never re-registers with the Standby NameNode. Thereafter failover completely fails. The desired behavior is that the DataNode's sendHeartbeat should time out in 14 seconds and keep retrying until the Standby NameNode comes back up. 
When it does, the DataNode should reconnect, re-register, and offer service. Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to create the DatanodeProtocolPB object. Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
Thread 25 (DataNode: [file:///opt/hadoop/data] heartbeating to vmhost6-vm1/10.10.10.151:8020):
  State: WAITING
  Blocked count: 23843
  Waited count: 45676
  Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
  Stack:
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:485)
    org.apache.hadoop.ipc.Client.call(Client.java:1220)
    org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
    sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    java.lang.reflect.Method.invoke(Method.java:597)
    org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
    org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
    sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
    org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
    org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
    org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
    org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
    java.lang.Thread.run(Thread.java:662)
DataNode RPC to Standby NameNode never times out. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
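The behavior the fix enables can be demonstrated with a plain socket sketch: once a read timeout is set, a blocked call fails with SocketTimeoutException instead of waiting forever, so the caller can retry. This is an illustration of the timeout mechanism only, not Hadoop's RPC code (the actual change routes ipc.ping.interval through RPC.getProtocolProxy as a socket timeout).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class RpcTimeoutSketch {
    // Returns true if a read against an unresponsive peer times out
    // (the desired behavior) rather than blocking forever.
    static boolean timesOut(int timeoutMillis) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {   // bound, but never writes back
            try (Socket client = new Socket()) {
                client.connect(new InetSocketAddress("127.0.0.1", server.getLocalPort()), 1000);
                client.setSoTimeout(timeoutMillis);         // analogous to the RPC socket timeout
                try {
                    client.getInputStream().read();         // peer sends nothing
                    return false;
                } catch (SocketTimeoutException expected) {
                    return true;                            // timed out; caller would retry
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (!timesOut(100)) throw new AssertionError("expected timeout");
        System.out.println("ok");
    }
}
```

Without `setSoTimeout`, the `read()` above blocks indefinitely, which mirrors the stuck sendHeartbeat thread in the stack trace.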
[jira] [Commented] (HDFS-3405) Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages
[ https://issues.apache.org/jira/browse/HDFS-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899077#comment-13899077 ] Hadoop QA commented on HDFS-3405: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628464/HDFS-3405.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestTransferFsImage org.apache.hadoop.hdfs.TestDatanodeConfig {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6122//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6122//console This message is automatically generated. Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages Key: HDFS-3405 URL: https://issues.apache.org/jira/browse/HDFS-3405 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 1.0.0, 3.0.0, 2.0.5-alpha Reporter: Aaron T. 
Myers Assignee: Vinayakumar B Attachments: HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch As Todd points out in [this comment|https://issues.apache.org/jira/browse/HDFS-3404?focusedCommentId=13272986&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13272986], the current scheme for a checkpointing daemon to upload a merged fsimage file to an NN is to issue an HTTP GET request to tell the target NN to issue another GET request back to the checkpointing daemon to retrieve the merged fsimage file. There's no fundamental reason the checkpointing daemon can't just use an HTTP POST or PUT to send back the merged fsimage file, rather than the double-GET scheme. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5929) Add Block pool % usage to HDFS federated nn page
[ https://issues.apache.org/jira/browse/HDFS-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899105#comment-13899105 ] Hudson commented on HDFS-5929: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1671 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1671/]) HDFS-5929. Add blockpool % usage to HDFS federated nn page. Contributed by Siqi Li. (suresh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567411) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ClusterJspHelper.java Add Block pool % usage to HDFS federated nn page Key: HDFS-5929 URL: https://issues.apache.org/jira/browse/HDFS-5929 Project: Hadoop HDFS Issue Type: Improvement Components: federation Affects Versions: 2.0.0-alpha Reporter: Siqi Li Assignee: Siqi Li Fix For: 2.4.0 Attachments: HDFS-5929.v1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout
[ https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899106#comment-13899106 ] Hudson commented on HDFS-4858: -- SUCCESS: Integrated in Hadoop-Hdfs-trunk #1671 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1671/]) HDFS-4858. HDFS DataNode to NameNode RPC should timeout. Contributed by Henry Wang. (shv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567535) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java HDFS DataNode to NameNode RPC should timeout Key: HDFS-4858 URL: https://issues.apache.org/jira/browse/HDFS-4858 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha Environment: Redhat/CentOS 6.4 64 bit Linux Reporter: Jagane Sundar Assignee: Henry Wang Priority: Minor Fix For: 2.4.0 Attachments: HDFS-4858.patch, HDFS-4858.patch, HDFS-4858.patch The DataNode is configured with ipc.client.ping false and ipc.ping.interval 14000. This configuration means that the IPC Client (the DataNode, in this case) should time out in 14000 milliseconds (14 seconds) if the Standby NameNode does not respond to a sendHeartbeat. What we observe is this: If the Standby NameNode happens to reboot for any reason, the DataNodes that are heartbeating to this Standby get stuck forever while trying to sendHeartbeat. See the stack trace included below. When the Standby NameNode comes back up, we find that the DataNode never re-registers with the Standby NameNode. Thereafter failover completely fails. The desired behavior is that the DataNode's sendHeartbeat should time out in 14 seconds and keep retrying until the Standby NameNode comes back up. 
When it does, the DataNode should reconnect, re-register, and offer service. Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to create the DatanodeProtocolPB object. Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
Thread 25 (DataNode: [file:///opt/hadoop/data] heartbeating to vmhost6-vm1/10.10.10.151:8020):
  State: WAITING
  Blocked count: 23843
  Waited count: 45676
  Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
  Stack:
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:485)
    org.apache.hadoop.ipc.Client.call(Client.java:1220)
    org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
    sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    java.lang.reflect.Method.invoke(Method.java:597)
    org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
    org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
    sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
    org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
    org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
    org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
    org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
    java.lang.Thread.run(Thread.java:662)
DataNode RPC to Standby NameNode never times out. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.
[ https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo (Nicholas), SZE updated HDFS-5889: - Attachment: h5889_20140212b.patch h5889_20140212b.patch: adds NameNodeFile.IMAGE_ROLLBACK. When rolling upgrade is in progress, standby NN should create checkpoint for downgrade. --- Key: HDFS-5889 URL: https://issues.apache.org/jira/browse/HDFS-5889 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5889_20140211.patch, h5889_20140212b.patch After rolling upgrade is started and checkpoint is disabled, the edit log may grow to a huge size. It is not a problem if rolling upgrade is finalized normally since NN keeps the current state in memory and it writes a new checkpoint during finalize. However, it is a problem if admin decides to downgrade. It could take a long time to apply edit log. Rollback does not have such problem. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5929) Add Block pool % usage to HDFS federated nn page
[ https://issues.apache.org/jira/browse/HDFS-5929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899154#comment-13899154 ] Hudson commented on HDFS-5929: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1696 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1696/]) HDFS-5929. Add blockpool % usage to HDFS federated nn page. Contributed by Siqi Li. (suresh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567411) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/ClusterJspHelper.java Add Block pool % usage to HDFS federated nn page Key: HDFS-5929 URL: https://issues.apache.org/jira/browse/HDFS-5929 Project: Hadoop HDFS Issue Type: Improvement Components: federation Affects Versions: 2.0.0-alpha Reporter: Siqi Li Assignee: Siqi Li Fix For: 2.4.0 Attachments: HDFS-5929.v1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-4858) HDFS DataNode to NameNode RPC should timeout
[ https://issues.apache.org/jira/browse/HDFS-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899155#comment-13899155 ] Hudson commented on HDFS-4858: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1696 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1696/]) HDFS-4858. HDFS DataNode to NameNode RPC should timeout. Contributed by Henry Wang. (shv: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567535) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/DatanodeProtocolClientSideTranslatorPB.java HDFS DataNode to NameNode RPC should timeout Key: HDFS-4858 URL: https://issues.apache.org/jira/browse/HDFS-4858 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 3.0.0, 2.1.0-beta, 2.0.4-alpha, 2.0.5-alpha Environment: Redhat/CentOS 6.4 64 bit Linux Reporter: Jagane Sundar Assignee: Henry Wang Priority: Minor Fix For: 2.4.0 Attachments: HDFS-4858.patch, HDFS-4858.patch, HDFS-4858.patch The DataNode is configured with ipc.client.ping false and ipc.ping.interval 14000. This configuration means that the IPC Client (the DataNode, in this case) should time out in 14000 milliseconds (14 seconds) if the Standby NameNode does not respond to a sendHeartbeat. What we observe is this: If the Standby NameNode happens to reboot for any reason, the DataNodes that are heartbeating to this Standby get stuck forever while trying to sendHeartbeat. See the stack trace included below. When the Standby NameNode comes back up, we find that the DataNode never re-registers with the Standby NameNode. Thereafter failover completely fails. The desired behavior is that the DataNode's sendHeartbeat should time out in 14 seconds and keep retrying until the Standby NameNode comes back up. 
When it does, the DataNode should reconnect, re-register, and offer service. Specifically, in the class DatanodeProtocolClientSideTranslatorPB.java, the method createNamenode should use RPC.getProtocolProxy and not RPC.getProxy to create the DatanodeProtocolPB object. Stack trace of thread stuck in the DataNode after the Standby NN has rebooted:
Thread 25 (DataNode: [file:///opt/hadoop/data] heartbeating to vmhost6-vm1/10.10.10.151:8020):
  State: WAITING
  Blocked count: 23843
  Waited count: 45676
  Waiting on org.apache.hadoop.ipc.Client$Call@305ab6c5
  Stack:
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:485)
    org.apache.hadoop.ipc.Client.call(Client.java:1220)
    org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
    sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    java.lang.reflect.Method.invoke(Method.java:597)
    org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
    org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
    sun.proxy.$Proxy10.sendHeartbeat(Unknown Source)
    org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:167)
    org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:445)
    org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
    org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
    java.lang.Thread.run(Thread.java:662)
DataNode RPC to Standby NameNode never times out. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5931) Potential bugs and improvements for exception handlers
[ https://issues.apache.org/jira/browse/HDFS-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ding Yuan updated HDFS-5931: Attachment: hdfs-5931-v3.patch A new patch fixing the test error. The reason hdfs-5931-v2.patch broke the test is that TestNNWithQJM#testMismatchedNNIsRejected() expects the exception message "Unable to start log segment 1: too few journals" thrown by FSEditLog.startLogSegment. Now the "too few journals" condition is detected earlier by recoverUnclosedStreams, which therefore aborts earlier. This seems more reasonable: if we cannot even find enough journals in recoverUnclosedStreams, we should not proceed to startLogSegment and wait until then to reject this case. Potential bugs and improvements for exception handlers -- Key: HDFS-5931 URL: https://issues.apache.org/jira/browse/HDFS-5931 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.2.0 Reporter: Ding Yuan Attachments: hdfs-5931-v2.patch, hdfs-5931-v3.patch, hdfs-5931.patch This is to report some improvements and potential bug fixes to some error handling code. Also attaching a patch for review. Details in the first comment. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5931) Potential bugs and improvements for exception handlers
[ https://issues.apache.org/jira/browse/HDFS-5931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899325#comment-13899325 ] Hadoop QA commented on HDFS-5931: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628500/hdfs-5931-v3.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.TestClientReportBadBlock {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6123//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6123//console This message is automatically generated. Potential bugs and improvements for exception handlers -- Key: HDFS-5931 URL: https://issues.apache.org/jira/browse/HDFS-5931 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.2.0 Reporter: Ding Yuan Attachments: hdfs-5931-v2.patch, hdfs-5931-v3.patch, hdfs-5931.patch This is to report some improvements and potential bug fixes to some error handling code. Also attaching a patch for review. Details in the first comment. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5935) New Namenode UI FS browser should throw smarter error messages
[ https://issues.apache.org/jira/browse/HDFS-5935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899341#comment-13899341 ] Steve Loughran commented on HDFS-5935: -- # the FS browser should be able to look at hdfs-site.xml and know whether or not webhdfs is enabled. # is the browser doing loopback or public IP address GETs? If the latter, there's a whole other set of causes (iptables, hostname wrong, etc). A wiki entry is the only way to maintain a list of these causes. New Namenode UI FS browser should throw smarter error messages -- Key: HDFS-5935 URL: https://issues.apache.org/jira/browse/HDFS-5935 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Travis Thompson Assignee: Travis Thompson Priority: Minor When browsing using the new FS browser in the namenode, if I try to browse a folder that I don't have permission to view, it throws the error: {noformat} Failed to retreive data from /webhdfs/v1/system?op=LISTSTATUS, cause: Forbidden WebHDFS might be disabled. WebHDFS is required to browse the filesystem. {noformat} The reason I'm not allowed to see /system is because I don't have permission, not because WebHDFS is disabled. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5879) Some TestHftpFileSystem tests do not close streams
[ https://issues.apache.org/jira/browse/HDFS-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-5879: -- Summary: Some TestHftpFileSystem tests do not close streams (was: TestHftpFileSystem tests do not close streams) Some TestHftpFileSystem tests do not close streams -- Key: HDFS-5879 URL: https://issues.apache.org/jira/browse/HDFS-5879 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Gera Shegalov Fix For: 2.3.0 Attachments: HDFS-5879.v01.patch FSDataInputStream should be closed once no longer needed for reading in testFileNameEncoding and testSeek. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5879) TestHftpFileSystem tests do not close streams
[ https://issues.apache.org/jira/browse/HDFS-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-5879: -- Summary: TestHftpFileSystem tests do not close streams (was: TestHftpFileSystem: testFileNameEncoding and testSeek leak open fsdis) TestHftpFileSystem tests do not close streams - Key: HDFS-5879 URL: https://issues.apache.org/jira/browse/HDFS-5879 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Gera Shegalov Fix For: 2.3.0 Attachments: HDFS-5879.v01.patch FSDataInputStream should be closed once no longer needed for reading in testFileNameEncoding and testSeek. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5879) Some TestHftpFileSystem tests do not close streams
[ https://issues.apache.org/jira/browse/HDFS-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-5879: -- Assignee: Gera Shegalov Some TestHftpFileSystem tests do not close streams -- Key: HDFS-5879 URL: https://issues.apache.org/jira/browse/HDFS-5879 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Gera Shegalov Assignee: Gera Shegalov Fix For: 2.3.0 Attachments: HDFS-5879.v01.patch FSDataInputStream should be closed once no longer needed for reading in testFileNameEncoding and testSeek. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5879) Some TestHftpFileSystem tests do not close streams
[ https://issues.apache.org/jira/browse/HDFS-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas updated HDFS-5879: -- Resolution: Fixed Fix Version/s: (was: 2.3.0) 2.4.0 Target Version/s: 2.4.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) I have merged the patch to trunk and branch-2. Thank you [~jira.shegalov]! Some TestHftpFileSystem tests do not close streams -- Key: HDFS-5879 URL: https://issues.apache.org/jira/browse/HDFS-5879 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Gera Shegalov Assignee: Gera Shegalov Fix For: 2.4.0 Attachments: HDFS-5879.v01.patch FSDataInputStream should be closed once no longer needed for reading in testFileNameEncoding and testSeek. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5879) Some TestHftpFileSystem tests do not close streams
[ https://issues.apache.org/jira/browse/HDFS-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899372#comment-13899372 ] Hudson commented on HDFS-5879: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5156 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5156/]) HDFS-5879. Some TestHftpFileSystem tests do not close streams. Contributed by Gera Shegalov. (suresh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1567704) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/web/TestHftpFileSystem.java Some TestHftpFileSystem tests do not close streams -- Key: HDFS-5879 URL: https://issues.apache.org/jira/browse/HDFS-5879 Project: Hadoop HDFS Issue Type: Bug Components: test Reporter: Gera Shegalov Assignee: Gera Shegalov Fix For: 2.4.0 Attachments: HDFS-5879.v01.patch FSDataInputStream should be closed once no longer needed for reading in testFileNameEncoding and testSeek. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Reopened] (HDFS-5621) NameNode: add indicator in web UI file system browser if a file has an ACL.
[ https://issues.apache.org/jira/browse/HDFS-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth reopened HDFS-5621: - I'm reopening this, because after HDFS-5923, the web UI will no longer get the '+' indicator for free via {{FsPermission#toString}}. NameNode: add indicator in web UI file system browser if a file has an ACL. --- Key: HDFS-5621 URL: https://issues.apache.org/jira/browse/HDFS-5621 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Change the file system browser to append the '+' character to permissions of any file or directory that has an ACL. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5935) New Namenode UI FS browser should throw smarter error messages
[ https://issues.apache.org/jira/browse/HDFS-5935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899415#comment-13899415 ] Haohui Mai commented on HDFS-5935: -- The FileSystem browser is built on the client side, thus it cannot read the values of hdfs-site.xml directly. Since the web UI and webhdfs run on the same server at the same port, it is very likely that you can access both or neither of them. Therefore I think it is good enough to check whether the response is 404, but having a wiki entry would definitely be helpful for cases like SPNEGO / security, etc. New Namenode UI FS browser should throw smarter error messages -- Key: HDFS-5935 URL: https://issues.apache.org/jira/browse/HDFS-5935 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Travis Thompson Assignee: Travis Thompson Priority: Minor When browsing using the new FS browser in the namenode, if I try to browse a folder that I don't have permission to view, it throws the error: {noformat} Failed to retreive data from /webhdfs/v1/system?op=LISTSTATUS, cause: Forbidden WebHDFS might be disabled. WebHDFS is required to browse the filesystem. {noformat} The reason I'm not allowed to see /system is because I don't have permission, not because WebHDFS is disabled. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
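The distinction Haohui draws above — a 403 means the directory is forbidden, while only a 404 suggests WebHDFS itself is disabled — can be sketched as a status-to-message mapping. This is a hypothetical illustration only (the real FS browser is client-side JavaScript; `WebHdfsErrorText` and `explain` are invented names):

```java
// Hypothetical sketch: map the HTTP status of a failed LISTSTATUS call to a
// more accurate error message. Only the 404 case implies WebHDFS is disabled.
public class WebHdfsErrorText {
    static String explain(int httpStatus) {
        switch (httpStatus) {
            case 403:
                // Forbidden: WebHDFS answered, but the user lacks permission.
                return "Permission denied: you are not allowed to list this directory.";
            case 404:
                // Not found: the /webhdfs/v1 endpoint itself is missing.
                return "WebHDFS might be disabled. WebHDFS is required to browse the filesystem.";
            default:
                return "Failed to retrieve data (HTTP " + httpStatus + ").";
        }
    }

    public static void main(String[] args) {
        System.out.println(explain(403));
        System.out.println(explain(404));
    }
}
```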
[jira] [Commented] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.
[ https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899437#comment-13899437 ] Jing Zhao commented on HDFS-5889: - The new patch looks good to me. One question is that the current patch calls purgeOldStorage(NameNodeFile.IMAGE_ROLLBACK) when finalizing rolling upgrade. Looks like this method will still retain the IMAGE_ROLLBACK checkpoint (by default, = 2 ckpts)? Do we want to make sure the IMAGE_ROLLBACK checkpoint gets purged here? When rolling upgrade is in progress, standby NN should create checkpoint for downgrade. --- Key: HDFS-5889 URL: https://issues.apache.org/jira/browse/HDFS-5889 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5889_20140211.patch, h5889_20140212b.patch After rolling upgrade is started and checkpoint is disabled, the edit log may grow to a huge size. It is not a problem if rolling upgrade is finalized normally since NN keeps the current state in memory and it writes a new checkpoint during finalize. However, it is a problem if admin decides to downgrade. It could take a long time to apply edit log. Rollback does not have such problem. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache
[ https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899451#comment-13899451 ] Hudson commented on HDFS-5810: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5157 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5157/]) HDFS-5810. Unify mmap cache and short-circuit file descriptor cache (cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1567720) * /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Waitable.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocalLegacy.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/ClientContext.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DomainSocketFactory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/FileInputStreamCache.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/PeerCache.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemotePeerFactory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/client/ClientMmap.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/client/ClientMmapManager.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/client/ShortCircuitCache.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/client/ShortCircuitReplica.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/client/ShortCircuitReplicaInfo.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/common/JspHelper.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockMetadataHeader.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/fs/TestEnhancedByteBufferAccess.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/DFSTestUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderFactory.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestBlockReaderLocal.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestConnCache.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferKeepalive.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDisableConnCache.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestFileInputStreamCache.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitCache.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockTokenWithDFS.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java Unify mmap
[jira] [Updated] (HDFS-5932) Ls should display the ACL bit
[ https://issues.apache.org/jira/browse/HDFS-5932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5932: Attachment: HDFS-5932.2.patch I'm attaching patch version 2 to address the additional case I described in the last comment. Ls should display the ACL bit - Key: HDFS-5932 URL: https://issues.apache.org/jira/browse/HDFS-5932 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode, security Reporter: Haohui Mai Assignee: Chris Nauroth Attachments: HDFS-5932.1.patch, HDFS-5932.2.patch Based on the discussion of HDFS-5923, the ACL bit is no longer passed to the client directly. Ls should call {{getAclStatus()}} instead since it needs to display the ACL bit as a part of the permission. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5938) Make BlockReaderFactory#BlockReaderPeer a static class
[ https://issues.apache.org/jira/browse/HDFS-5938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5938: --- Attachment: HDFS-5938.001.patch Make BlockReaderFactory#BlockReaderPeer a static class -- Key: HDFS-5938 URL: https://issues.apache.org/jira/browse/HDFS-5938 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Trivial Attachments: HDFS-5938.001.patch Make BlockReaderFactory#BlockReaderPeer a static class to avoid a findbugs warning. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5938) Make BlockReaderFactory#BlockReaderPeer a static class
Colin Patrick McCabe created HDFS-5938: -- Summary: Make BlockReaderFactory#BlockReaderPeer a static class Key: HDFS-5938 URL: https://issues.apache.org/jira/browse/HDFS-5938 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Trivial Attachments: HDFS-5938.001.patch Make BlockReaderFactory#BlockReaderPeer a static class to avoid a findbugs warning. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5938) Make BlockReaderFactory#BlockReaderPeer a static class
[ https://issues.apache.org/jira/browse/HDFS-5938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5938: --- Status: Patch Available (was: Open) Make BlockReaderFactory#BlockReaderPeer a static class -- Key: HDFS-5938 URL: https://issues.apache.org/jira/browse/HDFS-5938 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Trivial Attachments: HDFS-5938.001.patch Make BlockReaderFactory#BlockReaderPeer a static class to avoid a findbugs warning. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5847) Consolidate INodeReference into a separate section
[ https://issues.apache.org/jira/browse/HDFS-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899476#comment-13899476 ] Haohui Mai commented on HDFS-5847: -- The patch looks good. nit: there is one trailing whitespace in the patch: {code} + INodeReferenceSection.INodeReference.Builder rb = + INodeReferenceSection.INodeReference.newBuilder(). +setReferredId(ref.getId()); {code} +1 after addressing it. Consolidate INodeReference into a separate section -- Key: HDFS-5847 URL: https://issues.apache.org/jira/browse/HDFS-5847 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 3.0.0 Reporter: Haohui Mai Assignee: Jing Zhao Attachments: HDFS-5847.000.patch, HDFS-5847.001.patch Currently each INodeDirectorySection.Entry contains variable numbers of INodeReference entries. The INodeReference entries are inlined, therefore it is difficult to quickly navigate through a INodeDirectorySection.Entry. Skipping through a INodeDirectorySection.Entry without parsing is essential to parse these entries in parallel. This jira proposes to consolidate INodeReferences into a section and give each of them an ID. The INodeDirectorySection.Entry can store the list of the IDs as a repeated field. That way we can leverage the existing code in protobuf to quickly skip through a INodeDirectorySection.Entry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
Yongjun Zhang created HDFS-5939: --- Summary: WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster Key: HDFS-5939 URL: https://issues.apache.org/jira/browse/HDFS-5939 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang When trying to access hdfs via webhdfs, and when the datanode is dead, the user will see the exception below without any clue that it is caused by a dead datanode: $ curl -i -X PUT .../webhdfs/v1/t1?op=CREATE&user.name=userName&overwrite=false ... {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n must be positive"}} Need to fix the report to give the user a hint about the dead datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5822) InterruptedException to thread sleep ignored
[ https://issues.apache.org/jira/browse/HDFS-5822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899479#comment-13899479 ] Haohui Mai commented on HDFS-5822: -- You can't log in an OOM situation since logging requires buffers. I fail to see the need to log all {{InterruptedException}}s from the services on the DataNode side -- I think they are only interrupted during shutdown, and it would be quite confusing for users to see these benign log entries. InterruptedException to thread sleep ignored Key: HDFS-5822 URL: https://issues.apache.org/jira/browse/HDFS-5822 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: Ding Yuan Attachments: hdfs-5822.patch In org/apache/hadoop/hdfs/server/datanode/DataXceiverServer.java, there is the following code snippet in the run() method: {noformat} 156: } catch (OutOfMemoryError ie) { 157:IOUtils.cleanup(null, peer); 158:// DataNode can run out of memory if there is too many transfers. 159: // Log the event, Sleep for 30 seconds, other transfers may complete by 160:// then. 161:LOG.warn("DataNode is out of memory. Will retry in 30 seconds.", ie); 162:try { 163: Thread.sleep(30 * 1000); 164:} catch (InterruptedException e) { 165: // ignore 166:} 167: } {noformat} Note that InterruptedException is completely ignored. This might not be safe, since any potential events that lead to the InterruptedException are lost. More info on why InterruptedException shouldn't be ignored: http://stackoverflow.com/questions/1087475/when-does-javas-thread-sleep-throw-interruptedexception Thanks, Ding -- This message was sent by Atlassian JIRA (v6.1.5#6160)
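For reference, a common alternative to the empty catch block discussed in this issue is to restore the thread's interrupt status so the event is not lost. A minimal self-contained sketch (not the actual DataXceiverServer code; `SleepRetry` and `sleepQuietly` are invented names):

```java
// Minimal sketch: instead of swallowing InterruptedException, restore the
// interrupt flag so code further up the stack can observe and act on it.
public class SleepRetry {
    /** Sleeps for the given time; returns false if the sleep was interrupted. */
    static boolean sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // re-assert the interrupt instead of ignoring it
            return false;
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt();       // simulate a pending interrupt
        boolean completed = sleepQuietly(10);
        System.out.println(completed);            // false: the sleep was interrupted
        System.out.println(Thread.interrupted()); // true: the flag was restored, not dropped
    }
}
```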
[jira] [Updated] (HDFS-5847) Consolidate INodeReference into a separate section
[ https://issues.apache.org/jira/browse/HDFS-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5847: Attachment: HDFS-5847.002.patch Remove the trailing whitespace and move the buildInodeReference method into FSImageFormatPBSnapshot as a private method. Consolidate INodeReference into a separate section -- Key: HDFS-5847 URL: https://issues.apache.org/jira/browse/HDFS-5847 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 3.0.0 Reporter: Haohui Mai Assignee: Jing Zhao Attachments: HDFS-5847.000.patch, HDFS-5847.001.patch, HDFS-5847.002.patch Currently each INodeDirectorySection.Entry contains variable numbers of INodeReference entries. The INodeReference entries are inlined, therefore it is difficult to quickly navigate through a INodeDirectorySection.Entry. Skipping through a INodeDirectorySection.Entry without parsing is essential to parse these entries in parallel. This jira proposes to consolidate INodeReferences into a section and give each of them an ID. The INodeDirectorySection.Entry can store the list of the IDs as a repeated field. That way we can leverage the existing code in protobuf to quickly skip through a INodeDirectorySection.Entry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5932) Ls should display the ACL bit
[ https://issues.apache.org/jira/browse/HDFS-5932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899493#comment-13899493 ] Haohui Mai commented on HDFS-5932: -- {code} +if (aclNotSupportedFsSet.contains(fs.getUri())) { + // This FileSystem failed to run the ACL API in an earlier iteration. + return false; +} +try { + return !fs.getAclStatus(item.path).getEntries().isEmpty(); +} catch (RemoteException e) { + // If this is a RpcNoSuchMethodException, then the client is connected to + // an older NameNode that doesn't support ACLs. Keep going. + IOException e2 = e.unwrapRemoteException(RpcNoSuchMethodException.class); + if (!(e2 instanceof RpcNoSuchMethodException)) { +throw e; + } +} catch (IOException e) { + // The NameNode supports ACLs, but they are not enabled. Keep going. + String message = e.getMessage(); + if (message != null && !message.contains("ACLs has been disabled")) { +throw e; + } +} catch (UnsupportedOperationException e) { + // The underlying FileSystem doesn't implement ACLs. Keep going. +} +// Remember that this FileSystem cannot support ACLs. +aclNotSupportedFsSet.add(fs.getUri()); +return false; {code} This method is a little bit confusing. Can you just catch the {{RpcNoSuchMethodException}}: {code} try { getFileStatus(); } catch (RpcNoSuchMethodException) { unsupportedFs.add(...); } return false; {code} I also wonder whether it is possible to cache the fs object directly instead. Do you want to add a unit test to make sure that ls works as expected when {{RpcNoSuchMethodException}} is thrown? Ls should display the ACL bit - Key: HDFS-5932 URL: https://issues.apache.org/jira/browse/HDFS-5932 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode, security Reporter: Haohui Mai Assignee: Chris Nauroth Attachments: HDFS-5932.1.patch, HDFS-5932.2.patch Based on the discussion of HDFS-5923, the ACL bit is no longer passed to the client directly. 
Ls should call {{getAclStatus()}} instead since it needs to display the ACL bit as a part of the permission. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
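The caching pattern under discussion — probe the ACL API once per file system and remember failures by URI — can be sketched roughly as below. This is an assumption-laden illustration, not the patch itself: `AclProbe` and its nested `NoSuchMethod` are stand-ins for the real Hadoop types (a `RpcNoSuchMethodException` unwrapped from a `RemoteException`), and the boolean flag simulates the probe outcome.

```java
// Hedged sketch of the per-URI "ACLs unsupported" cache discussed above.
// NoSuchMethod stands in for Hadoop's RpcNoSuchMethodException.
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

public class AclProbe {
    static class NoSuchMethod extends Exception {}

    private final Set<URI> unsupportedFs = new HashSet<>();

    /** Returns whether the path has an ACL; oldNameNode simulates a pre-ACL server. */
    boolean hasAclBit(URI fsUri, boolean oldNameNode) {
        if (unsupportedFs.contains(fsUri)) {
            return false;                    // earlier probe failed: skip the RPC entirely
        }
        try {
            if (oldNameNode) {
                throw new NoSuchMethod();    // simulates the ACL RPC being unavailable
            }
            return true;                     // probe succeeded; real code would inspect entries
        } catch (NoSuchMethod e) {
            unsupportedFs.add(fsUri);        // cache the failure, keyed by URI
            return false;
        }
    }

    public static void main(String[] args) {
        AclProbe probe = new AclProbe();
        URI oldNn = URI.create("hdfs://old-nn:8020");
        System.out.println(probe.hasAclBit(oldNn, true));  // false: probe failed
        System.out.println(probe.hasAclBit(oldNn, true));  // false: answered from the cache
    }
}
```

Keying the cache by URI rather than by FileSystem object sidesteps the hashCode/equals concern raised later in this thread.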
[jira] [Assigned] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
[ https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang reassigned HDFS-5939: --- Assignee: Yongjun Zhang WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster Key: HDFS-5939 URL: https://issues.apache.org/jira/browse/HDFS-5939 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang When trying to access hdfs via webhdfs, and when the datanode is dead, the user will see the exception below without any clue that it is caused by a dead datanode: $ curl -i -X PUT .../webhdfs/v1/t1?op=CREATE&user.name=userName&overwrite=false ... {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n must be positive"}} Need to fix the report to give the user a hint about the dead datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
[ https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899499#comment-13899499 ] Haohui Mai commented on HDFS-5939: -- Can you try whether HDFS-5891 fixes this issue? WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster Key: HDFS-5939 URL: https://issues.apache.org/jira/browse/HDFS-5939 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.3.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang When trying to access hdfs via webhdfs, and when the datanode is dead, the user will see the exception below without any clue that it is caused by a dead datanode: $ curl -i -X PUT .../webhdfs/v1/t1?op=CREATE&user.name=userName&overwrite=false ... {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n must be positive"}} Need to fix the report to give the user a hint about the dead datanode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.
[ https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-5889: Attachment: h5889_20140212c.patch Upload a patch to make this trivial change. When rolling upgrade is in progress, standby NN should create checkpoint for downgrade. --- Key: HDFS-5889 URL: https://issues.apache.org/jira/browse/HDFS-5889 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5889_20140211.patch, h5889_20140212b.patch, h5889_20140212c.patch After rolling upgrade is started and checkpoint is disabled, the edit log may grow to a huge size. It is not a problem if rolling upgrade is finalized normally since NN keeps the current state in memory and it writes a new checkpoint during finalize. However, it is a problem if admin decides to downgrade. It could take a long time to apply edit log. Rollback does not have such problem. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5621) NameNode: add indicator in web UI file system browser if a file has an ACL.
[ https://issues.apache.org/jira/browse/HDFS-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899525#comment-13899525 ] Haohui Mai commented on HDFS-5621: -- I wonder whether it is possible to display the ACL information when the user clicks to check the detail information of a file. That way the UI only makes getAclStatus() calls when needed. NameNode: add indicator in web UI file system browser if a file has an ACL. --- Key: HDFS-5621 URL: https://issues.apache.org/jira/browse/HDFS-5621 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Change the file system browser to append the '+' character to permissions of any file or directory that has an ACL. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5891) webhdfs should not try connecting the DN during redirection
[ https://issues.apache.org/jira/browse/HDFS-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899531#comment-13899531 ] Jing Zhao commented on HDFS-5891: - The patch needs some rebase (JspHelper.java misses one import). Besides, # looks like we do not need the configuration parameter any more for the bestNodes methods # We may still want the following check. Although nodes should not be null in the current code, it may be better to still do this check. {code} -if (nodes == null || nodes.length == 0) { - throw new IOException("No nodes contain this block"); {code} +1 after addressing the comments. webhdfs should not try connecting the DN during redirection --- Key: HDFS-5891 URL: https://issues.apache.org/jira/browse/HDFS-5891 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5891.000.patch When the webhdfs server in NN serves an {{OPEN}} request, the NN will eventually redirect the request to a DN. The current implementation intends to choose the active DNs. The code always connects to the DN in a deterministic order to see whether it is active during redirection. Although it reduces the chance of the client connecting to a failed DN, this is problematic because: # There is no guarantee that the client can connect to that DN even if the NN can connect to it. # It requires an additional network round-trip for every {{OPEN}} / {{CREATE}} request. This jira proposes that the NN should choose the DN based on the information of the data node manager. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5621) NameNode: add indicator in web UI file system browser if a file has an ACL.
[ https://issues.apache.org/jira/browse/HDFS-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899536#comment-13899536 ] Chris Nauroth commented on HDFS-5621: - bq. I wonder whether it is possible to display the ACL information when the user clicks to check the detail information of a file. That sounds like a reasonable approach to me. Thanks, Haohui! NameNode: add indicator in web UI file system browser if a file has an ACL. --- Key: HDFS-5621 URL: https://issues.apache.org/jira/browse/HDFS-5621 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Change the file system browser to append the '+' character to permissions of any file or directory that has an ACL. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5621) NameNode: add indicator in web UI file system browser if a file has an ACL.
[ https://issues.apache.org/jira/browse/HDFS-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5621: Assignee: Haohui Mai NameNode: add indicator in web UI file system browser if a file has an ACL. --- Key: HDFS-5621 URL: https://issues.apache.org/jira/browse/HDFS-5621 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Haohui Mai Change the file system browser to append the '+' character to permissions of any file or directory that has an ACL. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5585) Provide admin commands for data node upgrade
[ https://issues.apache.org/jira/browse/HDFS-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899560#comment-13899560 ] Kihwal Lee commented on HDFS-5585: -- Here is what the patch does: For the immediately visible changes by users, two new DFSAdmin commands: shutdownDatanode and getDatanodeInfo. shutdownDatanode can be called with the upgrade option to make the datanode do the required prep before upgrade. This includes sending an OOB Ack to all writers and saving some state for quick restart (to be added by other jira). The getDatanodeInfo command can be used as a datanode liveness check as well as an upgrade completeness check, as it shows the version of running software config and the uptime. The shutdown command will be hooked up to the OOB ack sending in HDFS-5583. Provide admin commands for data node upgrade Key: HDFS-5585 URL: https://issues.apache.org/jira/browse/HDFS-5585 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode, ha, hdfs-client, namenode Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-5585.patch, HDFS-5585.patch, HDFS-5585.patch Several new methods to ClientDatanodeProtocol may need to be added to support querying version, initiating upgrade, etc. The admin CLI needs to be added as well. The primary use case is rolling upgrade, but this can be used for preparing for a graceful restart of a data node for any reason. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5891) webhdfs should not try connecting the DN during redirection
[ https://issues.apache.org/jira/browse/HDFS-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5891: - Attachment: HDFS-5891.001.patch The v1 patch addresses Jing's comments. webhdfs should not try connecting the DN during redirection --- Key: HDFS-5891 URL: https://issues.apache.org/jira/browse/HDFS-5891 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5891.000.patch, HDFS-5891.001.patch When the webhdfs server in NN serves an {{OPEN}} request, the NN will eventually redirect the request to a DN. The current implementation intends to choose the active DNs. The code always connects to the DN in a deterministic order to see whether it is active during redirection. Although it reduces the chance of the client from connecting to a failed DN, this is problematic because: # It has no guarantees that the client can connect to that DN even if the NN can connect to it. # It requires an additional network round-trip for every {{OPEN}} / {{CREATE}} request. This jira proposes that the NN should choose the DN based on the information of the data node manager. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5923) Do not persist the ACL bit in the FsPermission
[ https://issues.apache.org/jira/browse/HDFS-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899571#comment-13899571 ] Haohui Mai commented on HDFS-5923: -- The version has been bumped as a part of HDFS-5914: {code} PROTOBUF_FORMAT(-52, "Use protobuf to serialize FSImage"), EXTENDED_ACL(-53, "Extended ACL"), RESERVED_REL2_4_0(-54, -51, "Reserved for release 2.4.0", true, PROTOBUF_FORMAT, EXTENDED_ACL); {code} Do not persist the ACL bit in the FsPermission -- Key: HDFS-5923 URL: https://issues.apache.org/jira/browse/HDFS-5923 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode, security Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5923.000.patch, HDFS-5923.001.patch, HDFS-5923.002.patch The current implementation persists an ACL bit in the FSImage and editlogs. Moreover, security decisions also depend on whether the bit is set. The problem here is that we have to maintain the implicit invariant that the ACL bit is set if and only if the inode has an AclFeature. The invariant has to be maintained everywhere, otherwise it can lead to a security vulnerability. In the worst case, an attacker can toggle the bit and bypass the ACL checks. The jira proposes to treat the ACL bit as a transient bit. The bit should not be persisted onto the disk, nor should it affect any security decisions. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5583) Make DN send an OOB Ack on shutdown before restarting
[ https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899574#comment-13899574 ] Kihwal Lee commented on HDFS-5583: -- This patch triggers sending of the restart OOB ack to clients who are currently writing data. The shutdown ordering and timing have been adjusted to give enough time for DataXceiver threads (serving writes) to send the restart OOB ack upstream. First, DataXceiverServer is interrupted, and each DataXceiver thread is in turn interrupted by it after the server socket is closed to prevent further client connections. Idling DataXceiver threads due to keepalive will simply terminate. If {{DataNode#restarting}} is set, the OOB ack will be directly sent by these threads before taking down the packet responder threads. If the packet responder is in the middle of sending an ack, it can be blocked for up to a configured amount of time before failing, which is 1.5 seconds by default. If they have started sending but the send takes a long time (e.g. slow client, network issue, etc.), they will be interrupted by DataXceiverServer in 2 seconds. DataXceiverServer will tear down sooner if all DataXceiver threads finish in less than 2 seconds. The IPC server is stopped later in order to minimize the chance of the shutdownDatanode() response being dropped. The shutdown method will only start interrupting the thread pool after a few seconds have passed since the DataXceiverServer interruption. By this time, all threads must have stopped, but if any did not, they will be interrupted repeatedly. This is an existing behavior. The main DataNode thread joins on BP service threads. There was a fixed 2 second sleep, which has been changed to only wait until the shutdown is done. If the BP service threads terminated but shutdown() was not called, the main thread will delay the exit for 2 seconds as it did before. This patch does not include the client-side changes, so the OOB ack will not have any visible effects. 
It will be treated as a node failure, which also happens when a datanode shuts down. Make DN send an OOB Ack on shutdown before restarting Key: HDFS-5583 URL: https://issues.apache.org/jira/browse/HDFS-5583 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Kihwal Lee Assignee: Kihwal Lee Attachments: HDFS-5583.patch, HDFS-5583.patch Add an ability for data nodes to send an OOB response in order to indicate an upcoming upgrade-restart. The client should ignore the pipeline error from the node for a configured amount of time and try to reconstruct the pipeline without excluding the restarted node. If the node does not come back in time, regular pipeline recovery should happen. This feature is useful for applications that need to keep blocks local. If the upgrade-restart is fast, the wait is preferable to losing locality. It could also be used in general instead of the draining-writer strategy. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
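The shutdown ordering described in the comment above (stop accepting connections, interrupt the workers so they can send one final out-of-band message, wait a bounded amount of time, then re-interrupt stragglers) can be sketched as a generic pattern in plain Java. This is an illustrative sketch only, not the DataNode code; all class and method names here are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch of the shutdown ordering: interrupt workers so each can
// send a final "restarting" notification, bound the wait, then re-interrupt
// any threads that are still alive.
public class OrderedShutdown {
    static final AtomicInteger oobSent = new AtomicInteger();
    static final List<Thread> workers = new CopyOnWriteArrayList<>();

    static Thread startWorker() {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(60_000);          // simulate serving a long write
            } catch (InterruptedException e) {
                oobSent.incrementAndGet();     // send the "restart" OOB ack upstream
            }
        });
        t.start();
        workers.add(t);
        return t;
    }

    static void shutdown(long graceMillis) throws InterruptedException {
        // 1. (not shown) close the server socket so no new connections arrive.
        // 2. Interrupt each worker; idle workers exit, busy ones send the OOB ack.
        for (Thread t : workers) t.interrupt();
        // 3. Bounded wait; tear down sooner if every worker finishes early.
        long deadline = System.currentTimeMillis() + graceMillis;
        for (Thread t : workers) {
            long left = deadline - System.currentTimeMillis();
            if (left > 0) t.join(left);
            if (t.isAlive()) t.interrupt();    // 4. re-interrupt stragglers
        }
    }

    public static void main(String[] args) throws Exception {
        startWorker();
        startWorker();
        shutdown(2000);
        System.out.println("oob acks sent: " + oobSent.get());
    }
}
```

The point of the ordering is the same as in the patch description: the acceptor dies first so no new work arrives, and the grace window exists only so in-flight responders can get their last message out.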
[jira] [Commented] (HDFS-5898) Allow NFS gateway to login/relogin from its kerberos keytab
[ https://issues.apache.org/jira/browse/HDFS-5898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899576#comment-13899576 ] Abin Shahab commented on HDFS-5898: --- Hi [~atm], [~brandon], I would like to first commit the doc change. I agree with you both that we must add the keytab capabilities. Allow NFS gateway to login/relogin from its kerberos keytab --- Key: HDFS-5898 URL: https://issues.apache.org/jira/browse/HDFS-5898 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Affects Versions: 2.2.0, 2.4.0 Reporter: Jing Zhao Assignee: Abin Shahab Attachments: HDFS-5898-documentation.patch, HDFS-5898-documentation.patch According to the discussion in HDFS-5804: 1. The NFS gateway should be able to get its own TGTs, and renew them. 2. We should update the HdfsNfsGateway.apt.vm -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5932) Ls should display the ACL bit
[ https://issues.apache.org/jira/browse/HDFS-5932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5932: Attachment: HDFS-5932.3.patch Thanks for the review, Haohui. I'm attaching patch version 3. bq. Can you just catch the {{RpcNoSuchMethodException}}? To be able to catch it directly, {{DFSClient}} must do the unwrapping, so I've included that change. bq. I also wonder whether it is possible to cache the fs object directly instead. I considered this, but it would put the logic at risk of failing for {{FileSystem}} subclasses that have a misbehaving {{hashCode}} or {{equals}}. We don't define these in the base class, and I'm not aware of any requirement we've placed on custom implementations that they must override them. URI on the other hand is already used in the {{FileSystem}} cache key, so we already have an implicit assumption in the code (for better or worse) that subclasses must return a reasonable URI. Ls should display the ACL bit - Key: HDFS-5932 URL: https://issues.apache.org/jira/browse/HDFS-5932 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode, security Reporter: Haohui Mai Assignee: Chris Nauroth Attachments: HDFS-5932.1.patch, HDFS-5932.2.patch, HDFS-5932.3.patch Based on the discussion of HDFS-5923, the ACL bit is no longer passed to the client directly. Ls should call {{getAclStatus()}} instead since it needs to display the ACL bit as a part of the permission. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
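The caching discussion above (key the "does this filesystem support ACLs" flag by URI rather than by the {{FileSystem}} object, because subclasses need not override {{hashCode}}/{{equals}}) can be sketched as follows. This is a hypothetical simplification; the class, the probe, and the scheme check are illustrative, not the patch's code.

```java
import java.net.URI;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: remember per-filesystem whether an ACL call is
// supported, keyed by the filesystem URI rather than the FileSystem object,
// since custom subclasses are not required to implement equals()/hashCode().
public class AclSupportCache {
    private final ConcurrentMap<URI, Boolean> supportsAcls = new ConcurrentHashMap<>();

    public boolean check(URI fsUri) {
        // Probe at most once per URI; later calls hit the cache.
        return supportsAcls.computeIfAbsent(fsUri, this::probe);
    }

    // Stand-in for the real probe, which would call getAclStatus() and treat
    // the server's "no such method" error as "ACLs unsupported".
    private boolean probe(URI uri) {
        return !"oldfs".equals(uri.getScheme());
    }

    public static void main(String[] args) {
        AclSupportCache cache = new AclSupportCache();
        System.out.println(cache.check(URI.create("hdfs://nn1:8020")));
        System.out.println(cache.check(URI.create("oldfs://nn2:8020")));
    }
}
```

This mirrors the trade-off in the comment: the URI is already assumed to be meaningful (it keys the {{FileSystem}} cache), so keying on it adds no new requirement on subclasses.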
[jira] [Updated] (HDFS-5934) New Namenode UI back button doesn't work as expected
[ https://issues.apache.org/jira/browse/HDFS-5934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Travis Thompson updated HDFS-5934: -- Attachment: HDFS-5934-1.patch Added a handler to watch for hash changes so that the back button works correctly. New Namenode UI back button doesn't work as expected Key: HDFS-5934 URL: https://issues.apache.org/jira/browse/HDFS-5934 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Travis Thompson Assignee: Travis Thompson Priority: Minor Attachments: HDFS-5934-1.patch When I navigate to the Namenode page and click on the Datanodes tab, it takes me to the Datanodes page. If I click my browser back button, it does not take me back to the overview page as one would expect. This is true of choosing any tab. Another example of the back button acting oddly is when browsing HDFS: if I click back one page, I expect to be taken to either the previous directory I was viewing or the page I was viewing before entering the FS browser. Instead I am always taken back to the page I was viewing before entering the FS browser. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5934) New Namenode UI back button doesn't work as expected
[ https://issues.apache.org/jira/browse/HDFS-5934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Travis Thompson updated HDFS-5934: -- Status: Patch Available (was: Open) New Namenode UI back button doesn't work as expected Key: HDFS-5934 URL: https://issues.apache.org/jira/browse/HDFS-5934 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Travis Thompson Assignee: Travis Thompson Priority: Minor Attachments: HDFS-5934-1.patch When I navigate to the Namenode page and click on the Datanodes tab, it takes me to the Datanodes page. If I click my browser back button, it does not take me back to the overview page as one would expect. This is true of choosing any tab. Another example of the back button acting oddly is when browsing HDFS: if I click back one page, I expect to be taken to either the previous directory I was viewing or the page I was viewing before entering the FS browser. Instead I am always taken back to the page I was viewing before entering the FS browser. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5898) Allow NFS gateway to login/relogin from its kerberos keytab
[ https://issues.apache.org/jira/browse/HDFS-5898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899595#comment-13899595 ] Daryn Sharp commented on HDFS-5898: --- Too swamped to investigate, but the NFS gateway is using UGI, correct? If so, presumably the {{UGI.loginUserFromKeytab}} method is insufficient? If yes again, is HADOOP-9317 closer to what you need? Allow NFS gateway to login/relogin from its kerberos keytab --- Key: HDFS-5898 URL: https://issues.apache.org/jira/browse/HDFS-5898 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Affects Versions: 2.2.0, 2.4.0 Reporter: Jing Zhao Assignee: Abin Shahab Attachments: HDFS-5898-documentation.patch, HDFS-5898-documentation.patch According to the discussion in HDFS-5804: 1. The NFS gateway should be able to get its own TGTs, and renew them. 2. We should update the HdfsNfsGateway.apt.vm -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5934) New Namenode UI back button doesn't work as expected
[ https://issues.apache.org/jira/browse/HDFS-5934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899607#comment-13899607 ] Haohui Mai commented on HDFS-5934: -- I've tested the patch in Chrome / Firefox / Safari on Mac OS X; it works nicely. It seems that there is one remaining issue. When I directly type in the URL, it seems that explorer.html does not recognize the directory in the hash tag. For example, {noformat} http://localhost:50070/explorer.html#/foo {noformat} always shows the information of the root directory. Maybe we can change the explicit call of {{browse_directory()}} in {{init()}} into a callback of the hashchange event. What do you think? New Namenode UI back button doesn't work as expected Key: HDFS-5934 URL: https://issues.apache.org/jira/browse/HDFS-5934 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Travis Thompson Assignee: Travis Thompson Priority: Minor Attachments: HDFS-5934-1.patch When I navigate to the Namenode page and click on the Datanodes tab, it takes me to the Datanodes page. If I click my browser back button, it does not take me back to the overview page as one would expect. This is true of choosing any tab. Another example of the back button acting oddly is when browsing HDFS: if I click back one page, I expect to be taken to either the previous directory I was viewing or the page I was viewing before entering the FS browser. Instead I am always taken back to the page I was viewing before entering the FS browser. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache
[ https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5810: --- Resolution: Fixed Fix Version/s: 2.4.0 Target Version/s: 2.4.0 (was: ) Status: Resolved (was: Patch Available) Unify mmap cache and short-circuit file descriptor cache Key: HDFS-5810 URL: https://issues.apache.org/jira/browse/HDFS-5810 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.3.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Fix For: 2.4.0 Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch, HDFS-5810.006.patch, HDFS-5810.008.patch, HDFS-5810.015.patch, HDFS-5810.016.patch, HDFS-5810.018.patch, HDFS-5810.019.patch, HDFS-5810.020.patch, HDFS-5810.021.patch, HDFS-5810.022.patch We should unify the client mmap cache and the client file descriptor cache. Since mmaps are granted corresponding to file descriptors in the cache (currently FileInputStreamCache), they have to be tracked together to do smarter things like HDFS-5182. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5864) Missing '\n' in the output of 'hdfs oiv --help'
[ https://issues.apache.org/jira/browse/HDFS-5864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA resolved HDFS-5864. - Resolution: Cannot Reproduce Target Version/s: (was: 2.4.0) I cannot reproduce this after the HDFS-5698 branch was merged. Missing '\n' in the output of 'hdfs oiv --help' --- Key: HDFS-5864 URL: https://issues.apache.org/jira/browse/HDFS-5864 Project: Hadoop HDFS Issue Type: Sub-task Components: tools Affects Versions: 2.2.0 Reporter: Akira AJISAKA Priority: Trivial Labels: newbie In OfflineImageViewer.java, {code} * NameDistribution: This processor analyzes the file names\n + in the image and prints total number of file names and how frequently + file names are reused.\n + {code} should be {code} * NameDistribution: This processor analyzes the file names\n + in the image and prints total number of file names and how frequently\n + file names are reused.\n + {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5923) Do not persist the ACL bit in the FsPermission
[ https://issues.apache.org/jira/browse/HDFS-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5923: - Attachment: HDFS-5923.002.patch The v2 patch checks the layout version of the edit log so that the NN can consume old edit logs. Do not persist the ACL bit in the FsPermission -- Key: HDFS-5923 URL: https://issues.apache.org/jira/browse/HDFS-5923 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode, security Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5923.000.patch, HDFS-5923.001.patch, HDFS-5923.002.patch, HDFS-5923.002.patch The current implementation persists an ACL bit in the FSImage and edit logs. Moreover, security decisions also depend on whether the bit is set. The problem here is that we have to maintain the implicit invariant that the ACL bit is set if and only if the inode has an AclFeature. The invariant has to be maintained everywhere, otherwise it can lead to a security vulnerability. In the worst case, an attacker can toggle the bit and bypass the ACL checks. This jira proposes to treat the ACL bit as a transient bit. The bit should not be persisted onto the disk, nor should it affect any security decisions. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
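The invariant described above disappears if the bit is derived instead of stored. A minimal sketch of that idea, with an illustrative bit position and hypothetical class names (the real inode and permission classes are more involved):

```java
// Hedged sketch of treating the ACL bit as derived, transient state: the bit
// is computed from whether the inode carries an ACL feature and is never
// persisted, so it cannot get out of sync with the feature itself.
public class TransientAclBit {
    static final short ACL_BIT = 1 << 12;   // illustrative bit position

    static class Inode {
        short persistedPerm;                // what goes to fsimage/editlog
        Object aclFeature;                  // non-null iff the inode has ACLs
    }

    // The permission shown to clients ORs in the bit if and only if an
    // AclFeature is present; security checks never consult the bit.
    static short visiblePerm(Inode inode) {
        short p = (short) (inode.persistedPerm & ~ACL_BIT);  // never persisted
        return inode.aclFeature != null ? (short) (p | ACL_BIT) : p;
    }

    public static void main(String[] args) {
        Inode plain = new Inode();
        plain.persistedPerm = 0644;
        Inode withAcl = new Inode();
        withAcl.persistedPerm = 0644;
        withAcl.aclFeature = new Object();
        System.out.println((visiblePerm(plain) & ACL_BIT) != 0);
        System.out.println((visiblePerm(withAcl) & ACL_BIT) != 0);
    }
}
```

Because the bit is recomputed on every read and masked out on every write, an attacker toggling the stored bit has no effect on either display or enforcement.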
[jira] [Commented] (HDFS-5938) Make BlockReaderFactory#BlockReaderPeer a static class
[ https://issues.apache.org/jira/browse/HDFS-5938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899633#comment-13899633 ] Hadoop QA commented on HDFS-5938: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628541/HDFS-5938.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6124//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6124//console This message is automatically generated. Make BlockReaderFactory#BlockReaderPeer a static class -- Key: HDFS-5938 URL: https://issues.apache.org/jira/browse/HDFS-5938 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Trivial Attachments: HDFS-5938.001.patch Make BlockReaderFactory#BlockReaderPeer a static class to avoid a findbugs warning. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HDFS-5940) Minor cleanups to ShortCircuitReplica, FsDatasetCache, and DomainSocketWatcher
Colin Patrick McCabe created HDFS-5940: -- Summary: Minor cleanups to ShortCircuitReplica, FsDatasetCache, and DomainSocketWatcher Key: HDFS-5940 URL: https://issues.apache.org/jira/browse/HDFS-5940 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe ShortCircuitReplica#Key and FsDatasetCache#Key are pretty much identical code and should be factored out to an external class. (There will soon be a need for a third user of such an identifier.) Another minor cleanup is that DomainSocketWatcher should not implement Thread. It contains a thread currently, but I forgot to remove the extends. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5923) Do not persist the ACL bit in the FsPermission
[ https://issues.apache.org/jira/browse/HDFS-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5923: - Attachment: HDFS-5923.003.patch Reposted the v2 patch as v3 to avoid confusion. Do not persist the ACL bit in the FsPermission -- Key: HDFS-5923 URL: https://issues.apache.org/jira/browse/HDFS-5923 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode, security Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5923.000.patch, HDFS-5923.001.patch, HDFS-5923.002.patch, HDFS-5923.003.patch The current implementation persists an ACL bit in the FSImage and edit logs. Moreover, security decisions also depend on whether the bit is set. The problem here is that we have to maintain the implicit invariant that the ACL bit is set if and only if the inode has an AclFeature. The invariant has to be maintained everywhere, otherwise it can lead to a security vulnerability. In the worst case, an attacker can toggle the bit and bypass the ACL checks. This jira proposes to treat the ACL bit as a transient bit. The bit should not be persisted onto the disk, nor should it affect any security decisions. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5938) Make BlockReaderFactory#BlockReaderPeer a static class
[ https://issues.apache.org/jira/browse/HDFS-5938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899631#comment-13899631 ] Andrew Wang commented on HDFS-5938: --- +1 pending Jenkins Make BlockReaderFactory#BlockReaderPeer a static class -- Key: HDFS-5938 URL: https://issues.apache.org/jira/browse/HDFS-5938 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Trivial Attachments: HDFS-5938.001.patch Make BlockReaderFactory#BlockReaderPeer a static class to avoid a findbugs warning. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5923) Do not persist the ACL bit in the FsPermission
[ https://issues.apache.org/jira/browse/HDFS-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5923: - Attachment: (was: HDFS-5923.002.patch) Do not persist the ACL bit in the FsPermission -- Key: HDFS-5923 URL: https://issues.apache.org/jira/browse/HDFS-5923 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode, security Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5923.000.patch, HDFS-5923.001.patch, HDFS-5923.002.patch, HDFS-5923.003.patch The current implementation persists an ACL bit in the FSImage and edit logs. Moreover, security decisions also depend on whether the bit is set. The problem here is that we have to maintain the implicit invariant that the ACL bit is set if and only if the inode has an AclFeature. The invariant has to be maintained everywhere, otherwise it can lead to a security vulnerability. In the worst case, an attacker can toggle the bit and bypass the ACL checks. This jira proposes to treat the ACL bit as a transient bit. The bit should not be persisted onto the disk, nor should it affect any security decisions. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5933) Optimize the FSImage layout for ACLs
[ https://issues.apache.org/jira/browse/HDFS-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5933: - Attachment: HDFS-5933.002.patch Rebased on the HDFS-5922 v3 patch. Optimize the FSImage layout for ACLs Key: HDFS-5933 URL: https://issues.apache.org/jira/browse/HDFS-5933 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode, security Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5933.000.patch, HDFS-5933.001.patch, HDFS-5933.002.patch The current serialization of the ACLs is suboptimal. ACL entries should be serialized using the same scheme that the PB-based FSImage uses to serialize permissions. An ACL entry is represented by a 32-bit integer in Big Endian format. The bits can be divided into five segments: [0:2) || [2:26) || [26:27) || [27:29) || [29:32) [0:2) -- reserved for future uses. [2:26) -- the name of the entry, which is an ID that points to a string in the StringTableSection. [26:27) -- the scope of the entry (AclEntryScopeProto) [27:29) -- the type of the entry (AclEntryTypeProto) [29:32) -- the permission of the entry (FsActionProto) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
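The 32-bit layout above can be made concrete with a small pack/unpack helper. The helper and its names are illustrative, not the Hadoop code; only the bit positions come from the description (reserved [0:2), name ID [2:26), scope [26:27), type [27:29), permission [29:32)).

```java
// Sketch of the 32-bit ACL entry layout described above. Counting bit 0 as
// the most significant bit, the permission occupies the low 3 bits, the type
// the next 2, the scope the next 1, and the 24-bit string-table ID the rest.
public class AclEntryLayout {
    static int pack(int nameId, int scope, int type, int perm) {
        return (nameId & 0xFFFFFF) << 6   // [2:26)  24-bit string-table ID
             | (scope  & 0x1)      << 5   // [26:27) AclEntryScopeProto
             | (type   & 0x3)      << 3   // [27:29) AclEntryTypeProto
             | (perm   & 0x7);            // [29:32) FsActionProto
    }

    static int perm(int entry)   { return entry & 0x7; }
    static int type(int entry)   { return (entry >>> 3) & 0x3; }
    static int scope(int entry)  { return (entry >>> 5) & 0x1; }
    static int nameId(int entry) { return (entry >>> 6) & 0xFFFFFF; }

    public static void main(String[] args) {
        int e = pack(42, 1, 2, 7);
        System.out.println(nameId(e) + " " + scope(e) + " " + type(e) + " " + perm(e));
    }
}
```

Serializing the resulting int in big-endian byte order gives the on-disk form the description refers to.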
[jira] [Resolved] (HDFS-5867) Clean up the output of NameDistribution processor
[ https://issues.apache.org/jira/browse/HDFS-5867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA resolved HDFS-5867. - Resolution: Cannot Reproduce Target Version/s: (was: 2.4.0) The NameDistribution processor is not supported after the HDFS-5698 branch was merged. Closing this issue. Clean up the output of NameDistribution processor - Key: HDFS-5867 URL: https://issues.apache.org/jira/browse/HDFS-5867 Project: Hadoop HDFS Issue Type: Sub-task Components: tools Reporter: Akira AJISAKA Priority: Minor Labels: newbie The output of 'hdfs oiv -i INPUT -o OUTPUT -p NameDistribution' is as follows: {code} Total unique file names 86 0 names are used by 0 files between 10-13 times. Heap savings ~0 bytes. 0 names are used by 0 files between 1-9 times. Heap savings ~0 bytes. 0 names are used by 0 files between 1000- times. Heap savings ~0 bytes. 0 names are used by 0 files between 100-999 times. Heap savings ~0 bytes. 1 names are used by 13 files between 10-99 times. Heap savings ~372 bytes. 4 names are used by 34 files between 5-9 times. Heap savings ~942 bytes. 2 names are used by 8 files 4 times. Heap savings ~192 bytes. 0 names are used by 0 files 3 times. Heap savings ~0 bytes. 7 names are used by 14 files 2 times. Heap savings ~222 bytes. Total saved heap ~1728bytes. {code} 'between 10-13 times' should be 'over 9 times', or the lines starting with '0 names' should not be output. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5940) Minor cleanups to ShortCircuitReplica, FsDatasetCache, and DomainSocketWatcher
[ https://issues.apache.org/jira/browse/HDFS-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5940: --- Attachment: HDFS-5940.001.patch Minor cleanups to ShortCircuitReplica, FsDatasetCache, and DomainSocketWatcher -- Key: HDFS-5940 URL: https://issues.apache.org/jira/browse/HDFS-5940 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5940.001.patch ShortCircuitReplica#Key and FsDatasetCache#Key are pretty much identical code and should be factored out to an external class. (There will soon be a need for a third user of such an identifier.) Another minor cleanup is that DomainSocketWatcher should not implement Thread. It contains a thread currently, but I forgot to remove the extends. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5898) Allow NFS gateway to login/relogin from its kerberos keytab
[ https://issues.apache.org/jira/browse/HDFS-5898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899643#comment-13899643 ] Aaron T. Myers commented on HDFS-5898: -- [~daryn] - no, I think UGI#loginUserFromKeytab will be just fine for this use case; it's just that the current code never calls it, or ever attempts to relogin from the keytab. Allow NFS gateway to login/relogin from its kerberos keytab --- Key: HDFS-5898 URL: https://issues.apache.org/jira/browse/HDFS-5898 Project: Hadoop HDFS Issue Type: Sub-task Components: nfs Affects Versions: 2.2.0, 2.4.0 Reporter: Jing Zhao Assignee: Abin Shahab Attachments: HDFS-5898-documentation.patch, HDFS-5898-documentation.patch According to the discussion in HDFS-5804: 1. The NFS gateway should be able to get its own TGTs, and renew them. 2. We should update the HdfsNfsGateway.apt.vm -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.
[ https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899644#comment-13899644 ] Jing Zhao commented on HDFS-5889: - Another issue is that when checkpointing for the rollback image, we did not rename the md5 file. This can cause a failure when loading the fsimage for rollback. We can also fix this in HDFS-5920 since I have both the fix and a unit test for this failure. When rolling upgrade is in progress, standby NN should create checkpoint for downgrade. --- Key: HDFS-5889 URL: https://issues.apache.org/jira/browse/HDFS-5889 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5889_20140211.patch, h5889_20140212b.patch, h5889_20140212c.patch After rolling upgrade is started and checkpoint is disabled, the edit log may grow to a huge size. It is not a problem if rolling upgrade is finalized normally since the NN keeps the current state in memory and writes a new checkpoint during finalize. However, it is a problem if the admin decides to downgrade. It could take a long time to apply the edit log. Rollback does not have such a problem. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5938) Make BlockReaderFactory#BlockReaderPeer a static class
[ https://issues.apache.org/jira/browse/HDFS-5938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5938: --- Resolution: Fixed Fix Version/s: 2.4.0 Status: Resolved (was: Patch Available) Make BlockReaderFactory#BlockReaderPeer a static class -- Key: HDFS-5938 URL: https://issues.apache.org/jira/browse/HDFS-5938 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Trivial Fix For: 2.4.0 Attachments: HDFS-5938.001.patch Make BlockReaderFactory#BlockReaderPeer a static class to avoid a findbugs warning. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5865) Document 'FileDistribution' argument in 'hdfs oiv --processor' option
[ https://issues.apache.org/jira/browse/HDFS-5865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-5865: Description: The Offline Image Viewer document describes Currently valid options are {{Ls}}, {{XML}}, and {{Indented}} in {{--processor}} option, but now valid options are {{Ls}}, {{XML}}, and {{FileDistribution}}. (was: The Offline Image Viewer document now describes Currently valid options are {{Ls}}, {{XML}}, and {{Indented}} in {{--processor}} option, but there're more options such as {{Delimited}}, {{FileDistribution}}, and {{NameDistribution}}.) Priority: Minor (was: Major) Target Version/s: 3.0.0 (was: 2.4.0) Affects Version/s: (was: 2.2.0) 3.0.0 Summary: Document 'FileDistribution' argument in 'hdfs oiv --processor' option (was: Document some arguments in 'hdfs oiv --processor' option) Document 'FileDistribution' argument in 'hdfs oiv --processor' option - Key: HDFS-5865 URL: https://issues.apache.org/jira/browse/HDFS-5865 Project: Hadoop HDFS Issue Type: Sub-task Components: documentation Affects Versions: 3.0.0 Reporter: Akira AJISAKA Priority: Minor Labels: newbie The Offline Image Viewer document describes Currently valid options are {{Ls}}, {{XML}}, and {{Indented}} in {{--processor}} option, but now valid options are {{Ls}}, {{XML}}, and {{FileDistribution}}. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5866) '-maxSize' and '-step' option fail in OfflineImageViewer
[ https://issues.apache.org/jira/browse/HDFS-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-5866: Target Version/s: 3.0.0 (was: 2.4.0) Affects Version/s: 3.0.0 '-maxSize' and '-step' option fail in OfflineImageViewer Key: HDFS-5866 URL: https://issues.apache.org/jira/browse/HDFS-5866 Project: Hadoop HDFS Issue Type: Sub-task Components: tools Affects Versions: 3.0.0, 2.2.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Executing the -step and/or -maxSize options produces the following error: {code} $ hdfs oiv -p FileDistribution -step 102400 -i input -o output Error parsing command-line options: Usage: bin/hdfs oiv [OPTIONS] -i INPUTFILE -o OUTPUTFILE {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5863) Improve OfflineImageViewer
[ https://issues.apache.org/jira/browse/HDFS-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899656#comment-13899656 ] Akira AJISAKA commented on HDFS-5863: - Updated sub-tasks. Improve OfflineImageViewer -- Key: HDFS-5863 URL: https://issues.apache.org/jira/browse/HDFS-5863 Project: Hadoop HDFS Issue Type: Improvement Components: tools Reporter: Akira AJISAKA This is an umbrella jira for improving Offline Image Viewer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5920) Support rollback of rolling upgrade in NameNode and JournalNodes
[ https://issues.apache.org/jira/browse/HDFS-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899661#comment-13899661 ] Suresh Srinivas commented on HDFS-5920: --- Looks like you are waiting for HDFS-5889 to finish the TODOs in this patch. Comments: # nit: discard unnecessary editlog - discard unnecessary editlog segments # nit: trashEditlog() - better name could be discardEditLogSegments() - method javadoc could say instead of delete, Discard editlog segments by renaming them with suffix .trash? # it should be the first txid of some segment - it should be the first txid of some segment, if segment corresponding to the txid exists Support rollback of rolling upgrade in NameNode and JournalNodes Key: HDFS-5920 URL: https://issues.apache.org/jira/browse/HDFS-5920 Project: Hadoop HDFS Issue Type: Sub-task Components: journal-node, namenode Reporter: Jing Zhao Assignee: Jing Zhao Attachments: HDFS-5920.000.patch, HDFS-5920.000.patch, HDFS-5920.001.patch This jira provides rollback functionality for NameNode and JournalNode in rolling upgrade. Currently the proposed rollback for rolling upgrade is: 1. Shutdown both NN 2. Start one of the NN using -rollingUpgrade rollback option 3. This NN will load the special fsimage right before the upgrade marker, then discard all the editlog segments after the txid of the fsimage 4. The NN will also send RPC requests to all the JNs to discard editlog segments. This call expects response from all the JNs. The NN will keep running if the call succeeds. 5. We start the other NN using bootstrapstandby rather than -rollingUpgrade rollback -- This message was sent by Atlassian JIRA (v6.1.5#6160)
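The review above suggests discarding edit log segments by renaming them with a .trash suffix rather than deleting them. A minimal sketch of that idea, assuming a hypothetical edits_<startTxid>-<endTxid> file naming convention (the real JournalNode storage layout and helpers are more involved):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch: "discard" edit log segments after a rollback txid by
// renaming them with a .trash suffix, so the data survives until cleanup.
public class TrashEditLogSegments {
    // Rename edits_<start>-<end> files whose start txid is beyond rollbackTxid.
    static int discardAfter(Path dir, long rollbackTxid) throws IOException {
        int discarded = 0;
        try (DirectoryStream<Path> segs = Files.newDirectoryStream(dir, "edits_*")) {
            for (Path seg : segs) {
                String name = seg.getFileName().toString();
                if (name.endsWith(".trash")) continue;   // already discarded
                // "edits_101-200" -> start txid 101
                long startTxid = Long.parseLong(name.split("[_-]")[1]);
                if (startTxid > rollbackTxid) {
                    Files.move(seg, seg.resolveSibling(name + ".trash"));
                    discarded++;
                }
            }
        }
        return discarded;
    }

    static int demo() throws IOException {
        Path dir = Files.createTempDirectory("jn");
        Files.createFile(dir.resolve("edits_1-100"));
        Files.createFile(dir.resolve("edits_101-200"));
        return discardAfter(dir, 100);   // only the second segment is trashed
    }

    public static void main(String[] args) throws IOException {
        System.out.println("segments discarded: " + demo());
    }
}
```

Renaming instead of deleting matches the review's point: a discarded segment can still be inspected or restored if the rollback itself has to be abandoned.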
[jira] [Commented] (HDFS-5940) Minor cleanups to ShortCircuitReplica, FsDatasetCache, and DomainSocketWatcher
[ https://issues.apache.org/jira/browse/HDFS-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899663#comment-13899663 ] Andrew Wang commented on HDFS-5940: --- Few comments: * Slap a @VisibleForTesting annotation for watcherThread * BlockIdentifier javadoc refers to BlockDescriptors * We might want to call it ExtendedBlockIdentifier since an ExtendedBlock is what also has the bpid, but that is a mouthful. ExtendedBlockKey? +1 pending above and Jenkins. Minor cleanups to ShortCircuitReplica, FsDatasetCache, and DomainSocketWatcher -- Key: HDFS-5940 URL: https://issues.apache.org/jira/browse/HDFS-5940 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5940.001.patch ShortCircuitReplica#Key and FsDatasetCache#Key are pretty much identical code and should be factored out to an external class. (There will soon be a need for a third user of such an identifier.) Another minor cleanup is that DomainSocketWatcher should not implement Thread. It contains a thread currently, but I forgot to remove the extends. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
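The shared key class discussed above (a block ID plus block-pool ID with proper {{equals}}/{{hashCode}} so it can key caches in both the mmap and file-descriptor paths) might look like the following. The name follows the "ExtendedBlockKey" suggestion in the comment and is not necessarily the committed class name.

```java
import java.util.Objects;

// Hypothetical sketch of the factored-out identifier: immutable, with
// value-based equals()/hashCode() so it works as a map key.
public final class ExtendedBlockKey {
    private final long blockId;
    private final String bpid;   // block pool ID

    public ExtendedBlockKey(long blockId, String bpid) {
        this.blockId = blockId;
        this.bpid = bpid;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ExtendedBlockKey)) return false;
        ExtendedBlockKey k = (ExtendedBlockKey) o;
        return blockId == k.blockId && Objects.equals(bpid, k.bpid);
    }

    @Override
    public int hashCode() {
        return Objects.hash(blockId, bpid);
    }

    public static void main(String[] args) {
        ExtendedBlockKey a = new ExtendedBlockKey(1, "bp-1");
        ExtendedBlockKey b = new ExtendedBlockKey(1, "bp-1");
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```

Since two caches would share it, making the class final and immutable avoids the misbehaving-{{equals}} risks discussed for FileSystem subclasses elsewhere in this digest.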
[jira] [Commented] (HDFS-5847) Consolidate INodeReference into a separate section
[ https://issues.apache.org/jira/browse/HDFS-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899673#comment-13899673 ] Hadoop QA commented on HDFS-5847: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628550/HDFS-5847.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.TestCacheDirectives {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6125//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/6125//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6125//console This message is automatically generated. 
Consolidate INodeReference into a separate section -- Key: HDFS-5847 URL: https://issues.apache.org/jira/browse/HDFS-5847 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 3.0.0 Reporter: Haohui Mai Assignee: Jing Zhao Attachments: HDFS-5847.000.patch, HDFS-5847.001.patch, HDFS-5847.002.patch Currently each INodeDirectorySection.Entry contains variable numbers of INodeReference entries. The INodeReference entries are inlined, therefore it is difficult to quickly navigate through a INodeDirectorySection.Entry. Skipping through a INodeDirectorySection.Entry without parsing is essential to parse these entries in parallel. This jira proposes to consolidate INodeReferences into a section and give each of them an ID. The INodeDirectorySection.Entry can store the list of the IDs as a repeated field. That way we can leverage the existing code in protobuf to quickly skip through a INodeDirectorySection.Entry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5934) New Namenode UI back button doesn't work as expected
[ https://issues.apache.org/jira/browse/HDFS-5934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Travis Thompson updated HDFS-5934: -- Attachment: HDFS-5934-2.patch New patch to handle direct links. Good catch by the way. Since the hash isn't changing you still have to call {{browse_directory()}} but only if we have a dir to browse to. New Namenode UI back button doesn't work as expected Key: HDFS-5934 URL: https://issues.apache.org/jira/browse/HDFS-5934 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Travis Thompson Assignee: Travis Thompson Priority: Minor Attachments: HDFS-5934-1.patch, HDFS-5934-2.patch When I navigate to the Namenode page, and I click on the Datanodes tab, it will take me to the Datanodes page. If I click my browser back button, it does not take me back to the overview page as one would expect. This is true of choosing any tab. Another example of the back button acting weird is when browsing HDFS: if I click back one page, I expect to see either the previous directory I was viewing, or the page I was viewing before entering the FS browser. Instead I am always taken back to the previous page I was viewing before entering the FS browser. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5938) Make BlockReaderFactory#BlockReaderPeer a static class
[ https://issues.apache.org/jira/browse/HDFS-5938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899682#comment-13899682 ] Hudson commented on HDFS-5938: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5158 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5158/]) HDFS-5938. Make BlockReaderFactory#BlockReaderPeer a static class to avoid a findbugs warning (cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1567767) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java Make BlockReaderFactory#BlockReaderPeer a static class -- Key: HDFS-5938 URL: https://issues.apache.org/jira/browse/HDFS-5938 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Priority: Trivial Fix For: 2.4.0 Attachments: HDFS-5938.001.patch Make BlockReaderFactory#BlockReaderPeer a static class to avoid a findbugs warning. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5847) Consolidate INodeReference into a separate section
[ https://issues.apache.org/jira/browse/HDFS-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899685#comment-13899685 ] Jing Zhao commented on HDFS-5847: - The failed test should be un-related. The findbug warning has been fixed by HDFS-5938. Consolidate INodeReference into a separate section -- Key: HDFS-5847 URL: https://issues.apache.org/jira/browse/HDFS-5847 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 3.0.0 Reporter: Haohui Mai Assignee: Jing Zhao Attachments: HDFS-5847.000.patch, HDFS-5847.001.patch, HDFS-5847.002.patch Currently each INodeDirectorySection.Entry contains variable numbers of INodeReference entries. The INodeReference entries are inlined, therefore it is difficult to quickly navigate through a INodeDirectorySection.Entry. Skipping through a INodeDirectorySection.Entry without parsing is essential to parse these entries in parallel. This jira proposes to consolidate INodeReferences into a section and give each of them an ID. The INodeDirectorySection.Entry can store the list of the IDs as a repeated field. That way we can leverage the existing code in protobuf to quickly skip through a INodeDirectorySection.Entry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5923) Do not persist the ACL bit in the FsPermission
[ https://issues.apache.org/jira/browse/HDFS-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-5923: Attachment: HDFS-5923.004.patch Here is v4, merging back in the test changes. Reviewing now... Do not persist the ACL bit in the FsPermission -- Key: HDFS-5923 URL: https://issues.apache.org/jira/browse/HDFS-5923 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode, security Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5923.000.patch, HDFS-5923.001.patch, HDFS-5923.002.patch, HDFS-5923.003.patch, HDFS-5923.004.patch The current implementation persists an ACL bit in the FSImage and editlogs. Moreover, the security decisions also depend on whether the bit is set. The problem here is that we have to maintain the implicit invariant that the ACL bit is set if and only if the inode has an AclFeature. The invariant has to be maintained everywhere, otherwise it can lead to a security vulnerability. In the worst case, an attacker can toggle the bit and bypass the ACL checks. The jira proposes to treat the ACL bit as a transient bit. The bit should not be persisted onto the disk, nor should it affect any security decisions. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5621) NameNode: add indicator in web UI file system browser if a file has an ACL.
[ https://issues.apache.org/jira/browse/HDFS-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-5621: - Attachment: HDFS-5621.000.patch NameNode: add indicator in web UI file system browser if a file has an ACL. --- Key: HDFS-5621 URL: https://issues.apache.org/jira/browse/HDFS-5621 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Chris Nauroth Assignee: Haohui Mai Attachments: HDFS-5621.000.patch Change the file system browser to append the '+' character to permissions of any file or directory that has an ACL. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5940) Minor cleanups to ShortCircuitReplica, FsDatasetCache, and DomainSocketWatcher
[ https://issues.apache.org/jira/browse/HDFS-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5940: --- Attachment: HDFS-5940.002.patch Minor cleanups to ShortCircuitReplica, FsDatasetCache, and DomainSocketWatcher -- Key: HDFS-5940 URL: https://issues.apache.org/jira/browse/HDFS-5940 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5940.001.patch, HDFS-5940.002.patch ShortCircuitReplica#Key and FsDatasetCache#Key are pretty much identical code and should be factored out to an external class. (There will soon be a need for a third user of such an identifier.) Another minor cleanup is that DomainSocketWatcher should not implement Thread. It contains a thread currently, but I forgot to remove the extends. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5940) Minor cleanups to ShortCircuitReplica, FsDatasetCache, and DomainSocketWatcher
[ https://issues.apache.org/jira/browse/HDFS-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899704#comment-13899704 ] Colin Patrick McCabe commented on HDFS-5940: bq. Slap a @VisibleForTesting annotation for watcherThread added bq. BlockIdentifier javadoc refers to BlockDescriptors fixed bq. We might want to call it ExtendedBlockIdentifier since an ExtendedBlock is what also has the bpid, but that is a mouthful. ExtendedBlockKey? How about {{ExtendedBlockId}}? That seems short enough, and pretty accurate. (By the way, {{ExtendedBlock}} itself has a bunch of other fields besides just id and block pool id, and is not immutable, which is why I wanted the Id class in the first place.) Minor cleanups to ShortCircuitReplica, FsDatasetCache, and DomainSocketWatcher -- Key: HDFS-5940 URL: https://issues.apache.org/jira/browse/HDFS-5940 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5940.001.patch, HDFS-5940.002.patch ShortCircuitReplica#Key and FsDatasetCache#Key are pretty much identical code and should be factored out to an external class. (There will soon be a need for a third user of such an identifier.) Another minor cleanup is that DomainSocketWatcher should not implement Thread. It contains a thread currently, but I forgot to remove the extends. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
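[Editorial note] For readers following the naming discussion above, here is a minimal sketch of what an immutable {{ExtendedBlockId}}-style key (pairing a block ID with its block pool ID) might look like. This is hypothetical illustration code, not the actual HDFS-5940 patch; field and method names are assumptions.

```java
// Hypothetical sketch of an immutable key pairing a block ID with a block
// pool ID, suitable for use as a HashMap key. Not the actual patch code.
final class ExtendedBlockId {
    private final long blockId;  // the unique block identifier
    private final String bpId;   // the block pool ID

    ExtendedBlockId(long blockId, String bpId) {
        this.blockId = blockId;
        this.bpId = bpId;
    }

    long getBlockId() { return blockId; }
    String getBlockPoolId() { return bpId; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ExtendedBlockId)) {
            return false;
        }
        ExtendedBlockId other = (ExtendedBlockId) o;
        return blockId == other.blockId && bpId.equals(other.bpId);
    }

    @Override
    public int hashCode() {
        // Combine both fields so keys distribute well in a hash table.
        return 31 * Long.hashCode(blockId) + bpId.hashCode();
    }

    @Override
    public String toString() {
        return blockId + "_" + bpId;
    }
}
```

Being immutable and value-comparable is what distinguishes such a key from {{ExtendedBlock}}, which (as noted above) carries extra mutable state.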
[jira] [Commented] (HDFS-5940) Minor cleanups to ShortCircuitReplica, FsDatasetCache, and DomainSocketWatcher
[ https://issues.apache.org/jira/browse/HDFS-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899720#comment-13899720 ] Andrew Wang commented on HDFS-5940: --- {{ExtendedBlockId}} sounds good. It sounds a bit like block id, but I think the javadoc and API make it pretty obvious that it also includes a bpid. Minor cleanups to ShortCircuitReplica, FsDatasetCache, and DomainSocketWatcher -- Key: HDFS-5940 URL: https://issues.apache.org/jira/browse/HDFS-5940 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5940.001.patch, HDFS-5940.002.patch ShortCircuitReplica#Key and FsDatasetCache#Key are pretty much identical code and should be factored out to an external class. (There will soon be a need for a third user of such an identifier.) Another minor cleanup is that DomainSocketWatcher should not implement Thread. It contains a thread currently, but I forgot to remove the extends. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5923) Do not persist the ACL bit in the FsPermission
[ https://issues.apache.org/jira/browse/HDFS-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-5923. - Resolution: Fixed Fix Version/s: HDFS ACLs (HDFS-4685) Hadoop Flags: Reviewed +1 for the patch. Thanks for addressing the feedback. In addition to the automated tests, I manually tested upgrading a NameNode with edits from a trunk build to a HDFS-4685 build. The latest patch loaded the existing {{OP_ADD}} and {{OP_MKDIR}} ops with no problem. I've committed this to the HDFS-4685 branch. Do not persist the ACL bit in the FsPermission -- Key: HDFS-5923 URL: https://issues.apache.org/jira/browse/HDFS-5923 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode, security Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Haohui Mai Assignee: Haohui Mai Fix For: HDFS ACLs (HDFS-4685) Attachments: HDFS-5923.000.patch, HDFS-5923.001.patch, HDFS-5923.002.patch, HDFS-5923.003.patch, HDFS-5923.004.patch The current implementation persists an ACL bit in the FSImage and editlogs. Moreover, the security decisions also depend on whether the bit is set. The problem here is that we have to maintain the implicit invariant that the ACL bit is set if and only if the inode has an AclFeature. The invariant has to be maintained everywhere, otherwise it can lead to a security vulnerability. In the worst case, an attacker can toggle the bit and bypass the ACL checks. The jira proposes to treat the ACL bit as a transient bit. The bit should not be persisted onto the disk, nor should it affect any security decisions. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HDFS-5933) Optimize the FSImage layout for ACLs
[ https://issues.apache.org/jira/browse/HDFS-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth resolved HDFS-5933. - Resolution: Fixed Fix Version/s: HDFS ACLs (HDFS-4685) +1 for the v2 rebase patch. I committed it to the HDFS-4685 branch. Thanks again, Haohui. Optimize the FSImage layout for ACLs Key: HDFS-5933 URL: https://issues.apache.org/jira/browse/HDFS-5933 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client, namenode, security Affects Versions: HDFS ACLs (HDFS-4685) Reporter: Haohui Mai Assignee: Haohui Mai Fix For: HDFS ACLs (HDFS-4685) Attachments: HDFS-5933.000.patch, HDFS-5933.001.patch, HDFS-5933.002.patch The current serialization of the ACLs is suboptimal. ACL entries should be serialized using the same scheme that the PB-based FSImage serializes permissions. An ACL entry is represented by a 32-bit integer in Big Endian format. The bits can be divided in four segments: [0:2) || [2:26) || [26:27) || [27:29) || [29:32) [0:2) -- reserved for futute uses. [2:26) -- the name of the entry, which is an ID that points to a string in the StringTableSection. [26:27) -- the scope of the entry (AclEntryScopeProto) [27:29) -- the type of the entry (AclEntryTypeProto) [29:32) -- the permission of the entry (FsActionProto) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
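[Editorial note] The HDFS-5933 description above lays out a concrete bit layout for a serialized ACL entry. The packing it describes can be sketched as follows; the class and method names here are illustrative, not taken from the patch, but the shift amounts and masks follow the segment boundaries given in the description (2 reserved bits, 24-bit name ID, 1-bit scope, 2-bit type, 3-bit permission):

```java
// Illustrative codec for the 32-bit ACL entry layout described above.
// Bit segments (big-endian notation from the description):
//   [0:2)   reserved
//   [2:26)  name ID into the StringTableSection (24 bits)
//   [26:27) scope  (AclEntryScopeProto, 1 bit)
//   [27:29) type   (AclEntryTypeProto, 2 bits)
//   [29:32) perm   (FsActionProto, 3 bits)
final class AclEntryCodec {
    static final int NAME_SHIFT = 6;  // name occupies bits above scope/type/perm
    static final int SCOPE_SHIFT = 5;
    static final int TYPE_SHIFT = 3;
    // permission sits in the low 3 bits

    static int pack(int nameId, int scope, int type, int perm) {
        return (nameId << NAME_SHIFT) | (scope << SCOPE_SHIFT)
             | (type << TYPE_SHIFT) | perm;
    }

    static int nameId(int entry) { return (entry >>> NAME_SHIFT) & 0xFFFFFF; }
    static int scope(int entry)  { return (entry >>> SCOPE_SHIFT) & 0x1; }
    static int type(int entry)   { return (entry >>> TYPE_SHIFT) & 0x3; }
    static int perm(int entry)   { return entry & 0x7; }
}
```

Packing each entry into a fixed-width integer (with the name interned in the string table) is what makes this layout cheaper than serializing full ACL entry messages.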
[jira] [Updated] (HDFS-5907) Handle block deletion requests during rolling upgrades
[ https://issues.apache.org/jira/browse/HDFS-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Agarwal updated HDFS-5907: Attachment: HDFS-5907.04.patch Thanks for the detailed review [~sureshms]. Addressed everything not mentioned below and also feedback from [~vinayrpet]. {quote} Why is on finalize the blocks in trash are not being deleted? {quote} They are deleted when NN signals finalize by the absence of {{RollingUpgradeStatus}} in the {{HeartbeatResponse}}. This triggers purge via {{BPOfferService#signalRollingUpgrade}}. {quote} Sorry I missed the logic of ENUM_WITH_ROLLING_UPGRADE_OPTION and its been a while, but I also forgot about why we added DFS_DATANODE_STARTUP_KEY in the first place. {quote} The logic for {{ENUM_WITH_ROLLING_UPGRADE_OPTION}} is required because the {{enum.toString}} was changed separately to include {{RollingUpgradeStartupOption}} and we need to have a corresponding parse method. This is the static {{#getEnum}}. {quote} Also the logic is not quite right. restoreDirectory is always going to be null right. Checking for null seems redundant and also you can just rename the trash to getRestoreDirectory(child). {quote} Good catch, the assignment to null should have been outside the for loop; this is to avoid recreating it multiple times when restoring more than one child in the same parent. Fixed it. {quote} Unit tests required for patterns added to BlockPoolSliceStorage, and methods getTrashDirectory, getRestoreDirectory. Also these methods might be easier to test, if it accepts string {quote} I've added a number of unit tests in this version of the patch. Still working on an E2E test for DN finalize/rollback.
Handle block deletion requests during rolling upgrades -- Key: HDFS-5907 URL: https://issues.apache.org/jira/browse/HDFS-5907 Project: Hadoop HDFS Issue Type: Sub-task Components: datanode Affects Versions: HDFS-5535 (Rolling upgrades) Reporter: Arpit Agarwal Assignee: Arpit Agarwal Attachments: HDFS-5907.01.patch, HDFS-5907.02.patch, HDFS-5907.04.patch DN changes when a rolling upgrade is in progress: # DataNode should handle block deletions by moving block files to 'trash'. # Block files should be restored to their original locations during a rollback. # Purge trash when the rolling upgrade is finalized. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
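[Editorial note] The trash-on-delete scheme in the HDFS-5907 description (move block files to a trash directory instead of deleting, restore them on rollback, purge on finalize) can be sketched roughly as below. The directory layout and method names are hypothetical; the real implementation lives in {{BlockPoolSliceStorage}} and handles per-storage patterns this sketch ignores.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Rough sketch of the rolling-upgrade trash idea: rename deleted block
// files into a trash directory so rollback can restore them. Hypothetical
// layout, not the actual BlockPoolSliceStorage code.
final class BlockTrash {
    // During a rolling upgrade, move a block file into <storageRoot>/trash
    // instead of deleting it, preserving the file name.
    static Path moveToTrash(Path blockFile, Path storageRoot) throws IOException {
        Path trashDir = storageRoot.resolve("trash");
        Files.createDirectories(trashDir);
        Path target = trashDir.resolve(blockFile.getFileName());
        return Files.move(blockFile, target, StandardCopyOption.ATOMIC_MOVE);
    }

    // On rollback, move every trashed file back to its original directory.
    static void restoreAll(Path storageRoot, Path originalDir) throws IOException {
        Path trashDir = storageRoot.resolve("trash");
        if (!Files.isDirectory(trashDir)) {
            return; // nothing was trashed
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(trashDir)) {
            for (Path trashed : entries) {
                Files.move(trashed, originalDir.resolve(trashed.getFileName()));
            }
        }
    }
}
```

On finalize the trash directory would simply be deleted recursively, matching the "purge trash when the rolling upgrade is finalized" step in the description.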
[jira] [Updated] (HDFS-5940) Minor cleanups to ShortCircuitReplica, FsDatasetCache, and DomainSocketWatcher
[ https://issues.apache.org/jira/browse/HDFS-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5940: --- Status: Patch Available (was: Open) Minor cleanups to ShortCircuitReplica, FsDatasetCache, and DomainSocketWatcher -- Key: HDFS-5940 URL: https://issues.apache.org/jira/browse/HDFS-5940 Project: Hadoop HDFS Issue Type: Sub-task Components: hdfs-client Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-5940.001.patch, HDFS-5940.002.patch ShortCircuitReplica#Key and FsDatasetCache#Key are pretty much identical code and should be factored out to an external class. (There will soon be a need for a third user of such an identifier.) Another minor cleanup is that DomainSocketWatcher should not implement Thread. It contains a thread currently, but I forgot to remove the extends. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5935) New Namenode UI FS browser should throw smarter error messages
[ https://issues.apache.org/jira/browse/HDFS-5935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899744#comment-13899744 ] Travis Thompson commented on HDFS-5935: --- Well, something else to consider for security is that you'll never get to this page if you have security enabled and your browser isn't set up or your ticket is expired, because again, they're running on the same port. I'll still add handling for them, I'm just not sure they'll ever get hit. New Namenode UI FS browser should throw smarter error messages -- Key: HDFS-5935 URL: https://issues.apache.org/jira/browse/HDFS-5935 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.3.0 Reporter: Travis Thompson Assignee: Travis Thompson Priority: Minor When browsing using the new FS browser in the namenode, if I try to browse a folder that I don't have permission to view, it throws the error: {noformat} Failed to retreive data from /webhdfs/v1/system?op=LISTSTATUS, cause: Forbidden WebHDFS might be disabled. WebHDFS is required to browse the filesystem. {noformat} The reason I'm not allowed to see /system is because I don't have permission, not because WebHDFS is disabled. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5913) Nfs3Utils#getWccAttr() should check attr parameter against null
[ https://issues.apache.org/jira/browse/HDFS-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5913: - Attachment: HDFS-5913.patch Uploaded a patch to check the null reference. Nfs3Utils#getWccAttr() should check attr parameter against null --- Key: HDFS-5913 URL: https://issues.apache.org/jira/browse/HDFS-5913 Project: Hadoop HDFS Issue Type: Bug Components: nfs Reporter: Ted Yu Priority: Minor Attachments: HDFS-5913.patch In RpcProgramNfs3#commit() : {code} Nfs3FileAttributes postOpAttr = null; try { postOpAttr = writeManager.getFileAttr(dfsClient, handle, iug); } catch (IOException e1) { LOG.info("Can't get postOpAttr for fileId: " + handle.getFileId()); } WccData fileWcc = new WccData(Nfs3Utils.getWccAttr(preOpAttr), postOpAttr); {code} If there is an exception, postOpAttr will be null. However, Nfs3Utils#getWccAttr() dereferences the attr parameter directly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
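[Editorial note] The fix the HDFS-5913 report asks for is a simple null guard before dereferencing the attributes. A sketch of the shape of that guard, using simplified stand-in classes (the real {{Nfs3FileAttributes}} and {{WccAttr}} live in the Hadoop NFS gateway and carry more fields):

```java
// Sketch of the null guard suggested for Nfs3Utils#getWccAttr. The Attr
// and WccAttr classes here are simplified stand-ins for illustration only.
final class Nfs3UtilsSketch {
    static final class Attr {
        long size;
        long mtime;
        long ctime;
    }

    static final class WccAttr {
        final long size, mtime, ctime;
        WccAttr(long size, long mtime, long ctime) {
            this.size = size;
            this.mtime = mtime;
            this.ctime = ctime;
        }
    }

    // Return null instead of throwing NPE when the caller could not fetch
    // the pre/post-operation attributes (e.g. getFileAttr threw IOException).
    static WccAttr getWccAttr(Attr attr) {
        if (attr == null) {
            return null; // caller treats missing attributes as "unknown"
        }
        return new WccAttr(attr.size, attr.mtime, attr.ctime);
    }
}
```

With the guard in place, the {{WccData}} constructor in the reported code path receives a null wcc-attr rather than triggering a NullPointerException.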
[jira] [Updated] (HDFS-5913) Nfs3Utils#getWccAttr() should check attr parameter against null
[ https://issues.apache.org/jira/browse/HDFS-5913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-5913: - Status: Patch Available (was: Open) Nfs3Utils#getWccAttr() should check attr parameter against null --- Key: HDFS-5913 URL: https://issues.apache.org/jira/browse/HDFS-5913 Project: Hadoop HDFS Issue Type: Bug Components: nfs Reporter: Ted Yu Priority: Minor Attachments: HDFS-5913.patch In RpcProgramNfs3#commit() : {code} Nfs3FileAttributes postOpAttr = null; try { postOpAttr = writeManager.getFileAttr(dfsClient, handle, iug); } catch (IOException e1) { LOG.info("Can't get postOpAttr for fileId: " + handle.getFileId()); } WccData fileWcc = new WccData(Nfs3Utils.getWccAttr(preOpAttr), postOpAttr); {code} If there is an exception, postOpAttr will be null. However, Nfs3Utils#getWccAttr() dereferences the attr parameter directly. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5891) webhdfs should not try connecting the DN during redirection
[ https://issues.apache.org/jira/browse/HDFS-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899759#comment-13899759 ] Hadoop QA commented on HDFS-5891: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12628557/HDFS-5891.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6126//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/6126//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6126//console This message is automatically generated. webhdfs should not try connecting the DN during redirection --- Key: HDFS-5891 URL: https://issues.apache.org/jira/browse/HDFS-5891 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5891.000.patch, HDFS-5891.001.patch When the webhdfs server in NN serves an {{OPEN}} request, the NN will eventually redirect the request to a DN. 
The current implementation intends to choose the active DNs. The code always connects to the DN in a deterministic order to see whether it is active during redirection. Although this reduces the chance of the client connecting to a failed DN, it is problematic because: # There is no guarantee that the client can connect to that DN even if the NN can connect to it. # It requires an additional network round-trip for every {{OPEN}} / {{CREATE}} request. This jira proposes that the NN should choose the DN based on the information in the datanode manager. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5776) Support 'hedged' reads in DFSClient
[ https://issues.apache.org/jira/browse/HDFS-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HDFS-5776: Attachment: HDFS-5776-v17.txt Addressed the nice feedback by [~jingzhao]. Removed being able to enable/disable/resize post construction of DFSClient, and added handling for the case where the pipeline member count could change under us while doing hedged reads because of node death. Tests pass locally. Will try on a cluster now but this posting should be good for review (thanks in advance). Support 'hedged' reads in DFSClient --- Key: HDFS-5776 URL: https://issues.apache.org/jira/browse/HDFS-5776 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client Affects Versions: 3.0.0 Reporter: Liang Xie Assignee: Liang Xie Attachments: HDFS-5776-v10.txt, HDFS-5776-v11.txt, HDFS-5776-v12.txt, HDFS-5776-v12.txt, HDFS-5776-v13.wip.txt, HDFS-5776-v14.txt, HDFS-5776-v15.txt, HDFS-5776-v17.txt, HDFS-5776-v2.txt, HDFS-5776-v3.txt, HDFS-5776-v4.txt, HDFS-5776-v5.txt, HDFS-5776-v6.txt, HDFS-5776-v7.txt, HDFS-5776-v8.txt, HDFS-5776-v9.txt, HDFS-5776.txt This is a placeholder for the HDFS-related backport from https://issues.apache.org/jira/browse/HBASE-7509 The quorum read ability should be helpful especially to optimize read outliers. We can utilize dfs.dfsclient.quorum.read.threshold.millis and dfs.dfsclient.quorum.read.threadpool.size to enable/disable the hedged read ability from the client side (e.g. HBase), and by using DFSQuorumReadMetrics, we can export the interesting metric values into client systems (e.g. HBase's regionserver metrics). The core logic is in the pread code path, where we decide to go to the original fetchBlockByteRange or the newly introduced fetchBlockByteRangeSpeculative per the above config items. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
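[Editorial note] The core "hedged read" idea the HDFS-5776 description refers to — start a read against one replica, and if it has not returned within a threshold, race a second read against another replica and take whichever finishes first — can be sketched as below. This is an illustration of the technique, not the DFSClient code from the patch; the method names and the use of {{Supplier}} are assumptions.

```java
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Minimal sketch of a hedged read: if the primary replica does not answer
// within thresholdMillis, submit a speculative read to a backup replica and
// return the first result that arrives.
final class HedgedRead {
    static <T> T read(Supplier<T> primary, Supplier<T> backup,
                      long thresholdMillis, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        CompletionService<T> cs = new ExecutorCompletionService<>(pool);
        cs.submit(primary::get);
        // Wait up to the threshold for the primary read.
        Future<T> first = cs.poll(thresholdMillis, TimeUnit.MILLISECONDS);
        if (first != null) {
            return first.get(); // primary came back in time; no hedge needed
        }
        cs.submit(backup::get); // hedge: race a second replica
        return cs.take().get(); // first of the two reads to finish wins
    }
}
```

This mirrors the trade-off the config keys express: the threshold bounds how long tail-latency reads wait before hedging, and the thread pool size caps how many speculative reads can be in flight.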
[jira] [Commented] (HDFS-5847) Consolidate INodeReference into a separate section
[ https://issues.apache.org/jira/browse/HDFS-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899771#comment-13899771 ] Haohui Mai commented on HDFS-5847: -- +1 Consolidate INodeReference into a separate section -- Key: HDFS-5847 URL: https://issues.apache.org/jira/browse/HDFS-5847 Project: Hadoop HDFS Issue Type: Sub-task Affects Versions: 3.0.0 Reporter: Haohui Mai Assignee: Jing Zhao Attachments: HDFS-5847.000.patch, HDFS-5847.001.patch, HDFS-5847.002.patch Currently each INodeDirectorySection.Entry contains variable numbers of INodeReference entries. The INodeReference entries are inlined, therefore it is difficult to quickly navigate through a INodeDirectorySection.Entry. Skipping through a INodeDirectorySection.Entry without parsing is essential to parse these entries in parallel. This jira proposes to consolidate INodeReferences into a section and give each of them an ID. The INodeDirectorySection.Entry can store the list of the IDs as a repeated field. That way we can leverage the existing code in protobuf to quickly skip through a INodeDirectorySection.Entry. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.
[ https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899775#comment-13899775 ] Tsz Wo (Nicholas), SZE commented on HDFS-5889: -- Jing, thanks for making the change. It looks good. We also need to change FSImage format for adding upgrade info. Otherwise, the upgrade info will be lost if it uses a checkpoint to restart. Let's commit this first and do the work separately so that this will unblock HDFS-5920. Do you agree? When rolling upgrade is in progress, standby NN should create checkpoint for downgrade. --- Key: HDFS-5889 URL: https://issues.apache.org/jira/browse/HDFS-5889 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo (Nicholas), SZE Assignee: Tsz Wo (Nicholas), SZE Attachments: h5889_20140211.patch, h5889_20140212b.patch, h5889_20140212c.patch After rolling upgrade is started and checkpoint is disabled, the edit log may grow to a huge size. It is not a problem if rolling upgrade is finalized normally since NN keeps the current state in memory and it writes a new checkpoint during finalize. However, it is a problem if admin decides to downgrade. It could take a long time to apply edit log. Rollback does not have such problem. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HDFS-5891) webhdfs should not try connecting the DN during redirection
[ https://issues.apache.org/jira/browse/HDFS-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13899773#comment-13899773 ] Haohui Mai commented on HDFS-5891: -- The findbugs warning is fixed by HDFS-5938. webhdfs should not try connecting the DN during redirection --- Key: HDFS-5891 URL: https://issues.apache.org/jira/browse/HDFS-5891 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-5891.000.patch, HDFS-5891.001.patch When the webhdfs server in NN serves an {{OPEN}} request, the NN will eventually redirect the request to a DN. The current implementation intends to choose the active DNs. The code always connects to the DN in a deterministic order to see whether it is active during redirection. Although this reduces the chance of the client connecting to a failed DN, it is problematic because: # There is no guarantee that the client can connect to that DN even if the NN can connect to it. # It requires an additional network round-trip for every {{OPEN}} / {{CREATE}} request. This jira proposes that the NN should choose the DN based on the information in the datanode manager. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5866) '-maxSize' and '-step' option fail in OfflineImageViewer
[ https://issues.apache.org/jira/browse/HDFS-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira AJISAKA updated HDFS-5866: Attachment: HDFS-5866.patch '-maxSize' and '-step' option fail in OfflineImageViewer Key: HDFS-5866 URL: https://issues.apache.org/jira/browse/HDFS-5866 Project: Hadoop HDFS Issue Type: Sub-task Components: tools Affects Versions: 3.0.0, 2.2.0 Reporter: Akira AJISAKA Assignee: Akira AJISAKA Attachments: HDFS-5866.patch Executing -step or/and -maxSize option will get the following error: {code} $ hdfs oiv -p FileDistribution -step 102400 -i input -o output Error parsing command-line options: Usage: bin/hdfs oiv [OPTIONS] -i INPUTFILE -o OUTPUTFILE {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HDFS-5866) '-maxSize' and '-step' option fail in OfflineImageViewer
[ https://issues.apache.org/jira/browse/HDFS-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira AJISAKA updated HDFS-5866:
--------------------------------

    Status: Patch Available (was: Open)

> '-maxSize' and '-step' option fail in OfflineImageViewer
> ---------------------------------------------------------
>
> Key: HDFS-5866
> URL: https://issues.apache.org/jira/browse/HDFS-5866
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: tools
> Affects Versions: 2.2.0, 3.0.0
> Reporter: Akira AJISAKA
> Assignee: Akira AJISAKA
> Attachments: HDFS-5866.patch
>
> Executing the -step and/or -maxSize options results in the following error:
> {code}
> $ hdfs oiv -p FileDistribution -step 102400 -i input -o output
> Error parsing command-line options: Usage: bin/hdfs oiv [OPTIONS] -i INPUTFILE -o OUTPUTFILE
> {code}

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Commented] (HDFS-5866) '-maxSize' and '-step' option fail in OfflineImageViewer
[ https://issues.apache.org/jira/browse/HDFS-5866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899777#comment-13899777 ]

Akira AJISAKA commented on HDFS-5866:
-------------------------------------

Attaching a patch to enable these options. I built with the patch and confirmed that the options work.

> '-maxSize' and '-step' option fail in OfflineImageViewer
> ---------------------------------------------------------
>
> Key: HDFS-5866
> URL: https://issues.apache.org/jira/browse/HDFS-5866
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: tools
> Affects Versions: 3.0.0, 2.2.0
> Reporter: Akira AJISAKA
> Assignee: Akira AJISAKA
> Attachments: HDFS-5866.patch
>
> Executing the -step and/or -maxSize options results in the following error:
> {code}
> $ hdfs oiv -p FileDistribution -step 102400 -i input -o output
> Error parsing command-line options: Usage: bin/hdfs oiv [OPTIONS] -i INPUTFILE -o OUTPUTFILE
> {code}

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
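[Editor's note] With a patch like the one described above applied, the intended invocation would presumably look something like the following. The input/output paths and the -maxSize value are illustrative placeholders, not taken from the issue; both -step and -maxSize are byte values consumed by the FileDistribution processor.

{code}
$ hdfs oiv -p FileDistribution -step 102400 -maxSize 134217728 -i fsimage -o output
{code}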
[jira] [Updated] (HDFS-5847) Consolidate INodeReference into a separate section
[ https://issues.apache.org/jira/browse/HDFS-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jing Zhao updated HDFS-5847:
----------------------------

    Resolution: Fixed
    Fix Version/s: 3.0.0
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

I've committed this.

> Consolidate INodeReference into a separate section
> --------------------------------------------------
>
> Key: HDFS-5847
> URL: https://issues.apache.org/jira/browse/HDFS-5847
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: 3.0.0
> Reporter: Haohui Mai
> Assignee: Jing Zhao
> Fix For: 3.0.0
> Attachments: HDFS-5847.000.patch, HDFS-5847.001.patch, HDFS-5847.002.patch
>
> Currently each INodeDirectorySection.Entry contains a variable number of INodeReference entries. The INodeReference entries are inlined, so it is difficult to quickly navigate through an INodeDirectorySection.Entry. Skipping through an INodeDirectorySection.Entry without parsing it is essential for parsing these entries in parallel. This jira proposes to consolidate INodeReferences into a separate section and give each of them an ID. An INodeDirectorySection.Entry can then store the list of IDs as a repeated field. That way we can leverage the existing code in protobuf to quickly skip through an INodeDirectorySection.Entry.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
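[Editor's note] A minimal protobuf sketch of the layout the issue describes — references pulled out of the directory entries into their own section, with directory entries holding only IDs. The message and field names below are illustrative assumptions, not necessarily those in the committed patch; repeated scalar fields are length-delimited when packed, which is what lets a reader skip a whole entry without parsing each element.

{code}
// Sketch only; actual definitions live in the committed fsimage .proto changes.
message INodeReferenceSection {
  message INodeReference {
    optional uint64 referredId = 1;  // inode id of the referred inode
    optional bytes name = 2;
  }
  // References are serialized in order; a reference's ID is its position
  // in this section, so directory entries can point at them by index.
}

message INodeDirectorySection {
  message DirEntry {
    optional uint64 parent = 1;
    repeated uint64 children = 2 [packed = true];
    // Instead of inlined INodeReference messages, store their IDs;
    // packed repeated fields are length-prefixed and cheap to skip.
    repeated uint32 refChildren = 3 [packed = true];
  }
}
{code}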
[jira] [Updated] (HDFS-5891) webhdfs should not try connecting the DN during redirection
[ https://issues.apache.org/jira/browse/HDFS-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Li updated HDFS-5891:
-----------------------------

    Component/s: webhdfs
                 namenode

> webhdfs should not try connecting the DN during redirection
> -----------------------------------------------------------
>
> Key: HDFS-5891
> URL: https://issues.apache.org/jira/browse/HDFS-5891
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode, webhdfs
> Reporter: Haohui Mai
> Assignee: Haohui Mai
> Fix For: 2.4.0
> Attachments: HDFS-5891.000.patch, HDFS-5891.001.patch
>
> When the webhdfs server in the NN serves an {{OPEN}} request, the NN eventually redirects the request to a DN. The current implementation intends to choose an active DN: during redirection, the code always connects to the DNs in a deterministic order to see whether they are active. Although this reduces the chance of the client connecting to a failed DN, it is problematic because:
> # There is no guarantee that the client can connect to that DN even if the NN can connect to it.
> # It requires an additional network round-trip for every {{OPEN}} / {{CREATE}} request.
> This jira proposes that the NN choose the DN based on the information in the datanode manager.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Updated] (HDFS-5891) webhdfs should not try connecting the DN during redirection
[ https://issues.apache.org/jira/browse/HDFS-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Li updated HDFS-5891:
-----------------------------

    Fix Version/s: 2.4.0

> webhdfs should not try connecting the DN during redirection
> -----------------------------------------------------------
>
> Key: HDFS-5891
> URL: https://issues.apache.org/jira/browse/HDFS-5891
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode, webhdfs
> Reporter: Haohui Mai
> Assignee: Haohui Mai
> Fix For: 2.4.0
> Attachments: HDFS-5891.000.patch, HDFS-5891.001.patch
>
> When the webhdfs server in the NN serves an {{OPEN}} request, the NN eventually redirects the request to a DN. The current implementation intends to choose an active DN: during redirection, the code always connects to the DNs in a deterministic order to see whether they are active. Although this reduces the chance of the client connecting to a failed DN, it is problematic because:
> # There is no guarantee that the client can connect to that DN even if the NN can connect to it.
> # It requires an additional network round-trip for every {{OPEN}} / {{CREATE}} request.
> This jira proposes that the NN choose the DN based on the information in the datanode manager.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Updated] (HDFS-5891) webhdfs should not try connecting the DN during redirection
[ https://issues.apache.org/jira/browse/HDFS-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Li updated HDFS-5891:
-----------------------------

    Resolution: Fixed
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

> webhdfs should not try connecting the DN during redirection
> -----------------------------------------------------------
>
> Key: HDFS-5891
> URL: https://issues.apache.org/jira/browse/HDFS-5891
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode, webhdfs
> Reporter: Haohui Mai
> Assignee: Haohui Mai
> Attachments: HDFS-5891.000.patch, HDFS-5891.001.patch
>
> When the webhdfs server in the NN serves an {{OPEN}} request, the NN eventually redirects the request to a DN. The current implementation intends to choose an active DN: during redirection, the code always connects to the DNs in a deterministic order to see whether they are active. Although this reduces the chance of the client connecting to a failed DN, it is problematic because:
> # There is no guarantee that the client can connect to that DN even if the NN can connect to it.
> # It requires an additional network round-trip for every {{OPEN}} / {{CREATE}} request.
> This jira proposes that the NN choose the DN based on the information in the datanode manager.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Commented] (HDFS-5891) webhdfs should not try connecting the DN during redirection
[ https://issues.apache.org/jira/browse/HDFS-5891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899794#comment-13899794 ]

Brandon Li commented on HDFS-5891:
----------------------------------

I've committed the patch. Thank you, Haohui and Jing.

> webhdfs should not try connecting the DN during redirection
> -----------------------------------------------------------
>
> Key: HDFS-5891
> URL: https://issues.apache.org/jira/browse/HDFS-5891
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode, webhdfs
> Reporter: Haohui Mai
> Assignee: Haohui Mai
> Fix For: 2.4.0
> Attachments: HDFS-5891.000.patch, HDFS-5891.001.patch
>
> When the webhdfs server in the NN serves an {{OPEN}} request, the NN eventually redirects the request to a DN. The current implementation intends to choose an active DN: during redirection, the code always connects to the DNs in a deterministic order to see whether they are active. Although this reduces the chance of the client connecting to a failed DN, it is problematic because:
> # There is no guarantee that the client can connect to that DN even if the NN can connect to it.
> # It requires an additional network round-trip for every {{OPEN}} / {{CREATE}} request.
> This jira proposes that the NN choose the DN based on the information in the datanode manager.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
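[Editor's note] For context, the redirection discussed in this issue is visible from the client side: a webhdfs {{OPEN}} against the NN answers with an HTTP 307 whose Location header names a DN, and the client then reads the data from that DN. Host names, ports, and the file path below are placeholders, and the response is abbreviated:

{code}
$ curl -i "http://namenode:50070/webhdfs/v1/user/foo/file?op=OPEN"
HTTP/1.1 307 TEMPORARY_REDIRECT
Location: http://datanode:50075/webhdfs/v1/user/foo/file?op=OPEN&...

# or let curl follow the redirect and read the file contents directly
$ curl -L "http://namenode:50070/webhdfs/v1/user/foo/file?op=OPEN"
{code}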
[jira] [Commented] (HDFS-5889) When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.
[ https://issues.apache.org/jira/browse/HDFS-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899796#comment-13899796 ]

Jing Zhao commented on HDFS-5889:
---------------------------------

Yes. +1 for the current patch.

> When rolling upgrade is in progress, standby NN should create checkpoint for downgrade.
> ---------------------------------------------------------------------------------------
>
> Key: HDFS-5889
> URL: https://issues.apache.org/jira/browse/HDFS-5889
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: namenode
> Reporter: Tsz Wo (Nicholas), SZE
> Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h5889_20140211.patch, h5889_20140212b.patch, h5889_20140212c.patch
>
> After a rolling upgrade is started and checkpointing is disabled, the edit log may grow to a huge size. This is not a problem if the rolling upgrade is finalized normally, since the NN keeps the current state in memory and writes a new checkpoint during finalize. However, it is a problem if the admin decides to downgrade: it could take a long time to apply the edit log. Rollback does not have this problem.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Commented] (HDFS-5934) New Namenode UI back button doesn't work as expected
[ https://issues.apache.org/jira/browse/HDFS-5934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899824#comment-13899824 ]

Hadoop QA commented on HDFS-5934:
---------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12628564/HDFS-5934-1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.

{color:red}-1 findbugs{color}. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/6127//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HDFS-Build/6127//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6127//console

This message is automatically generated.

> New Namenode UI back button doesn't work as expected
> ----------------------------------------------------
>
> Key: HDFS-5934
> URL: https://issues.apache.org/jira/browse/HDFS-5934
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.3.0
> Reporter: Travis Thompson
> Assignee: Travis Thompson
> Priority: Minor
> Attachments: HDFS-5934-1.patch, HDFS-5934-2.patch
>
> When I navigate to the Namenode page and click on the Datanodes tab, it takes me to the Datanodes page. If I then click my browser's back button, it does not take me back to the overview page as one would expect. The same is true of choosing any tab.
> Another example of the back button misbehaving is when browsing HDFS: clicking back one page should take me either to the previous directory I was viewing or to the page I was viewing before entering the FS browser. Instead, I am always taken back to the page I was viewing before entering the FS browser.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)