[jira] [Updated] (HDFS-6681) TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN is flaky and sometimes gets stuck in infinite loops
[ https://issues.apache.org/jira/browse/HDFS-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ratandeep Ratti updated HDFS-6681: -- Attachment: HDFS-6681.patch TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN is flaky and sometimes gets stuck in infinite loops -- Key: HDFS-6681 URL: https://issues.apache.org/jira/browse/HDFS-6681 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.1 Environment: Java(TM) SE Runtime Environment (build 1.6.0_31-b04) Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode) Linux [hostname] 2.6.32-279.14.1.el6.x86_64 #1 SMP Mon Oct 15 13:44:51 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux Reporter: Ratandeep Ratti Attachments: HDFS-6681.patch This testcase has 3 infinite loops which break only when certain conditions are satisfied. 1st loop checks if there should be a single live replica. It assumes this to be true since it has just corrupted a block on one of the datanodes (the testcase uses a replication factor of 2). One scenario in which this loop will never break is if the Namenode invalidates the corrupt replica, schedules a replication command, and the newly copied replica is added, all before the testcase has a chance to check the live-replica count. 2nd loop checks that there should be 2 live replicas. It assumes this will become true (in some time) since the first loop has broken, implying there is a single replica, and now it is only a matter of time before the Namenode schedules a replication command to copy a replica to another datanode. One scenario in which this loop will never break is when the Namenode tries to schedule a new replica on the same node on which we actually corrupted the block. That dst. datanode will not copy the block, complaining that it already has the (corrupted) replica in the create state. The situation that results is that the Namenode has scheduled a copy to a datanode, the block is now in the Namenode's pending replication queue, and this block will never be removed from the pending replication queue because the Namenode will never receive a report from the datanodes that the block is 'added'. Note: The block can be transferred from the 'pending replication' queue to the 'needed replication' queue once the pending timeout (5 minutes) expires. The Namenode then actively tries to schedule a replication for blocks in the 'needed replication' queue. This can cause the 2nd loop to break, but the time before this process kicks in is more than 5 minutes. 3rd loop: This loop checks if there are no corrupt replicas. I don't see a scenario in which this loop can go on forever, since once the live replica count goes back to normal (2), the corrupted block will be removed. I guess increasing the heartbeat interval, so that the testcase has enough time to check the condition in loop 1 before a datanode reports a successful copy, should help avoid the race condition in loop 1. Regarding loop 2, I guess we can reduce the timeout after which the block is transferred from the pending replication queue to the needed replication queue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6681) TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN is flaky and sometimes gets stuck in infinite loops
[ https://issues.apache.org/jira/browse/HDFS-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ratandeep Ratti updated HDFS-6681: -- Status: Patch Available (was: Open) TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN is flaky and sometimes gets stuck in infinite loops -- Key: HDFS-6681 URL: https://issues.apache.org/jira/browse/HDFS-6681 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.1 Environment: Java(TM) SE Runtime Environment (build 1.6.0_31-b04) Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode) Linux [hostname] 2.6.32-279.14.1.el6.x86_64 #1 SMP Mon Oct 15 13:44:51 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux Reporter: Ratandeep Ratti Attachments: HDFS-6681.patch This testcase has 3 infinite loops which break only when certain conditions are satisfied. 1st loop checks if there should be a single live replica. It assumes this to be true since it has just corrupted a block on one of the datanodes (the testcase uses a replication factor of 2). One scenario in which this loop will never break is if the Namenode invalidates the corrupt replica, schedules a replication command, and the newly copied replica is added, all before the testcase has a chance to check the live-replica count. 2nd loop checks that there should be 2 live replicas. It assumes this will become true (in some time) since the first loop has broken, implying there is a single replica, and now it is only a matter of time before the Namenode schedules a replication command to copy a replica to another datanode. One scenario in which this loop will never break is when the Namenode tries to schedule a new replica on the same node on which we actually corrupted the block. That dst. datanode will not copy the block, complaining that it already has the (corrupted) replica in the create state. The situation that results is that the Namenode has scheduled a copy to a datanode, the block is now in the Namenode's pending replication queue, and this block will never be removed from the pending replication queue because the Namenode will never receive a report from the datanodes that the block is 'added'. Note: The block can be transferred from the 'pending replication' queue to the 'needed replication' queue once the pending timeout (5 minutes) expires. The Namenode then actively tries to schedule a replication for blocks in the 'needed replication' queue. This can cause the 2nd loop to break, but the time before this process kicks in is more than 5 minutes. 3rd loop: This loop checks if there are no corrupt replicas. I don't see a scenario in which this loop can go on forever, since once the live replica count goes back to normal (2), the corrupted block will be removed. I guess increasing the heartbeat interval, so that the testcase has enough time to check the condition in loop 1 before a datanode reports a successful copy, should help avoid the race condition in loop 1. Regarding loop 2, I guess we can reduce the timeout after which the block is transferred from the pending replication queue to the needed replication queue. -- This message was sent by Atlassian JIRA (v6.2#6252)
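Independent of the heartbeat and pending-timeout tuning proposed above, the hang itself can be avoided by bounding the wait loops. The following is a minimal, self-contained sketch (not taken from the attached HDFS-6681.patch); the Condition checks are placeholders for the test's real assertions on live/corrupt replica counts.
{code}
import java.util.concurrent.TimeoutException;

// Sketch of replacing an infinite poll loop with a deadline-bounded one so a race
// like the ones described above surfaces as a timeout failure instead of a hang.
public class BoundedWait {

  /** A check the test polls until it becomes true or the deadline expires. */
  interface Condition {
    boolean isMet() throws Exception;
  }

  /** Polls the condition every intervalMs and fails after timeoutMs instead of looping forever. */
  static void waitFor(Condition condition, long intervalMs, long timeoutMs) throws Exception {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!condition.isMet()) {
      if (System.currentTimeMillis() > deadline) {
        throw new TimeoutException("Condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }

  public static void main(String[] args) throws Exception {
    // Placeholder for the test's first loop: in the real test this would query the
    // NameNode for exactly one live replica of the corrupted block.
    final long start = System.currentTimeMillis();
    waitFor(new Condition() {
      @Override
      public boolean isMet() {
        return System.currentTimeMillis() - start > 100; // stand-in condition
      }
    }, 10, 30000);
    System.out.println("condition met within the deadline");
  }
}
{code}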
[jira] [Updated] (HDFS-6641) [ HDFS- File Concat ] Concat will fail when target file is having one block which is not full
[ https://issues.apache.org/jira/browse/HDFS-6641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brahma Reddy Battula updated HDFS-6641: --- Summary: [ HDFS- File Concat ] Concat will fail when target file is having one block which is not full (was: [ HDFS- File Concat ] Concat will fail when Src/target file is having one block which is not full ) [ HDFS- File Concat ] Concat will fail when target file is having one block which is not full -- Key: HDFS-6641 URL: https://issues.apache.org/jira/browse/HDFS-6641 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 2.4.1 Reporter: Brahma Reddy Battula Usually we can't ensure the last block is always full... please let me know the purpose of the following check: long blockSize = trgInode.getPreferredBlockSize(); // check the end block to be full final BlockInfo last = trgInode.getLastBlock(); if(blockSize != last.getNumBytes()) { throw new HadoopIllegalArgumentException("The last block in " + target + " is not full; last block size = " + last.getNumBytes() + " but file block size = " + blockSize); } If it is an issue, I'll file a jira. Following is the trace.. Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.HadoopIllegalArgumentException): The last block in /Test.txt is not full; last block size = 14 but file block size = 134217728 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.concatInternal(FSNamesystem.java:1887) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.concatInt(FSNamesystem.java:1833) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.concat(FSNamesystem.java:1795) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.concat(NameNodeRpcServer.java:704) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.concat(ClientNamenodeProtocolServerSideTranslatorPB.java:512) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) -- This message was sent by Atlassian JIRA (v6.2#6252)
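For context, a minimal client-side sketch of the call that trips the quoted NameNode check; the paths and configuration are illustrative, and the default filesystem is assumed to be a DistributedFileSystem (FileSystem#concat is unsupported on most other implementations).
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ConcatExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Illustrative paths: a small target whose last block is only partially filled.
    Path target = new Path("/Test.txt");
    Path[] sources = { new Path("/part-0"), new Path("/part-1") };

    // With a partially filled last block on the target, the NameNode rejects this with
    // HadoopIllegalArgumentException("The last block in /Test.txt is not full; ...").
    fs.concat(target, sources);
  }
}
{code}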
[jira] [Commented] (HDFS-6667) In HDFS HA mode, Distcp/SLive with webhdfs on secure cluster fails with Client cannot authenticate via:[TOKEN, KERBEROS] error
[ https://issues.apache.org/jira/browse/HDFS-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061738#comment-14061738 ] Hadoop QA commented on HDFS-6667: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655621/HDFS-6667.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.server.namenode.TestProcessCorruptBlocks {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7344//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7344//console This message is automatically generated. In HDFS HA mode, Distcp/SLive with webhdfs on secure cluster fails with Client cannot authenticate via:[TOKEN, KERBEROS] error -- Key: HDFS-6667 URL: https://issues.apache.org/jira/browse/HDFS-6667 Project: Hadoop HDFS Issue Type: Bug Components: security Reporter: Jian He Assignee: Jing Zhao Attachments: HDFS-6667.000.patch Opening on [~arpitgupta]'s behalf. We observed that, in HDFS HA mode, running Distcp/SLive with webhdfs will fail on YARN. In non-HA mode, it'll pass. The reason is in HA mode, only webhdfs delegation token is generated for the job, but YARN also requires the regular hdfs token to do localization, log-aggregation etc. In non-HA mode, both tokens are generated for the job. -- This message was sent by Atlassian JIRA (v6.2#6252)
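The root cause in the description (only a webhdfs delegation token is collected in HA mode) implies the job needs tokens for both filesystems. Below is a hedged sketch, not the attached HDFS-6667.000.patch, of explicitly collecting both into the job's Credentials; the URIs and the renewer principal are placeholders.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;

public class CollectBothTokens {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Credentials creds = new Credentials();
    String renewer = "yarn/_HOST@EXAMPLE.COM";   // placeholder renewer principal

    // Token for the webhdfs filesystem the distcp/SLive job actually reads and writes through.
    FileSystem webhdfs = new Path("webhdfs://mycluster").getFileSystem(conf);
    webhdfs.addDelegationTokens(renewer, creds);

    // Token for the regular HDFS filesystem, which YARN needs for localization and
    // log aggregation; in the failing HA case this one is not collected automatically.
    FileSystem hdfs = new Path("hdfs://mycluster").getFileSystem(conf);
    hdfs.addDelegationTokens(renewer, creds);

    System.out.println("collected " + creds.numberOfTokens() + " tokens");
  }
}
{code}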
[jira] [Updated] (HDFS-6619) Clean up encryption-related tests
[ https://issues.apache.org/jira/browse/HDFS-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6619: -- Attachment: hdfs-6619.001.patch Patch attached. High level sketch of changes: - TestHDFSEncryption wasn't testing anything beyond normal stream operations, which are already tested in a number of other HDFS tests. I removed this file entirely. - Renamed TestEncryptionZonesAPI to TestEncryptionZones - The FileContext test extending TestEncryptionZonesAPI was running all the inherited tests again, when all it wanted was to run that one rename test. I folded that one test into TestEncryptionZones. - I combined a bunch of small test cases into a single test case to save on minicluster invocations. I'd like to see us extend some of the existing stream tests to operate on encryption zones to capture the intent of TestHdfsEncryption, but let's do that in a different JIRA. There are probably also some more tests that could be written for HDFS-6474 as well. Clean up encryption-related tests - Key: HDFS-6619 URL: https://issues.apache.org/jira/browse/HDFS-6619 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Andrew Wang Assignee: Andrew Wang Priority: Minor Attachments: hdfs-6619.001.patch Would be good to clean up TestHDFSEncryption and TestEncryptionZonesAPI. These tests could be renamed, test timeouts added/adjusted, reduced number of minicluster start/stops, whitespace, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Work started] (HDFS-6619) Clean up encryption-related tests
[ https://issues.apache.org/jira/browse/HDFS-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HDFS-6619 started by Andrew Wang. Clean up encryption-related tests - Key: HDFS-6619 URL: https://issues.apache.org/jira/browse/HDFS-6619 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Andrew Wang Assignee: Andrew Wang Priority: Minor Attachments: hdfs-6619.001.patch Would be good to clean up TestHDFSEncryption and TestEncryptionZonesAPI. These tests could be renamed, test timeouts added/adjusted, reduced number of minicluster start/stops, whitespace, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6509) distcp vs Data At Rest Encryption
[ https://issues.apache.org/jira/browse/HDFS-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang updated HDFS-6509: -- Affects Version/s: fs-encryption (HADOOP-10150 and HDFS-6134) distcp vs Data At Rest Encryption - Key: HDFS-6509 URL: https://issues.apache.org/jira/browse/HDFS-6509 Project: Hadoop HDFS Issue Type: Sub-task Components: security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Charles Lamb Assignee: Charles Lamb Attachments: HDFS-6509distcpandDataatRestEncryption.pdf distcp needs to work with Data At Rest Encryption -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6134) Transparent data at rest encryption
[ https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061756#comment-14061756 ] Andrew Wang commented on HDFS-6134: --- Charles posted a design doc for how distcp will work with encryption at HDFS-6509. [~sanjay.radia] and [~owen.omalley], I think this is essentially the raw directory discussed earlier, but it'd be appreciated if you gave it a once over. Thanks! Transparent data at rest encryption --- Key: HDFS-6134 URL: https://issues.apache.org/jira/browse/HDFS-6134 Project: Hadoop HDFS Issue Type: New Feature Components: security Affects Versions: 2.3.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf Because of privacy and security regulations, for many industries, sensitive data at rest must be in encrypted form. For example: the healthcare industry (HIPAA regulations), the card payment industry (PCI DSS regulations) or the US government (FISMA regulations). This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can be used transparently by any application accessing HDFS via Hadoop Filesystem Java API, Hadoop libhdfs C library, or WebHDFS REST API. The resulting implementation should be able to be used in compliance with different regulation requirements. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6450) Support non-positional hedged reads in HDFS
[ https://issues.apache.org/jira/browse/HDFS-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061760#comment-14061760 ] Liang Xie commented on HDFS-6450: - After a deeper look, it's kind of hard to reuse/maintain the block reader as before. In pread(), we don't have this trouble, because we always create a new block reader. In read(), if we want to support the hedged read ability, in general: 1) the first read (r1) uses the old block reader if possible, then we wait for the hedged read timeout setting; 2) the second read (r2) must create a new block reader and be submitted into the thread pool; 3) we wait for the first completed task and return the final read result to the client side. Here we need to set (remember) this task's block reader as DFIS's block reader variable and keep it open, but we also need to close the other block reader to avoid a leak. Another thing to note is that if we remember the faster block reader and it's a remote block reader, then the following read() calls will bypass the local read in the subsequent r1 operations... Any thoughts? [~cmccabe], [~saint@gmail.com] ... Support non-positional hedged reads in HDFS --- Key: HDFS-6450 URL: https://issues.apache.org/jira/browse/HDFS-6450 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.4.0 Reporter: Colin Patrick McCabe Assignee: Liang Xie Attachments: HDFS-6450-like-pread.txt HDFS-5776 added support for hedged positional reads. We should also support hedged non-positional reads (aka regular reads). -- This message was sent by Atlassian JIRA (v6.2#6252)
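The "submit a second read and take whichever finishes first" step of the plan above maps onto the standard CompletionService pattern. A self-contained sketch follows; the two Callables stand in for r1 (the existing block reader) and r2 (a freshly created one), and, as the comment notes, the real change would also have to remember the winning reader and close the loser.
{code}
import java.util.concurrent.Callable;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class HedgedReadSketch {
  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    CompletionService<byte[]> cs = new ExecutorCompletionService<byte[]>(pool);

    // r1: read through the current block reader (placeholder work).
    cs.submit(new Callable<byte[]>() {
      @Override
      public byte[] call() throws Exception {
        Thread.sleep(200);             // simulate a slow local/remote read
        return new byte[]{1};
      }
    });

    // r2: hedge by reading the same range through a new block reader.
    cs.submit(new Callable<byte[]>() {
      @Override
      public byte[] call() throws Exception {
        Thread.sleep(50);              // simulate the faster replica
        return new byte[]{2};
      }
    });

    // Take whichever read completes first and return it to the caller; the other
    // reader must still be closed (or cancelled) to avoid leaking sockets.
    byte[] result = cs.take().get();
    System.out.println("first finished read returned " + result.length + " byte(s)");
    pool.shutdownNow();
  }
}
{code}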
[jira] [Commented] (HDFS-6619) Clean up encryption-related tests
[ https://issues.apache.org/jira/browse/HDFS-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061765#comment-14061765 ] Yi Liu commented on HDFS-6619: -- LGTM, +1, thanks [~andrew.wang] for refining the tests. Clean up encryption-related tests - Key: HDFS-6619 URL: https://issues.apache.org/jira/browse/HDFS-6619 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Andrew Wang Assignee: Andrew Wang Priority: Minor Attachments: hdfs-6619.001.patch Would be good to clean up TestHDFSEncryption and TestEncryptionZonesAPI. These tests could be renamed, test timeouts added/adjusted, reduced number of minicluster start/stops, whitespace, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6676) KMS throws AuthenticationException when enabling kerberos authentication
[ https://issues.apache.org/jira/browse/HDFS-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang updated HDFS-6676: - Description: When I made a request http://server-1941.novalocal:16000/kms/v1/names in firefox. (before, i set configs in firefox according https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Deployment_Guide/sso-config-firefox.html), following info was found in logs/kms.log. 2014-07-14 19:18:30,461 WARN AuthenticationFilter - Authentication exception: GSSException: Failure unspecified at GSS-API level (Mechanism level: EncryptedData is encrypted using keytype DES CBC mode with CRC-32 but decryption key is of type NULL) org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: Failure unspecified at GSS-API level (Mechanism levelis of type NULL) at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:380) at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:357) at org.apache.hadoop.crypto.key.kms.server.KMSAuthenticationFilter.doFilter(KMSAuthenticationFilter.java:100) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:745) Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism level: EncryptedData is encrypted using keytype DES CBC mode with CRC-32 but decryption key is of type NULL) at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:788) at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342) at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285) at sun.security.jgss.spnego.SpNegoContext.GSS_acceptSecContext(SpNegoContext.java:875) at sun.security.jgss.spnego.SpNegoContext.acceptSecContext(SpNegoContext.java:548) at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342) at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285) at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler$2.run(KerberosAuthenticationHandler.java:347) at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler$2.run(KerberosAuthenticationHandler.java:329) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:329) ... 
14 more Caused by: KrbException: EncryptedData is encrypted using keytype DES CBC mode with CRC-32 but decryption key is of type NULL at sun.security.krb5.EncryptedData.decrypt(EncryptedData.java:169) at sun.security.krb5.KrbCred.init(KrbCred.java:131) at sun.security.jgss.krb5.InitialToken$OverloadedChecksum.init(InitialToken.java:282) at sun.security.jgss.krb5.InitSecContextToken.init(InitSecContextToken.java:130) at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:771) ... 25 more Kerberos is enabled successful in my environment: klist Ticket cache: FILE:/tmp/krb5cc_0 Default principal: HTTP/server-1941.novalocal@NOVALOCAL Valid starting ExpiresService principal 07/14/14 19:18:10 07/15/14 19:18:09 krbtgt/NOVALOCAL@NOVALOCAL renew until 07/14/14 19:18:10 07/14/14 19:18:30 07/15/14 19:18:09 HTTP/server-1941.novalocal@NOVALOCAL renew until 07/14/14 19:18:10 Following are kdc configs: cat /etc/krb5.conf [logging] default = FILE:/var/log/krb5libs.log kdc = FILE:/var/log/krb5kdc.log admin_server = FILE:/var/log/kadmind.log [libdefaults] default_realm = NOVALOCAL dns_lookup_realm = false
[jira] [Commented] (HDFS-6667) In HDFS HA mode, Distcp/SLive with webhdfs on secure cluster fails with Client cannot authenticate via:[TOKEN, KERBEROS] error
[ https://issues.apache.org/jira/browse/HDFS-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061783#comment-14061783 ] Jing Zhao commented on HDFS-6667: - The unit test failures should be unrelated. TestDFSAdminWithHA and TestPipelinesFailover were also seen in recent Jenkins run such as [here|https://issues.apache.org/jira/browse/HDFS-2856?focusedCommentId=14059617page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14059617]. TestProcessCorruptBlocks has been reported in HDFS-6656. In HDFS HA mode, Distcp/SLive with webhdfs on secure cluster fails with Client cannot authenticate via:[TOKEN, KERBEROS] error -- Key: HDFS-6667 URL: https://issues.apache.org/jira/browse/HDFS-6667 Project: Hadoop HDFS Issue Type: Bug Components: security Reporter: Jian He Assignee: Jing Zhao Attachments: HDFS-6667.000.patch Opening on [~arpitgupta]'s behalf. We observed that, in HDFS HA mode, running Distcp/SLive with webhdfs will fail on YARN. In non-HA mode, it'll pass. The reason is in HA mode, only webhdfs delegation token is generated for the job, but YARN also requires the regular hdfs token to do localization, log-aggregation etc. In non-HA mode, both tokens are generated for the job. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-2892) Some of property descriptions are not given(hdfs-default.xml)
[ https://issues.apache.org/jira/browse/HDFS-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chunjun Xiao updated HDFS-2892: --- Description: dfs.datanode.https.address Hi.. I took the 23.0 release from http://hadoop.apache.org/common/releases.html#11+Nov%2C+2011%3A+release+0.23.0+available I just went through all the properties provided in hdfs-default.xml.. Some of the property descriptions are not mentioned.. It's better to give a description of each property and its usage (how to configure it), and only MapReduce-related jars are provided.. Please check the following two configurations *No Description* {noformat} <property> <name>dfs.datanode.https.address</name> <value>0.0.0.0:50475</value> </property> <property> <name>dfs.namenode.https-address</name> <value>0.0.0.0:50470</value> </property> {noformat} Better to mention example usage (what to configure... format(syntax)) in the description; here I did not get what default means, whether this is the name of a n/w interface or something else <property> <name>dfs.datanode.dns.interface</name> <value>default</value> <description>The name of the Network Interface from which a data node should report its IP address.</description> </property> The following property is commented out.. If it is not supported, better to remove it. <property> <name>dfs.cluster.administrators</name> <value>ACL for the admins</value> <description>This configuration is used to control who can access the default servlets in the namenode, etc.</description> </property> Small clarification for the following property.. if some value is configured for this, then NN will be in safe mode up to this much time.. May I know the usage of the following property... <property> <name>dfs.blockreport.initialDelay</name> <value>0</value> <description>Delay for first block report in seconds.</description> </property> was: Hi.. I took the 23.0 release from http://hadoop.apache.org/common/releases.html#11+Nov%2C+2011%3A+release+0.23.0+available I just went through all the properties provided in hdfs-default.xml.. Some of the property descriptions are not mentioned.. It's better to give a description of each property and its usage (how to configure it), and only MapReduce-related jars are provided.. Please check the following two configurations *No Description* {noformat} <property> <name>dfs.datanode.https.address</name> <value>0.0.0.0:50475</value> </property> <property> <name>dfs.namenode.https-address</name> <value>0.0.0.0:50470</value> </property> {noformat} Better to mention example usage (what to configure... format(syntax)) in the description; here I did not get what default means, whether this is the name of a n/w interface or something else <property> <name>dfs.datanode.dns.interface</name> <value>default</value> <description>The name of the Network Interface from which a data node should report its IP address.</description> </property> The following property is commented out.. If it is not supported, better to remove it. <property> <name>dfs.cluster.administrators</name> <value>ACL for the admins</value> <description>This configuration is used to control who can access the default servlets in the namenode, etc.</description> </property> Small clarification for the following property.. if some value is configured for this, then NN will be in safe mode up to this much time.. May I know the usage of the following property...
<property> <name>dfs.blockreport.initialDelay</name> <value>0</value> <description>Delay for first block report in seconds.</description> </property> Some of property descriptions are not given(hdfs-default.xml) -- Key: HDFS-2892 URL: https://issues.apache.org/jira/browse/HDFS-2892 Project: Hadoop HDFS Issue Type: Bug Components: namenode Affects Versions: 0.23.0 Reporter: Brahma Reddy Battula Priority: Trivial dfs.datanode.https.address Hi.. I took the 23.0 release from http://hadoop.apache.org/common/releases.html#11+Nov%2C+2011%3A+release+0.23.0+available I just went through all the properties provided in hdfs-default.xml.. Some of the property descriptions are not mentioned.. It's better to give a description of each property and its usage (how to configure it), and only MapReduce-related jars are provided.. Please check the following two configurations *No Description* {noformat} <property> <name>dfs.datanode.https.address</name> <value>0.0.0.0:50475</value> </property> <property> <name>dfs.namenode.https-address</name> <value>0.0.0.0:50470</value> </property> {noformat} Better to mention example usage (what to configure... format(syntax)) in the description; here I did not get what default means, whether this is the name of a n/w interface or something else <property> <name>dfs.datanode.dns.interface</name> <value>default</value> <description>The name of the Network Interface from which a data node should report its IP address.</description> </property> The following property is commented out.. If it is not supported better
[jira] [Created] (HDFS-6682) Add a metric to expose the timestamp of the oldest under-replicated block
Akira AJISAKA created HDFS-6682: --- Summary: Add a metric to expose the timestamp of the oldest under-replicated block Key: HDFS-6682 URL: https://issues.apache.org/jira/browse/HDFS-6682 Project: Hadoop HDFS Issue Type: Improvement Reporter: Akira AJISAKA Assignee: Akira AJISAKA In the following case, the data in HDFS is lost and a client needs to put the same file again. # A client puts a file to HDFS # A DataNode crashes before replicating a block of the file to other DataNodes I propose a metric to expose the timestamp of the oldest under-replicated/corrupt block. That way the client can know which file to retain for the retry. -- This message was sent by Atlassian JIRA (v6.2#6252)
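A hedged sketch of how such a metric could be surfaced through the Hadoop metrics2 annotations; the class, the bookkeeping, and the metric name below are illustrative only, not the eventual HDFS-6682 implementation.
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;

@Metrics(context = "dfs")
public class UnderReplicatedBlockAge {
  // Hypothetical bookkeeping: updated by the replication monitor when a block
  // first becomes under-replicated, cleared (reset to 0) when none remain.
  private volatile long oldestUnderReplicatedTimestamp = 0L;

  void recordUnderReplicated(long firstDetectedMillis) {
    if (oldestUnderReplicatedTimestamp == 0L
        || firstDetectedMillis < oldestUnderReplicatedTimestamp) {
      oldestUnderReplicatedTimestamp = firstDetectedMillis;
    }
  }

  @Metric("Timestamp (ms) of the oldest under-replicated block, 0 if none")
  public long getOldestUnderReplicatedBlockTimestamp() {
    return oldestUnderReplicatedTimestamp;
  }
}
{code}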
[jira] [Commented] (HDFS-6588) Investigating removing getTrueCause method in Server.java
[ https://issues.apache.org/jira/browse/HDFS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061836#comment-14061836 ] Hadoop QA commented on HDFS-6588: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655689/HDFS-6588.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.fs.shell.TestCopyPreserveFlag org.apache.hadoop.fs.TestSymlinkLocalFSFileContext org.apache.hadoop.fs.shell.TestTextCommand org.apache.hadoop.ipc.TestIPC org.apache.hadoop.fs.TestSymlinkLocalFSFileSystem org.apache.hadoop.fs.shell.TestPathData org.apache.hadoop.fs.TestDFVariations org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7345//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7345//console This message is automatically generated. Investigating removing getTrueCause method in Server.java - Key: HDFS-6588 URL: https://issues.apache.org/jira/browse/HDFS-6588 Project: Hadoop HDFS Issue Type: Bug Components: security, webhdfs Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6588.001.patch When addressing Daryn Sharp's comment for HDFS-6475 quoted below: {quote} What I'm saying is I think the patch adds too much unnecessary code. Filing an improvement to delete all but a few lines of the code changed in this patch seems a bit odd. I think you just need to: - Delete getTrueCause entirely instead of moving it elsewhere - In saslProcess, just throw the exception instead of running it through getTrueCause since it's not a InvalidToken wrapping another exception anymore. - Keep your 3-line change to unwrap SecurityException in toResponse {quote} There are multiple test failures, after making the suggested changes, Filing this jira to dedicate to the investigation of removing getTrueCause method. More detail will be put in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6681) TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN is flaky and sometimes gets stuck in infinite loops
[ https://issues.apache.org/jira/browse/HDFS-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061854#comment-14061854 ] Hadoop QA commented on HDFS-6681: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655697/HDFS-6681.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7346//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7346//console This message is automatically generated. TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN is flaky and sometimes gets stuck in infinite loops -- Key: HDFS-6681 URL: https://issues.apache.org/jira/browse/HDFS-6681 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.1 Environment: Java(TM) SE Runtime Environment (build 1.6.0_31-b04) Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode) Linux [hostname] 2.6.32-279.14.1.el6.x86_64 #1 SMP Mon Oct 15 13:44:51 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux Reporter: Ratandeep Ratti Attachments: HDFS-6681.patch This testcase has 3 infinite loops which break only when certain conditions are satisfied. 1st loop checks if there should be a single live replica. It assumes this to be true since it has just corrupted a block on one of the datanodes (the testcase uses a replication factor of 2). One scenario in which this loop will never break is if the Namenode invalidates the corrupt replica, schedules a replication command, and the newly copied replica is added, all before the testcase has a chance to check the live-replica count. 2nd loop checks that there should be 2 live replicas. It assumes this will become true (in some time) since the first loop has broken, implying there is a single replica, and now it is only a matter of time before the Namenode schedules a replication command to copy a replica to another datanode. One scenario in which this loop will never break is when the Namenode tries to schedule a new replica on the same node on which we actually corrupted the block. That dst. datanode will not copy the block, complaining that it already has the (corrupted) replica in the create state. The situation that results is that the Namenode has scheduled a copy to a datanode, the block is now in the Namenode's pending replication queue, and this block will never be removed from the pending replication queue because the Namenode will never receive a report from the datanodes that the block is 'added'.
Note: The block can be transferred from the 'pending replication' queue to the 'needed replication' queue once the pending timeout (5 minutes) expires. The Namenode then actively tries to schedule a replication for blocks in the 'needed replication' queue. This can cause the 2nd loop to break, but the time before this process kicks in is more than 5 minutes. 3rd loop: This loop checks if there are no corrupt replicas. I don't see a scenario in which this loop can go on forever, since once the live replica count goes back to normal (2), the corrupted block will be removed. I guess increasing the heartbeat interval, so that the testcase has enough time to check the condition in loop 1 before a datanode reports a successful copy, should help avoid the race condition in loop 1. Regarding loop 2, I guess we can reduce the timeout after which the block is transferred from the pending replication queue to the needed replication queue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6590) NullPointerException was generated in getBlockLocalPathInfo when datanode restarts
[ https://issues.apache.org/jira/browse/HDFS-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061863#comment-14061863 ] Guo Ruijing commented on HDFS-6590: --- Root-cause: data is not initialized and referenced in data.getBlockLocalPathInfo(block); fix solution 1: existing: in getBlockLocalPathInfo() { BlockLocalPathInfo info = data.getBlockLocalPathInfo(block); } new: in getBlockLocalPathInfo() { BlockLocalPathInfo info = null; if (data != null) { info = data.getBlockLocalPathInfo(block); } } NullPointerException was generated in getBlockLocalPathInfo when datanode restarts -- Key: HDFS-6590 URL: https://issues.apache.org/jira/browse/HDFS-6590 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: Guo Ruijing 2014-06-11 20:34:40.240119, p43949, th140725562181728, ERROR cannot setup block reader for Block: [block pool ID: BP-1901161041-172.28.1.251-1402542341112 block ID 1073741926_1102] on Datanode: sdw3(172.28.1.3). RpcHelper.h: 74: HdfsIOException: Unexpected exception: when unwrap the rpc remote exception java.lang.NullPointerException, java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.DataNode.getBlockLocalPathInfo(DataNode.java:1014) at org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolServerSideTranslatorPB.getBlockLocalPathInfo(ClientDatanodeProtocolServerSideTranslatorPB.java:112) at org.apache.hadoop.hdfs.protocol.proto.ClientDatanodeProtocolProtos$ClientDatanodeProtocolService$2.callBlockingMethod(ClientDatanodeProtocolProtos.java:6373) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042) -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6590) NullPointerException was generated in getBlockLocalPathInfo when datanode restarts
[ https://issues.apache.org/jira/browse/HDFS-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061869#comment-14061869 ] Guo Ruijing commented on HDFS-6590: --- fix solution 2: move initIpcServer after initStorage like: void initBlockPool { initStorage(nsInfo); initPeriodicScanners(conf); initIpcServer(conf);//move initIpcServer after initStorage. in this case, data is initialized before getBlockLocalPathInfo is called in IPC. } NullPointerException was generated in getBlockLocalPathInfo when datanode restarts -- Key: HDFS-6590 URL: https://issues.apache.org/jira/browse/HDFS-6590 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.2.0 Reporter: Guo Ruijing 2014-06-11 20:34:40.240119, p43949, th140725562181728, ERROR cannot setup block reader for Block: [block pool ID: BP-1901161041-172.28.1.251-1402542341112 block ID 1073741926_1102] on Datanode: sdw3(172.28.1.3). RpcHelper.h: 74: HdfsIOException: Unexpected exception: when unwrap the rpc remote exception java.lang.NullPointerException, java.lang.NullPointerException at org.apache.hadoop.hdfs.server.datanode.DataNode.getBlockLocalPathInfo(DataNode.java:1014) at org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolServerSideTranslatorPB.getBlockLocalPathInfo(ClientDatanodeProtocolServerSideTranslatorPB.java:112) at org.apache.hadoop.hdfs.protocol.proto.ClientDatanodeProtocolProtos$ClientDatanodeProtocolService$2.callBlockingMethod(ClientDatanodeProtocolProtos.java:6373) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042) -- This message was sent by Atlassian JIRA (v6.2#6252)
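A compilable sketch of "fix solution 1" from the comments above, with the null check surfacing a descriptive IOException rather than silently returning null; the Dataset type is a simplified stand-in for FsDatasetSpi, and throwing here (instead of returning null) is this sketch's choice, not necessarily the fix HDFS-6590 will take.
{code}
import java.io.IOException;

public class GetBlockLocalPathInfoGuard {
  interface Dataset {                       // stand-in for FsDatasetSpi
    Object getBlockLocalPathInfo(Object block) throws IOException;
  }

  private volatile Dataset data;            // initialized later, as in DataNode storage init

  Object getBlockLocalPathInfo(Object block) throws IOException {
    Dataset d = data;
    if (d == null) {
      // Startup ordering issue described above: the IPC server is up before storage.
      throw new IOException("Datanode storage is not yet initialized; retry the request");
    }
    return d.getBlockLocalPathInfo(block);
  }
}
{code}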
[jira] [Commented] (HDFS-6667) In HDFS HA mode, Distcp/SLive with webhdfs on secure cluster fails with Client cannot authenticate via:[TOKEN, KERBEROS] error
[ https://issues.apache.org/jira/browse/HDFS-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061882#comment-14061882 ] Hadoop QA commented on HDFS-6667: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655621/HDFS-6667.000.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7347//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7347//console This message is automatically generated. In HDFS HA mode, Distcp/SLive with webhdfs on secure cluster fails with Client cannot authenticate via:[TOKEN, KERBEROS] error -- Key: HDFS-6667 URL: https://issues.apache.org/jira/browse/HDFS-6667 Project: Hadoop HDFS Issue Type: Bug Components: security Reporter: Jian He Assignee: Jing Zhao Attachments: HDFS-6667.000.patch Opening on [~arpitgupta]'s behalf. We observed that, in HDFS HA mode, running Distcp/SLive with webhdfs will fail on YARN. In non-HA mode, it'll pass. The reason is in HA mode, only webhdfs delegation token is generated for the job, but YARN also requires the regular hdfs token to do localization, log-aggregation etc. In non-HA mode, both tokens are generated for the job. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6114) Block Scan log rolling will never happen if blocks written continuously leading to huge size of dncp_block_verification.log.curr
[ https://issues.apache.org/jira/browse/HDFS-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061895#comment-14061895 ] Vinayakumar B commented on HDFS-6114: - bq. I don't really see a good reason to separate delBlockInfo and delNewBlockInfo. It seems like this could just lead to scenarios where we think we're deleting a block but it pops back up (because we deleted, but did not delete new) Here, both are working on different sets. {{delBlockInfo}} is used in some other places as well, while updating the scan time and re-sorting the blockInfoSet. {{delNewBlockInfo}} only needs to be called while deleting the block itself, as intermediate updates will not happen on this set's data. So {{delBlockInfo}} and {{delNewBlockInfo}} serve separate purposes and both are required. bq. I guess maybe it makes sense to separate addBlockInfo from addNewBlockInfo, just because there are places in the setup code where we're willing to add stuff directly to blockInfoSet. Even in that case, I would argue it might be easier to call addNewBlockInfo and then later roll all the newBlockInfoSet items into blockInfoSet. The problem is that having both functions creates confusion and increase the chance that someone will add an incorrect call to the wrong one later on in another change. As I see it, both these methods are private and act on different sets, and the method name itself suggests {{addNewBlockInfo}} is only for the new blocks. I am not seeing any confusion here. bq. It seems like a bad idea to use BlockScanInfo.LAST_SCAN_TIME_COMPARATOR for blockInfoSet, but BlockScanInfo#hashCode (i.e. the HashSet strategy) for newBlockInfoSet. Let's just use a SortedSet for both so we don't have to ponder any possible discrepancies between the comparator and the hash function. {{blockInfoSet}} is required to be sorted based on the lastScanTime, as the oldest-scanned block will be picked for scanning and will always be the first element in this set. BlockScanInfo.LAST_SCAN_TIME_COMPARATOR is used because {{BlockScanInfo#hashCode()}} is the default, which will sort based on the blockId rather than the scan time. Do you suggest I update this {{hashCode()}} itself? bq. Another problem with HashSet (compared with TreeSet) is that it never shrinks down after enlarging... a bad property for a temporary holding area Yes, I agree with this; will update in the next patch. Block Scan log rolling will never happen if blocks written continuously leading to huge size of dncp_block_verification.log.curr Key: HDFS-6114 URL: https://issues.apache.org/jira/browse/HDFS-6114 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.3.0, 2.4.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Priority: Critical Attachments: HDFS-6114.patch, HDFS-6114.patch 1. {{BlockPoolSliceScanner#scan()}} will not return until all the blocks are scanned. 2. If blocks (with sizes of several MBs) are written continuously to the datanode, then one iteration of {{BlockPoolSliceScanner#scan()}} will be continuously scanning the blocks. 3. These blocks will be deleted after some time (enough for the blocks to get scanned). 4. As block scanning is throttled, verification of all blocks will take a long time. 5. Rolling will never happen, so even though the total number of blocks in the datanode doesn't increase, entries (which contain stale entries of deleted blocks) in *dncp_block_verification.log.curr* continuously increase, leading to a huge size.
In one of our environments, it grew to more than 1TB while the total number of blocks was only ~45k. -- This message was sent by Atlassian JIRA (v6.2#6252)
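The two-set scheme debated above can be sketched on its own: a set sorted by last scan time that drives scan ordering, plus a holding set for newly added blocks that is rolled in periodically (e.g. when the verification log rolls). BlockScanInfo below is a simplified stand-in for the real class, and both sets are TreeSets per the reviewer's suggestion.
{code}
import java.util.Comparator;
import java.util.TreeSet;

public class TwoSetSketch {
  static class BlockScanInfo {
    final long blockId;
    long lastScanTime;
    BlockScanInfo(long blockId, long lastScanTime) {
      this.blockId = blockId;
      this.lastScanTime = lastScanTime;
    }
  }

  // Oldest-scanned block sorts first; ties broken by blockId so distinct blocks never collide.
  static final Comparator<BlockScanInfo> LAST_SCAN_TIME_COMPARATOR =
      new Comparator<BlockScanInfo>() {
        @Override
        public int compare(BlockScanInfo a, BlockScanInfo b) {
          int c = Long.compare(a.lastScanTime, b.lastScanTime);
          return c != 0 ? c : Long.compare(a.blockId, b.blockId);
        }
      };

  private final TreeSet<BlockScanInfo> blockInfoSet =
      new TreeSet<BlockScanInfo>(LAST_SCAN_TIME_COMPARATOR);
  private final TreeSet<BlockScanInfo> newBlockInfoSet =
      new TreeSet<BlockScanInfo>(LAST_SCAN_TIME_COMPARATOR);

  synchronized void addNewBlockInfo(BlockScanInfo info) {
    newBlockInfoSet.add(info);          // not scanned yet; kept out of the main scan order
  }

  /** Called while rolling the verification log: merge new blocks into the scan order. */
  synchronized void rollNewBlocksIn() {
    blockInfoSet.addAll(newBlockInfoSet);
    newBlockInfoSet.clear();            // TreeSet shrinks with its contents, unlike HashSet
  }

  synchronized BlockScanInfo nextBlockToScan() {
    return blockInfoSet.isEmpty() ? null : blockInfoSet.first();
  }
}
{code}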
[jira] [Commented] (HDFS-6678) MiniDFSCluster may still be partially running after initialization fails.
[ https://issues.apache.org/jira/browse/HDFS-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061929#comment-14061929 ] Hudson commented on HDFS-6678: -- FAILURE: Integrated in Hadoop-Yarn-trunk #613 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/613/]) HDFS-6678. MiniDFSCluster may still be partially running after initialization fails. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1610549) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java MiniDFSCluster may still be partially running after initialization fails. - Key: HDFS-6678 URL: https://issues.apache.org/jira/browse/HDFS-6678 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0, 2.5.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6678.1.patch {{MiniDFSCluster}} initializes the daemons (NameNodes, DataNodes) as part of object construction. If initialization fails, then the constructor throws an exception. When this happens, it's possible that daemons are left running in the background. There is effectively no way to clean up after this state, because the constructor failed, and therefore the caller has no way to trigger a shutdown. -- This message was sent by Atlassian JIRA (v6.2#6252)
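The pattern behind the fix is worth spelling out, since the constructor is the only place cleanup can happen once initialization fails. A hedged sketch with illustrative daemon types, not MiniDFSCluster itself:
{code}
public class FailSafeCluster {
  private boolean namenodeUp;
  private boolean datanodesUp;

  public FailSafeCluster(int numDataNodes) {
    boolean success = false;
    try {
      startNameNode();
      startDataNodes(numDataNodes);
      success = true;
    } finally {
      if (!success) {
        shutdown();       // tear down whatever did come up before the exception propagates
      }
    }
  }

  private void startNameNode() { namenodeUp = true; }

  private void startDataNodes(int n) {
    if (n < 0) {
      throw new IllegalArgumentException("negative datanode count"); // simulated failure
    }
    datanodesUp = true;
  }

  public void shutdown() {
    datanodesUp = false;
    namenodeUp = false;
  }
}
{code}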
[jira] [Commented] (HDFS-6378) NFS registration should timeout instead of hanging when portmap/rpcbind is not available
[ https://issues.apache.org/jira/browse/HDFS-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061928#comment-14061928 ] Hudson commented on HDFS-6378: -- FAILURE: Integrated in Hadoop-Yarn-trunk #613 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/613/]) HDFS-6378. NFS registration should timeout instead of hanging when portmap/rpcbind is not available. Contributed by Abhiraj Butala (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1610543) * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/mount/MountdBase.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3Base.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/RpcProgram.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/SimpleUdpClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt NFS registration should timeout instead of hanging when portmap/rpcbind is not available Key: HDFS-6378 URL: https://issues.apache.org/jira/browse/HDFS-6378 Project: Hadoop HDFS Issue Type: Bug Components: nfs Reporter: Brandon Li Assignee: Abhiraj Butala Fix For: 2.5.0 Attachments: HDFS-6378.002.patch, HDFS-6378.003.patch, HDFS-6378.patch When portmap/rpcbind is not available, NFS could be stuck at registration. Instead, NFS gateway should shut down automatically with proper error message. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-2856) Fix block protocol so that Datanodes don't require root or jsvc
[ https://issues.apache.org/jira/browse/HDFS-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061926#comment-14061926 ] Hudson commented on HDFS-2856: -- FAILURE: Integrated in Hadoop-Yarn-trunk #613 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/613/]) HDFS-2856. Fix block protocol so that Datanodes don't require root or jsvc. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1610474) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemotePeerFactory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/EncryptedPeer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/TcpPeerServer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferEncryptor.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/DataEncryptionKeyFactory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/DataTransferSaslUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/InvalidMagicNumberException.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferServer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslParticipant.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferTestCase.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/TestSaslDataTransfer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancerWithSaslDataTransfer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockTokenWithDFS.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java Fix block protocol so that Datanodes don't require root or jsvc --- Key: HDFS-2856 URL: https://issues.apache.org/jira/browse/HDFS-2856 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, security Affects Versions: 3.0.0, 2.4.0 Reporter: Owen O'Malley Assignee: Chris Nauroth
[jira] [Commented] (HDFS-6671) Archival Storage: Consider block storage policy in replication
[ https://issues.apache.org/jira/browse/HDFS-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14061933#comment-14061933 ] Vinayakumar B commented on HDFS-6671: - Thanks [~szetszwo], I have found the following things: 1. {code} +return new Iterator<StorageType>() { + final Iterator<DatanodeStorageInfo> i = chosen.iterator(); + @Override + public boolean hasNext() {return i.hasNext();} + @Override + public StorageType next() {return i.next().getStorageType();} +};{code} Here, one more method, remove(), needs to be implemented to fix the compilation errors. 2. typo in TestBlockStoragePolicy.DEFAULT_STORAGE_POICY 3. As of now, BlockPlacementPolicyDefault#getStorageType() will result in an NPE, since the storagePolicyId is set to 0 in INodeFile, and this will return a null storagePolicy. Would it be better if the default policy is returned when the policy for the id is null? Archival Storage: Consider block storage policy in replication -- Key: HDFS-6671 URL: https://issues.apache.org/jira/browse/HDFS-6671 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h6671_20140714.patch In order to satisfy the storage policy requirement, the replication monitor additionally reads storage policy information from the INodeFile when performing replication. As before, it only adds replicas if a block is under-replicated, and deletes replicas if a block is over-replicated. It will NOT move replicas around to satisfy the storage policy requirement. -- This message was sent by Atlassian JIRA (v6.2#6252)
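Point 1 above (the missing remove()) can be shown with a small standalone version of the quoted adapter; the types are simplified stand-ins for DatanodeStorageInfo/StorageType, and delegating remove() to the wrapped iterator is just one option (throwing UnsupportedOperationException would also compile).
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class StorageTypeIteratorSketch {
  enum StorageType { DISK, ARCHIVE }

  static class DatanodeStorageInfo {
    final StorageType type;
    DatanodeStorageInfo(StorageType type) { this.type = type; }
    StorageType getStorageType() { return type; }
  }

  static Iterator<StorageType> asStorageTypeIterator(final List<DatanodeStorageInfo> chosen) {
    return new Iterator<StorageType>() {
      final Iterator<DatanodeStorageInfo> i = chosen.iterator();
      @Override public boolean hasNext() { return i.hasNext(); }
      @Override public StorageType next() { return i.next().getStorageType(); }
      @Override public void remove() { i.remove(); }   // the method missing in the quoted patch
    };
  }

  public static void main(String[] args) {
    List<DatanodeStorageInfo> chosen = new ArrayList<DatanodeStorageInfo>(Arrays.asList(
        new DatanodeStorageInfo(StorageType.DISK),
        new DatanodeStorageInfo(StorageType.ARCHIVE)));
    for (Iterator<StorageType> it = asStorageTypeIterator(chosen); it.hasNext();) {
      System.out.println(it.next());
    }
  }
}
{code}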
[jira] [Commented] (HDFS-2856) Fix block protocol so that Datanodes don't require root or jsvc
[ https://issues.apache.org/jira/browse/HDFS-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062069#comment-14062069 ] Hudson commented on HDFS-2856: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1805 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1805/]) HDFS-2856. Fix block protocol so that Datanodes don't require root or jsvc. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1610474) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemotePeerFactory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/EncryptedPeer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/TcpPeerServer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferEncryptor.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/DataEncryptionKeyFactory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/DataTransferSaslUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/InvalidMagicNumberException.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferServer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslParticipant.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferTestCase.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/TestSaslDataTransfer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancerWithSaslDataTransfer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockTokenWithDFS.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java Fix block protocol so that Datanodes don't require root or jsvc --- Key: HDFS-2856 URL: https://issues.apache.org/jira/browse/HDFS-2856 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, security Affects Versions: 3.0.0, 2.4.0 Reporter: Owen O'Malley Assignee: Chris Nauroth
[jira] [Commented] (HDFS-6678) MiniDFSCluster may still be partially running after initialization fails.
[ https://issues.apache.org/jira/browse/HDFS-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062072#comment-14062072 ] Hudson commented on HDFS-6678: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1805 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1805/]) HDFS-6678. MiniDFSCluster may still be partially running after initialization fails. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1610549) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java MiniDFSCluster may still be partially running after initialization fails. - Key: HDFS-6678 URL: https://issues.apache.org/jira/browse/HDFS-6678 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0, 2.5.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6678.1.patch {{MiniDFSCluster}} initializes the daemons (NameNodes, DataNodes) as part of object construction. If initialization fails, then the constructor throws an exception. When this happens, it's possible that daemons are left running in the background. There is effectively no way to clean up after this state, because the constructor failed, and therefore the caller has no way to trigger a shutdown. -- This message was sent by Atlassian JIRA (v6.2#6252)
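To illustrate the failure mode described above, here is a minimal sketch of the kind of guard a test cluster needs: start daemons in order and, if a later one fails to come up, stop the ones already running before rethrowing. The Daemon interface and the method names are made up for the sketch; this is not the actual MiniDFSCluster code.
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClusterBootSketch {
  interface Daemon {
    void start() throws Exception;
    void stop();
  }

  // Start daemons in order; if any start() fails, stop the ones already
  // running before rethrowing, so nothing is left behind in the background.
  static void startAll(List<Daemon> daemons) throws Exception {
    List<Daemon> started = new ArrayList<Daemon>();
    try {
      for (Daemon d : daemons) {
        d.start();
        started.add(d);
      }
    } catch (Exception e) {
      for (Daemon d : started) {
        try {
          d.stop();
        } catch (RuntimeException ignored) {
          // best-effort cleanup
        }
      }
      throw e;
    }
  }

  public static void main(String[] args) {
    Daemon ok = new Daemon() {
      public void start() { System.out.println("started ok"); }
      public void stop() { System.out.println("stopped ok"); }
    };
    Daemon bad = new Daemon() {
      public void start() throws Exception { throw new Exception("failed to start"); }
      public void stop() { }
    };
    try {
      startAll(Arrays.asList(ok, bad));
    } catch (Exception e) {
      System.out.println("init failed, nothing left running: " + e.getMessage());
    }
  }
}
{code}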
[jira] [Commented] (HDFS-6378) NFS registration should timeout instead of hanging when portmap/rpcbind is not available
[ https://issues.apache.org/jira/browse/HDFS-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062071#comment-14062071 ] Hudson commented on HDFS-6378: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #1805 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/1805/]) HDFS-6378. NFS registration should timeout instead of hanging when portmap/rpcbind is not available. Contributed by Abhiraj Butala (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1610543) * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/mount/MountdBase.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3Base.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/RpcProgram.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/SimpleUdpClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt NFS registration should timeout instead of hanging when portmap/rpcbind is not available Key: HDFS-6378 URL: https://issues.apache.org/jira/browse/HDFS-6378 Project: Hadoop HDFS Issue Type: Bug Components: nfs Reporter: Brandon Li Assignee: Abhiraj Butala Fix For: 2.5.0 Attachments: HDFS-6378.002.patch, HDFS-6378.003.patch, HDFS-6378.patch When portmap/rpcbind is not available, NFS could be stuck at registration. Instead, NFS gateway should shut down automatically with proper error message. -- This message was sent by Atlassian JIRA (v6.2#6252)
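On the hang itself, the usual JDK-level remedy is a receive timeout on the UDP exchange with portmap, so the gateway can fail fast and shut down instead of blocking forever. A minimal sketch follows; the address, payload, and timeout value are placeholders, and this is not the actual SimpleUdpClient change.
{code}
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;

public class UdpTimeoutSketch {
  public static void main(String[] args) throws Exception {
    byte[] request = new byte[]{0};          // placeholder payload
    DatagramSocket socket = new DatagramSocket();
    try {
      socket.setSoTimeout(500);              // fail fast instead of blocking forever
      InetAddress portmap = InetAddress.getByName("127.0.0.1");
      socket.send(new DatagramPacket(request, request.length, portmap, 111));
      byte[] buf = new byte[1024];
      DatagramPacket reply = new DatagramPacket(buf, buf.length);
      socket.receive(reply);                 // throws SocketTimeoutException when nothing answers
      System.out.println("got " + reply.getLength() + " bytes");
    } catch (SocketTimeoutException e) {
      System.err.println("portmap did not respond; shutting down instead of hanging");
    } finally {
      socket.close();
    }
  }
}
{code}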
[jira] [Commented] (HDFS-6619) Clean up encryption-related tests
[ https://issues.apache.org/jira/browse/HDFS-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062078#comment-14062078 ] Charles Lamb commented on HDFS-6619: Piling on... +1 Clean up encryption-related tests - Key: HDFS-6619 URL: https://issues.apache.org/jira/browse/HDFS-6619 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Andrew Wang Assignee: Andrew Wang Priority: Minor Attachments: hdfs-6619.001.patch Would be good to clean up TestHDFSEncryption and TestEncryptionZonesAPI. These tests could be renamed, test timeouts added/adjusted, reduced number of minicluster start/stops, whitespace, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6422) getfattr in CLI doesn't throw exception or return non-0 return code when xattr doesn't exist
[ https://issues.apache.org/jira/browse/HDFS-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-6422: --- Status: In Progress (was: Patch Available) getfattr in CLI doesn't throw exception or return non-0 return code when xattr doesn't exist Key: HDFS-6422 URL: https://issues.apache.org/jira/browse/HDFS-6422 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Charles Lamb Assignee: Charles Lamb Attachments: HDFS-6422.1.patch, HDFS-6422.2.patch, HDFS-6422.3.patch If you do hdfs dfs -getfattr -n user.blah /foo and user.blah doesn't exist, the command prints # file: /foo and a 0 return code. It should print an exception and return a non-0 return code instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed
[ https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danilo Vunjak updated HDFS-6597: Attachment: JIRA-HDFS-6597.02.patch Add a new option to NN upgrade to terminate the process after upgrade on NN is completed Key: HDFS-6597 URL: https://issues.apache.org/jira/browse/HDFS-6597 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Danilo Vunjak Attachments: JIRA-HDFS-30.patch, JIRA-HDFS-6597.02.patch, JIRA-HDFS-6597.patch Currently, when the namenode is started for upgrade (hadoop namenode -upgrade command), after finishing the upgrade of metadata, the namenode starts working normally and waits for datanodes to upgrade themselves and connect to the NN. We need an option for upgrading only the NN metadata, so that after the upgrade is finished on the NN, the process terminates. I have tested it by changing file hdfs.server.namenode.NameNode.java, method public static NameNode createNameNode(String argv[], Configuration conf), adding the following case to the switch: case UPGRADE: { DefaultMetricsSystem.initialize("NameNode"); NameNode nameNode = new NameNode(conf); if (startOpt.getForceUpgrade()) { terminate(0); return null; } return nameNode; } This upgraded the metadata and closed the process after it finished; later, when all services were started, the upgrade of datanodes finished successfully and the system ran. What I'm suggesting is to add a new startup parameter -force, so the namenode can be started like this: hadoop namenode -upgrade -force, to indicate that we want to terminate the process after the upgrade of metadata on the NN is finished. The old functionality should be preserved, so users can run hadoop namenode -upgrade in the same way and with the same behaviour as before. Thanks, Danilo -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed
[ https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danilo Vunjak updated HDFS-6597: Status: Patch Available (was: Open) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed Key: HDFS-6597 URL: https://issues.apache.org/jira/browse/HDFS-6597 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Danilo Vunjak Attachments: JIRA-HDFS-30.patch, JIRA-HDFS-6597.02.patch, JIRA-HDFS-6597.patch Currently, when the namenode is started for upgrade (hadoop namenode -upgrade command), after finishing the upgrade of metadata, the namenode starts working normally and waits for datanodes to upgrade themselves and connect to the NN. We need an option for upgrading only the NN metadata, so that after the upgrade is finished on the NN, the process terminates. I have tested it by changing file hdfs.server.namenode.NameNode.java, method public static NameNode createNameNode(String argv[], Configuration conf), adding the following case to the switch: case UPGRADE: { DefaultMetricsSystem.initialize("NameNode"); NameNode nameNode = new NameNode(conf); if (startOpt.getForceUpgrade()) { terminate(0); return null; } return nameNode; } This upgraded the metadata and closed the process after it finished; later, when all services were started, the upgrade of datanodes finished successfully and the system ran. What I'm suggesting is to add a new startup parameter -force, so the namenode can be started like this: hadoop namenode -upgrade -force, to indicate that we want to terminate the process after the upgrade of metadata on the NN is finished. The old functionality should be preserved, so users can run hadoop namenode -upgrade in the same way and with the same behaviour as before. Thanks, Danilo -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed
[ https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062145#comment-14062145 ] Hadoop QA commented on HDFS-6597: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655777/JIRA-HDFS-6597.02.patch against trunk revision . {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7348//console This message is automatically generated. Add a new option to NN upgrade to terminate the process after upgrade on NN is completed Key: HDFS-6597 URL: https://issues.apache.org/jira/browse/HDFS-6597 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Danilo Vunjak Attachments: JIRA-HDFS-30.patch, JIRA-HDFS-6597.02.patch, JIRA-HDFS-6597.patch Currently, when the namenode is started for upgrade (hadoop namenode -upgrade command), after finishing the upgrade of metadata, the namenode starts working normally and waits for datanodes to upgrade themselves and connect to the NN. We need an option for upgrading only the NN metadata, so that after the upgrade is finished on the NN, the process terminates. I have tested it by changing file hdfs.server.namenode.NameNode.java, method public static NameNode createNameNode(String argv[], Configuration conf), adding the following case to the switch: case UPGRADE: { DefaultMetricsSystem.initialize("NameNode"); NameNode nameNode = new NameNode(conf); if (startOpt.getForceUpgrade()) { terminate(0); return null; } return nameNode; } This upgraded the metadata and closed the process after it finished; later, when all services were started, the upgrade of datanodes finished successfully and the system ran. What I'm suggesting is to add a new startup parameter -force, so the namenode can be started like this: hadoop namenode -upgrade -force, to indicate that we want to terminate the process after the upgrade of metadata on the NN is finished. The old functionality should be preserved, so users can run hadoop namenode -upgrade in the same way and with the same behaviour as before. Thanks, Danilo -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6378) NFS registration should timeout instead of hanging when portmap/rpcbind is not available
[ https://issues.apache.org/jira/browse/HDFS-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062169#comment-14062169 ] Hudson commented on HDFS-6378: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1832 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1832/]) HDFS-6378. NFS registration should timeout instead of hanging when portmap/rpcbind is not available. Contributed by Abhiraj Butala (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1610543) * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/mount/MountdBase.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3Base.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/RpcProgram.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/SimpleUdpClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt NFS registration should timeout instead of hanging when portmap/rpcbind is not available Key: HDFS-6378 URL: https://issues.apache.org/jira/browse/HDFS-6378 Project: Hadoop HDFS Issue Type: Bug Components: nfs Reporter: Brandon Li Assignee: Abhiraj Butala Fix For: 2.5.0 Attachments: HDFS-6378.002.patch, HDFS-6378.003.patch, HDFS-6378.patch When portmap/rpcbind is not available, NFS could be stuck at registration. Instead, NFS gateway should shut down automatically with proper error message. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6678) MiniDFSCluster may still be partially running after initialization fails.
[ https://issues.apache.org/jira/browse/HDFS-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062170#comment-14062170 ] Hudson commented on HDFS-6678: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1832 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1832/]) HDFS-6678. MiniDFSCluster may still be partially running after initialization fails. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1610549) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java MiniDFSCluster may still be partially running after initialization fails. - Key: HDFS-6678 URL: https://issues.apache.org/jira/browse/HDFS-6678 Project: Hadoop HDFS Issue Type: Bug Components: test Affects Versions: 3.0.0, 2.5.0 Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor Fix For: 3.0.0, 2.6.0 Attachments: HDFS-6678.1.patch {{MiniDFSCluster}} initializes the daemons (NameNodes, DataNodes) as part of object construction. If initialization fails, then the constructor throws an exception. When this happens, it's possible that daemons are left running in the background. There is effectively no way to clean up after this state, because the constructor failed, and therefore the caller has no way to trigger a shutdown. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-2856) Fix block protocol so that Datanodes don't require root or jsvc
[ https://issues.apache.org/jira/browse/HDFS-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062167#comment-14062167 ] Hudson commented on HDFS-2856: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1832 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1832/]) HDFS-2856. Fix block protocol so that Datanodes don't require root or jsvc. Contributed by Chris Nauroth. (cnauroth: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1610474) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemotePeerFactory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/EncryptedPeer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/TcpPeerServer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferEncryptor.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/DataEncryptionKeyFactory.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/DataTransferSaslUtil.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/InvalidMagicNumberException.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferClient.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferServer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslParticipant.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java * 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferTestCase.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/TestSaslDataTransfer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancerWithSaslDataTransfer.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockTokenWithDFS.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java Fix block protocol so that Datanodes don't require root or jsvc --- Key: HDFS-2856 URL: https://issues.apache.org/jira/browse/HDFS-2856 Project: Hadoop HDFS Issue Type: Improvement Components: datanode, security Affects Versions: 3.0.0, 2.4.0 Reporter: Owen O'Malley Assignee: Chris Nauroth
[jira] [Updated] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed
[ https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danilo Vunjak updated HDFS-6597: Status: Open (was: Patch Available) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed Key: HDFS-6597 URL: https://issues.apache.org/jira/browse/HDFS-6597 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Danilo Vunjak Attachments: JIRA-HDFS-30.patch, JIRA-HDFS-6597.02.patch, JIRA-HDFS-6597.patch Currently, when the namenode is started for upgrade (hadoop namenode -upgrade command), after finishing the upgrade of metadata, the namenode starts working normally and waits for datanodes to upgrade themselves and connect to the NN. We need an option for upgrading only the NN metadata, so that after the upgrade is finished on the NN, the process terminates. I have tested it by changing file hdfs.server.namenode.NameNode.java, method public static NameNode createNameNode(String argv[], Configuration conf), adding the following case to the switch: case UPGRADE: { DefaultMetricsSystem.initialize("NameNode"); NameNode nameNode = new NameNode(conf); if (startOpt.getForceUpgrade()) { terminate(0); return null; } return nameNode; } This upgraded the metadata and closed the process after it finished; later, when all services were started, the upgrade of datanodes finished successfully and the system ran. What I'm suggesting is to add a new startup parameter -force, so the namenode can be started like this: hadoop namenode -upgrade -force, to indicate that we want to terminate the process after the upgrade of metadata on the NN is finished. The old functionality should be preserved, so users can run hadoop namenode -upgrade in the same way and with the same behaviour as before. Thanks, Danilo -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6671) Archival Storage: Consider block storage policy in replication
[ https://issues.apache.org/jira/browse/HDFS-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6671: -- Attachment: h6671_20140715.patch Thanks Vinay for reviewing the patch. # Good catch. Fixed. My IDE somehow does not show this error. # Fixed. # You are right that it should return the default since storagePolicyId == 0 means the policy is not specified. Fixed. Here is a new patch: h6671_20140715.patch Archival Storage: Consider block storage policy in replication -- Key: HDFS-6671 URL: https://issues.apache.org/jira/browse/HDFS-6671 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h6671_20140714.patch, h6671_20140715.patch In order to satisfy the storage policy requirement, the replication monitor additionally reads storage policy information from the INodeFile when performing replication. As before, it only adds replicas if a block is under-replicated, and deletes replicas if a block is over-replicated. It will NOT move replicas around to satisfy the storage policy requirement. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed
[ https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danilo Vunjak updated HDFS-6597: Attachment: JIRA-HDFS-6597.03.patch Add a new option to NN upgrade to terminate the process after upgrade on NN is completed Key: HDFS-6597 URL: https://issues.apache.org/jira/browse/HDFS-6597 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Danilo Vunjak Attachments: JIRA-HDFS-30.patch, JIRA-HDFS-6597.02.patch, JIRA-HDFS-6597.03.patch, JIRA-HDFS-6597.patch Currently, when the namenode is started for upgrade (hadoop namenode -upgrade command), after finishing the upgrade of metadata, the namenode starts working normally and waits for datanodes to upgrade themselves and connect to the NN. We need an option for upgrading only the NN metadata, so that after the upgrade is finished on the NN, the process terminates. I have tested it by changing file hdfs.server.namenode.NameNode.java, method public static NameNode createNameNode(String argv[], Configuration conf), adding the following case to the switch: case UPGRADE: { DefaultMetricsSystem.initialize("NameNode"); NameNode nameNode = new NameNode(conf); if (startOpt.getForceUpgrade()) { terminate(0); return null; } return nameNode; } This upgraded the metadata and closed the process after it finished; later, when all services were started, the upgrade of datanodes finished successfully and the system ran. What I'm suggesting is to add a new startup parameter -force, so the namenode can be started like this: hadoop namenode -upgrade -force, to indicate that we want to terminate the process after the upgrade of metadata on the NN is finished. The old functionality should be preserved, so users can run hadoop namenode -upgrade in the same way and with the same behaviour as before. Thanks, Danilo -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed
[ https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danilo Vunjak updated HDFS-6597: Status: Patch Available (was: Open) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed Key: HDFS-6597 URL: https://issues.apache.org/jira/browse/HDFS-6597 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Danilo Vunjak Attachments: JIRA-HDFS-30.patch, JIRA-HDFS-6597.02.patch, JIRA-HDFS-6597.03.patch, JIRA-HDFS-6597.patch Currently, when the namenode is started for upgrade (hadoop namenode -upgrade command), after finishing the upgrade of metadata, the namenode starts working normally and waits for datanodes to upgrade themselves and connect to the NN. We need an option for upgrading only the NN metadata, so that after the upgrade is finished on the NN, the process terminates. I have tested it by changing file hdfs.server.namenode.NameNode.java, method public static NameNode createNameNode(String argv[], Configuration conf), adding the following case to the switch: case UPGRADE: { DefaultMetricsSystem.initialize("NameNode"); NameNode nameNode = new NameNode(conf); if (startOpt.getForceUpgrade()) { terminate(0); return null; } return nameNode; } This upgraded the metadata and closed the process after it finished; later, when all services were started, the upgrade of datanodes finished successfully and the system ran. What I'm suggesting is to add a new startup parameter -force, so the namenode can be started like this: hadoop namenode -upgrade -force, to indicate that we want to terminate the process after the upgrade of metadata on the NN is finished. The old functionality should be preserved, so users can run hadoop namenode -upgrade in the same way and with the same behaviour as before. Thanks, Danilo -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed
[ https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danilo Vunjak updated HDFS-6597: Status: Patch Available (was: Open) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed Key: HDFS-6597 URL: https://issues.apache.org/jira/browse/HDFS-6597 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Danilo Vunjak Attachments: HDFS-6597.04.patch, JIRA-HDFS-30.patch, JIRA-HDFS-6597.02.patch, JIRA-HDFS-6597.03.patch, JIRA-HDFS-6597.patch Currently, when the namenode is started for upgrade (hadoop namenode -upgrade command), after finishing the upgrade of metadata, the namenode starts working normally and waits for datanodes to upgrade themselves and connect to the NN. We need an option for upgrading only the NN metadata, so that after the upgrade is finished on the NN, the process terminates. I have tested it by changing file hdfs.server.namenode.NameNode.java, method public static NameNode createNameNode(String argv[], Configuration conf), adding the following case to the switch: case UPGRADE: { DefaultMetricsSystem.initialize("NameNode"); NameNode nameNode = new NameNode(conf); if (startOpt.getForceUpgrade()) { terminate(0); return null; } return nameNode; } This upgraded the metadata and closed the process after it finished; later, when all services were started, the upgrade of datanodes finished successfully and the system ran. What I'm suggesting is to add a new startup parameter -force, so the namenode can be started like this: hadoop namenode -upgrade -force, to indicate that we want to terminate the process after the upgrade of metadata on the NN is finished. The old functionality should be preserved, so users can run hadoop namenode -upgrade in the same way and with the same behaviour as before. Thanks, Danilo -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed
[ https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danilo Vunjak updated HDFS-6597: Attachment: HDFS-6597.04.patch Add a new option to NN upgrade to terminate the process after upgrade on NN is completed Key: HDFS-6597 URL: https://issues.apache.org/jira/browse/HDFS-6597 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Danilo Vunjak Attachments: HDFS-6597.04.patch, JIRA-HDFS-30.patch, JIRA-HDFS-6597.02.patch, JIRA-HDFS-6597.03.patch, JIRA-HDFS-6597.patch Currently, when the namenode is started for upgrade (hadoop namenode -upgrade command), after finishing the upgrade of metadata, the namenode starts working normally and waits for datanodes to upgrade themselves and connect to the NN. We need an option for upgrading only the NN metadata, so that after the upgrade is finished on the NN, the process terminates. I have tested it by changing file hdfs.server.namenode.NameNode.java, method public static NameNode createNameNode(String argv[], Configuration conf), adding the following case to the switch: case UPGRADE: { DefaultMetricsSystem.initialize("NameNode"); NameNode nameNode = new NameNode(conf); if (startOpt.getForceUpgrade()) { terminate(0); return null; } return nameNode; } This upgraded the metadata and closed the process after it finished; later, when all services were started, the upgrade of datanodes finished successfully and the system ran. What I'm suggesting is to add a new startup parameter -force, so the namenode can be started like this: hadoop namenode -upgrade -force, to indicate that we want to terminate the process after the upgrade of metadata on the NN is finished. The old functionality should be preserved, so users can run hadoop namenode -upgrade in the same way and with the same behaviour as before. Thanks, Danilo -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed
[ https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Danilo Vunjak updated HDFS-6597: Status: Open (was: Patch Available) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed Key: HDFS-6597 URL: https://issues.apache.org/jira/browse/HDFS-6597 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Danilo Vunjak Attachments: HDFS-6597.04.patch, JIRA-HDFS-30.patch, JIRA-HDFS-6597.02.patch, JIRA-HDFS-6597.03.patch, JIRA-HDFS-6597.patch Currently, when the namenode is started for upgrade (hadoop namenode -upgrade command), after finishing the upgrade of metadata, the namenode starts working normally and waits for datanodes to upgrade themselves and connect to the NN. We need an option for upgrading only the NN metadata, so that after the upgrade is finished on the NN, the process terminates. I have tested it by changing file hdfs.server.namenode.NameNode.java, method public static NameNode createNameNode(String argv[], Configuration conf), adding the following case to the switch: case UPGRADE: { DefaultMetricsSystem.initialize("NameNode"); NameNode nameNode = new NameNode(conf); if (startOpt.getForceUpgrade()) { terminate(0); return null; } return nameNode; } This upgraded the metadata and closed the process after it finished; later, when all services were started, the upgrade of datanodes finished successfully and the system ran. What I'm suggesting is to add a new startup parameter -force, so the namenode can be started like this: hadoop namenode -upgrade -force, to indicate that we want to terminate the process after the upgrade of metadata on the NN is finished. The old functionality should be preserved, so users can run hadoop namenode -upgrade in the same way and with the same behaviour as before. Thanks, Danilo -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6490) Fix the keyid format for generated keys in FSNamesystem.createEncryptionZone
[ https://issues.apache.org/jira/browse/HDFS-6490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062224#comment-14062224 ] Uma Maheswara Rao G commented on HDFS-6490: --- Hi [~clamb], I have reviewed the patch. Please find the comments below.
The patch needs an update with the latest code; I think we are now passing the keyid from outside to createNewKey.
In the case where nameserviceID is null, can we assume a non-federated cluster and use DFS_NAMENODE_RPC_ADDRESS_KEY?
It seems that when the path ends with '/', you want to append the last char, which is again '/'. So can we use '/' directly instead of substring?
sb.append(src.endsWith("/") ? "/" : src); -- sb.append(src.endsWith("/") ? '/' : src);
sb.append("/"); -- sb.append('/');
Fix the keyid format for generated keys in FSNamesystem.createEncryptionZone - Key: HDFS-6490 URL: https://issues.apache.org/jira/browse/HDFS-6490 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode, security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Charles Lamb Assignee: Charles Lamb Attachments: HDFS-6490.001.patch FSNamesystem.createEncryptionZone needs to create key ids with the format hdfs://HOST:PORT/pathOfEZ -- This message was sent by Atlassian JIRA (v6.2#6252)
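As a tiny illustration of the char-vs-String append point above, a sketch that builds a key id of the form hdfs://HOST:PORT/pathOfEZ while appending the '/' separator as a char. The host, port, and path values are made up, and this is not the actual FSNamesystem code.
{code}
public class KeyIdSketch {
  // Joins host:port and an encryption zone path into hdfs://HOST:PORT/pathOfEZ,
  // appending the '/' separator as a char rather than a one-char String.
  static String keyId(String hostPort, String path) {
    StringBuilder sb = new StringBuilder("hdfs://").append(hostPort);
    if (!path.startsWith("/")) {
      sb.append('/');               // char append, as suggested in the review comment
    }
    return sb.append(path).toString();
  }

  public static void main(String[] args) {
    System.out.println(keyId("nn.example.com:8020", "zones/ez1"));
    System.out.println(keyId("nn.example.com:8020", "/zones/ez2"));
  }
}
{code}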
[jira] [Updated] (HDFS-6671) Archival Storage: Consider block storage policy in replication
[ https://issues.apache.org/jira/browse/HDFS-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6671: -- Attachment: h6671_20140715b.patch h6671_20140715b.patch: adds more checks on parsing storage policies. Archival Storage: Consider block storage policy in replication -- Key: HDFS-6671 URL: https://issues.apache.org/jira/browse/HDFS-6671 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: h6671_20140714.patch, h6671_20140715.patch, h6671_20140715b.patch In order to satisfy the storage policy requirement, the replication monitor additionally reads storage policy information from the INodeFile when performing replication. As before, it only adds replicas if a block is under-replicated, and deletes replicas if a block is over-replicated. It will NOT move replicas around to satisfy the storage policy requirement. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-5809) BlockPoolSliceScanner and high speed hdfs appending make datanode to drop into infinite loop
[ https://issues.apache.org/jira/browse/HDFS-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5809: --- Description: {{BlockPoolSliceScanner#scan}} contains a while loop that continues to verify (i.e. scan) blocks until the {{blockInfoSet}} is empty (or some other conditions like a timeout have occurred.) In order to do this, it calls {{BlockPoolSliceScanner#verifyFirstBlock}}. This is intended to grab the first block in the {{blockInfoSet}}, verify it, and remove it from that set. ({{blockInfoSet}} is sorted by last scan time.) Unfortunately, if we hit a certain bug in {{updateScanStatus}}, the block may never be removed from {{blockInfoSet}}. When this happens, we keep rescanning the exact same block until the timeout hits. The bug is triggered when a block winds up in {{blockInfoSet}} but not in {{blockMap}}. You can see it clearly in this code:
{code}
private synchronized void updateScanStatus(Block block, ScanType type, boolean scanOk) {
  BlockScanInfo info = blockMap.get(block);
  if ( info != null ) {
    delBlockInfo(info);
  } else {
    // It might already be removed. Thats ok, it will be caught next time.
    info = new BlockScanInfo(block);
  }
{code}
If {{info == null}}, we never call {{delBlockInfo}}, the function which is intended to remove the {{blockInfoSet}} entry. Luckily, there is a simple fix here... the variable that {{updateScanStatus}} is being passed is actually a BlockInfo object, so we can simply call {{delBlockInfo}} on it directly, without doing a lookup in the {{blockMap}}. This is both faster and more robust. was: Hello, everyone. When the hadoop cluster starts, BlockPoolSliceScanner starts scanning the blocks in my cluster. Then, randomly, one datanode drops into an infinite loop as the log shows, and finally all datanodes drop into the infinite loop. Every datanode just fails verification on one block. When I check the failing block like this: hadoop fsck / -files -blocks | grep blk_1223474551535936089_4702249, no hdfs file contains the block. It seems that the while loop of BlockPoolSliceScanner's scan method drops into an infinite loop. BlockPoolSliceScanner: 650 while (datanode.shouldRun && !datanode.blockScanner.blockScannerThread.isInterrupted() && datanode.isBPServiceAlive(blockPoolId)) { The log is finally printed in method verifyBlock(BlockPoolSliceScanner:453). Please excuse my poor English. 
- LOG: 2014-01-21 18:36:50,582 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification failed for BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may be due to race with write 2014-01-21 18:36:50,582 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification failed for BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may be due to race with write 2014-01-21 18:36:50,582 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification failed for BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may be due to race with write BlockPoolSliceScanner and high speed hdfs appending make datanode to drop into infinite loop Key: HDFS-5809 URL: https://issues.apache.org/jira/browse/HDFS-5809 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.0.0-alpha Environment: jdk1.6, centos6.4, 2.0.0-cdh4.5.0 Reporter: ikweesung Assignee: Colin Patrick McCabe Priority: Critical Labels: blockpoolslicescanner, datanode, infinite-loop Attachments: HDFS-5809.001.patch {{BlockPoolSliceScanner#scan}} contains a while loop that continues to verify (i.e. scan) blocks until the {{blockInfoSet}} is empty (or some other conditions like a timeout have occurred.) In order to do this, it calls {{BlockPoolSliceScanner#verifyFirstBlock}}. This is intended to grab the first block in the {{blockInfoSet}}, verify it, and remove it from that set. ({{blockInfoSet}} is sorted by last scan time.) Unfortunately, if we hit a certain bug in {{updateScanStatus}}, the block may never be removed from {{blockInfoSet}}. When this happens, we keep rescanning the exact same block until the timeout hits. The
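To make the loop described above concrete, here is a self-contained sketch of the safer pattern the description points at: remove the exact entry that was handed out for verification from the sorted set, rather than re-looking it up in a separate map that may not contain it. The class and field names are simplified stand-ins, not the real BlockPoolSliceScanner.
{code}
import java.util.Comparator;
import java.util.TreeSet;

public class ScanLoopSketch {
  static class ScanInfo {
    final long blockId;
    long lastScanTime;
    ScanInfo(long blockId, long lastScanTime) {
      this.blockId = blockId;
      this.lastScanTime = lastScanTime;
    }
  }

  // Sorted by last scan time, oldest first; ties broken by block id.
  private final TreeSet<ScanInfo> blockInfoSet = new TreeSet<ScanInfo>(
      new Comparator<ScanInfo>() {
        @Override
        public int compare(ScanInfo a, ScanInfo b) {
          if (a.lastScanTime != b.lastScanTime) {
            return a.lastScanTime < b.lastScanTime ? -1 : 1;
          }
          return a.blockId < b.blockId ? -1 : (a.blockId == b.blockId ? 0 : 1);
        }
      });

  // Verify the oldest entry and remove the exact object we verified,
  // instead of looking it up again in a separate map (which may miss).
  void verifyFirstBlock() {
    ScanInfo info = blockInfoSet.pollFirst();   // removes the entry it returns
    if (info == null) {
      return;
    }
    // ... verify the block here, then re-queue it with an updated scan time ...
    info.lastScanTime = System.currentTimeMillis();
    blockInfoSet.add(info);
  }

  public static void main(String[] args) {
    ScanLoopSketch s = new ScanLoopSketch();
    s.blockInfoSet.add(new ScanInfo(1L, 100L));
    s.blockInfoSet.add(new ScanInfo(2L, 50L));
    s.verifyFirstBlock();                       // handles block 2 (oldest) and re-queues it
    System.out.println("entries left: " + s.blockInfoSet.size());
  }
}
{code}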
[jira] [Commented] (HDFS-6114) Block Scan log rolling will never happen if blocks written continuously leading to huge size of dncp_block_verification.log.curr
[ https://issues.apache.org/jira/browse/HDFS-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062322#comment-14062322 ] Colin Patrick McCabe commented on HDFS-6114: bq. blockInfoSet is required to be sorted based on the lastScanTime, as the oldest scanned block will be picked for scanning, which will always be the first element in this set. BlockScanInfo.LAST_SCAN_TIME_COMPARATOR is used because BlockScanInfo#hashCode() is the default, which will sort based on the blockId rather than scan time. Do you suggest I update this hashCode() itself? I was suggesting that you use a {{TreeSet}} or {{TreeMap}} with the same comparator as {{blockInfoSet}}. All the hash sets that I'm aware of do not shrink down after enlarging. bq. So delBlockInfo and delNewBlockInfo serve separate purposes and both are required. I can write a version of the patch that only has one del function and only one add function. I am really reluctant to put in another set of add/del functions on top of what's already there, since I think it will make things hard to understand for people trying to modify this code later or backport this patch to other branches. Block Scan log rolling will never happen if blocks written continuously leading to huge size of dncp_block_verification.log.curr Key: HDFS-6114 URL: https://issues.apache.org/jira/browse/HDFS-6114 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.3.0, 2.4.0 Reporter: Vinayakumar B Assignee: Vinayakumar B Priority: Critical Attachments: HDFS-6114.patch, HDFS-6114.patch
1. {{BlockPoolSliceScanner#scan()}} will not return until all the blocks are scanned.
2. If blocks (several MBs in size) are written continuously to the datanode, then one iteration of {{BlockPoolSliceScanner#scan()}} will be continuously scanning the blocks.
3. These blocks will be deleted after some time (enough to get the blocks scanned).
4. As block scanning is throttled, verification of all blocks will take a long time.
5. Rolling will never happen, so even though the total number of blocks in the datanode doesn't increase, entries (which include stale entries for deleted blocks) in *dncp_block_verification.log.curr* continuously increase, leading to a huge size. In one of our environments, it grew to more than 1TB while the total number of blocks was only ~45k.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5809) BlockPoolSliceScanner and high speed hdfs appending make datanode to drop into infinite loop
[ https://issues.apache.org/jira/browse/HDFS-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062385#comment-14062385 ] Aaron T. Myers commented on HDFS-5809: -- +1, the patch looks good to me. I agree that writing a unit test for this would be fairly difficult, and the fix is really quite clear, so I'm OK committing it without a test. Thanks a lot for taking care of this, Colin, and thanks much to ikweesung for reporting this issue. BlockPoolSliceScanner and high speed hdfs appending make datanode to drop into infinite loop Key: HDFS-5809 URL: https://issues.apache.org/jira/browse/HDFS-5809 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.0.0-alpha Environment: jdk1.6, centos6.4, 2.0.0-cdh4.5.0 Reporter: ikweesung Assignee: Colin Patrick McCabe Priority: Critical Labels: blockpoolslicescanner, datanode, infinite-loop Attachments: HDFS-5809.001.patch {{BlockPoolSliceScanner#scan}} contains a while loop that continues to verify (i.e. scan) blocks until the {{blockInfoSet}} is empty (or some other conditions like a timeout have occurred.) In order to do this, it calls {{BlockPoolSliceScanner#verifyFirstBlock}}. This is intended to grab the first block in the {{blockInfoSet}}, verify it, and remove it from that set. ({{blockInfoSet}} is sorted by last scan time.) Unfortunately, if we hit a certain bug in {{updateScanStatus}}, the block may never be removed from {{blockInfoSet}}. When this happens, we keep rescanning the exact same block until the timeout hits. The bug is triggered when a block winds up in {{blockInfoSet}} but not in {{blockMap}}. You can see it clearly in this code:
{code}
private synchronized void updateScanStatus(Block block, ScanType type, boolean scanOk) {
  BlockScanInfo info = blockMap.get(block);
  if ( info != null ) {
    delBlockInfo(info);
  } else {
    // It might already be removed. Thats ok, it will be caught next time.
    info = new BlockScanInfo(block);
  }
{code}
If {{info == null}}, we never call {{delBlockInfo}}, the function which is intended to remove the {{blockInfoSet}} entry. Luckily, there is a simple fix here... the variable that {{updateScanStatus}} is being passed is actually a BlockInfo object, so we can simply call {{delBlockInfo}} on it directly, without doing a lookup in the {{blockMap}}. This is both faster and more robust. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6667) In HDFS HA mode, Distcp/SLive with webhdfs on secure cluster fails with Client cannot authenticate via:[TOKEN, KERBEROS] error
[ https://issues.apache.org/jira/browse/HDFS-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062340#comment-14062340 ] Haohui Mai commented on HDFS-6667: -- Looks good to me. +1. I think that this patch implements the approach proposed by Daryn. [~daryn], do you have any comments? In HDFS HA mode, Distcp/SLive with webhdfs on secure cluster fails with Client cannot authenticate via:[TOKEN, KERBEROS] error -- Key: HDFS-6667 URL: https://issues.apache.org/jira/browse/HDFS-6667 Project: Hadoop HDFS Issue Type: Bug Components: security Reporter: Jian He Assignee: Jing Zhao Attachments: HDFS-6667.000.patch Opening on [~arpitgupta]'s behalf. We observed that, in HDFS HA mode, running Distcp/SLive with webhdfs will fail on YARN. In non-HA mode, it'll pass. The reason is in HA mode, only webhdfs delegation token is generated for the job, but YARN also requires the regular hdfs token to do localization, log-aggregation etc. In non-HA mode, both tokens are generated for the job. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6683) TestDFSAdminWithHA.testSaveNamespace failed with Timed out waiting for Mini HDFS Cluster to start
Yongjun Zhang created HDFS-6683: --- Summary: TestDFSAdminWithHA.testSaveNamespace failed with Timed out waiting for Mini HDFS Cluster to start Key: HDFS-6683 URL: https://issues.apache.org/jira/browse/HDFS-6683 Project: Hadoop HDFS Issue Type: Bug Components: ha, tools Affects Versions: 3.0.0 Reporter: Yongjun Zhang Test failure in quite some recent test runs: {code} org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testSaveNamespace Failing for the past 9 builds (Since Failed#7337 ) Took 12 sec. Error Message Timed out waiting for Mini HDFS Cluster to start Stacktrace java.io.IOException: Timed out waiting for Mini HDFS Cluster to start at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1097) at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:732) at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:378) at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:359) at org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.init(MiniQJMHACluster.java:102) at org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.init(MiniQJMHACluster.java:40) at org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster$Builder.build(MiniQJMHACluster.java:67) at org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.setUpHaCluster(TestDFSAdminWithHA.java:82) at org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testSaveNamespace(TestDFSAdminWithHA.java:134) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6588) Investigating removing getTrueCause method in Server.java
[ https://issues.apache.org/jira/browse/HDFS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062400#comment-14062400 ] Yongjun Zhang commented on HDFS-6588: - It appeared to be a glitch in the testing; re-uploading the same patch to trigger a new run. However, there seems to be a real problem with org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testSaveNamespace, which has failed many times in other runs; filed HDFS-6683. Investigating removing getTrueCause method in Server.java - Key: HDFS-6588 URL: https://issues.apache.org/jira/browse/HDFS-6588 Project: Hadoop HDFS Issue Type: Bug Components: security, webhdfs Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6588.001.patch, HDFS-6588.001.patch When addressing Daryn Sharp's comment for HDFS-6475 quoted below: {quote} What I'm saying is I think the patch adds too much unnecessary code. Filing an improvement to delete all but a few lines of the code changed in this patch seems a bit odd. I think you just need to: - Delete getTrueCause entirely instead of moving it elsewhere - In saslProcess, just throw the exception instead of running it through getTrueCause since it's not an InvalidToken wrapping another exception anymore. - Keep your 3-line change to unwrap SecurityException in toResponse {quote} there are multiple test failures after making the suggested changes. Filing this jira dedicated to the investigation of removing the getTrueCause method. More detail will be put in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6588) Investigating removing getTrueCause method in Server.java
[ https://issues.apache.org/jira/browse/HDFS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6588: Attachment: HDFS-6588.001.patch Investigating removing getTrueCause method in Server.java - Key: HDFS-6588 URL: https://issues.apache.org/jira/browse/HDFS-6588 Project: Hadoop HDFS Issue Type: Bug Components: security, webhdfs Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6588.001.patch, HDFS-6588.001.patch When addressing Daryn Sharp's comment for HDFS-6475 quoted below: {quote} What I'm saying is I think the patch adds too much unnecessary code. Filing an improvement to delete all but a few lines of the code changed in this patch seems a bit odd. I think you just need to: - Delete getTrueCause entirely instead of moving it elsewhere - In saslProcess, just throw the exception instead of running it through getTrueCause since it's not an InvalidToken wrapping another exception anymore. - Keep your 3-line change to unwrap SecurityException in toResponse {quote} there are multiple test failures after making the suggested changes. Filing this jira dedicated to the investigation of removing the getTrueCause method. More detail will be put in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6683) TestDFSAdminWithHA.testSaveNamespace failed with Timed out waiting for Mini HDFS Cluster to start
[ https://issues.apache.org/jira/browse/HDFS-6683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-6683: Description: Test failure in quite some recent test runs: ... https://builds.apache.org/job/PreCommit-HDFS-Build/7344/ https://builds.apache.org/job/PreCommit-HDFS-Build/7345/ https://builds.apache.org/job/PreCommit-HDFS-Build/7346/ ... {code} org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testSaveNamespace Failing for the past 9 builds (Since Failed#7337 ) Took 12 sec. Error Message Timed out waiting for Mini HDFS Cluster to start Stacktrace java.io.IOException: Timed out waiting for Mini HDFS Cluster to start at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1097) at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:732) at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:378) at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:359) at org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.init(MiniQJMHACluster.java:102) at org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.init(MiniQJMHACluster.java:40) at org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster$Builder.build(MiniQJMHACluster.java:67) at org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.setUpHaCluster(TestDFSAdminWithHA.java:82) at org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testSaveNamespace(TestDFSAdminWithHA.java:134) {code} was: Test failure in quite some recent test runs: {code} org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testSaveNamespace Failing for the past 9 builds (Since Failed#7337 ) Took 12 sec. Error Message Timed out waiting for Mini HDFS Cluster to start Stacktrace java.io.IOException: Timed out waiting for Mini HDFS Cluster to start at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1097) at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:732) at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:378) at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:359) at org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.init(MiniQJMHACluster.java:102) at org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.init(MiniQJMHACluster.java:40) at org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster$Builder.build(MiniQJMHACluster.java:67) at org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.setUpHaCluster(TestDFSAdminWithHA.java:82) at org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testSaveNamespace(TestDFSAdminWithHA.java:134) {code} TestDFSAdminWithHA.testSaveNamespace failed with Timed out waiting for Mini HDFS Cluster to start --- Key: HDFS-6683 URL: https://issues.apache.org/jira/browse/HDFS-6683 Project: Hadoop HDFS Issue Type: Bug Components: ha, tools Affects Versions: 3.0.0 Reporter: Yongjun Zhang Test failure in quite some recent test runs: ... https://builds.apache.org/job/PreCommit-HDFS-Build/7344/ https://builds.apache.org/job/PreCommit-HDFS-Build/7345/ https://builds.apache.org/job/PreCommit-HDFS-Build/7346/ ... {code} org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testSaveNamespace Failing for the past 9 builds (Since Failed#7337 ) Took 12 sec. 
Error Message Timed out waiting for Mini HDFS Cluster to start Stacktrace java.io.IOException: Timed out waiting for Mini HDFS Cluster to start at org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1097) at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:732) at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:378) at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:359) at org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.init(MiniQJMHACluster.java:102) at org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.init(MiniQJMHACluster.java:40) at org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster$Builder.build(MiniQJMHACluster.java:67) at org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.setUpHaCluster(TestDFSAdminWithHA.java:82) at org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testSaveNamespace(TestDFSAdminWithHA.java:134) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6684) HDFS NN and DN JSP pages do not check for script injection.
Jinghui Wang created HDFS-6684: -- Summary: HDFS NN and DN JSP pages do not check for script injection. Key: HDFS-6684 URL: https://issues.apache.org/jira/browse/HDFS-6684 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.1, 2.3.0, 2.2.0, 2.1.0-beta Reporter: Jinghui Wang Assignee: Jinghui Wang Datanode's browseDirectory.jsp does not filter for script injection; a script can be injected through the dir parameter, e.g. dir=/hadoop'\/<script>alert(759)</script>. NameNode's dfsnodelist.jsp does not filter for script injection either; the sorter/order parameter can be set to DSC%20onMouseOver=alert(959)//. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6684) HDFS NN and DN JSP pages do not check for script injection.
[ https://issues.apache.org/jira/browse/HDFS-6684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinghui Wang updated HDFS-6684: --- Attachment: HDFS-6684.patch HDFS NN and DN JSP pages do not check for script injection. --- Key: HDFS-6684 URL: https://issues.apache.org/jira/browse/HDFS-6684 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.0-beta, 2.2.0, 2.3.0, 2.4.1 Reporter: Jinghui Wang Assignee: Jinghui Wang Attachments: HDFS-6684.patch Datanode's browseDirectory.jsp does not filter for script injection; a script can be injected through the dir parameter, e.g. dir=/hadoop'\/<script>alert(759)</script>. NameNode's dfsnodelist.jsp does not filter for script injection either; the sorter/order parameter can be set to DSC%20onMouseOver=alert(959)//. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6684) HDFS NN and DN JSP pages do not check for script injection.
[ https://issues.apache.org/jira/browse/HDFS-6684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062406#comment-14062406 ] Jinghui Wang commented on HDFS-6684: Patch attached. HDFS NN and DN JSP pages do not check for script injection. --- Key: HDFS-6684 URL: https://issues.apache.org/jira/browse/HDFS-6684 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.1.0-beta, 2.2.0, 2.3.0, 2.4.1 Reporter: Jinghui Wang Assignee: Jinghui Wang Attachments: HDFS-6684.patch Datanode's browseDirectory.jsp does not filter for script injection; a script can be injected through the dir parameter, e.g. dir=/hadoop'\/<script>alert(759)</script>. NameNode's dfsnodelist.jsp does not filter for script injection either; the sorter/order parameter can be set to DSC%20onMouseOver=alert(959)//. -- This message was sent by Atlassian JIRA (v6.2#6252)
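The usual remedy for this class of bug is to HTML-escape request parameters before echoing them back into the page. A minimal sketch using commons-lang's StringEscapeUtils; this only illustrates the idea and is not the contents of HDFS-6684.patch:
{code}
import javax.servlet.http.HttpServletRequest;
import org.apache.commons.lang.StringEscapeUtils;

// Illustrative helper (hypothetical name): escape a request parameter before it
// is written into JSP output, so injected markup like <script>alert(759)</script>
// is rendered as inert text.
public final class JspParamEscaper {
  private JspParamEscaper() {}

  public static String escape(HttpServletRequest request, String name) {
    String value = request.getParameter(name);
    return value == null ? "" : StringEscapeUtils.escapeHtml(value);
  }
}
{code}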
[jira] [Updated] (HDFS-6584) Support archival storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6584: -- Attachment: HDFSArchivalStorageDesign20140715.pdf HDFSArchivalStorageDesign20140715.pdf: revised design doc. Support archival storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: datanode, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf In most of the Hadoop clusters, as more and more data is stored for longer time, the demand for storage is outstripping the compute. Hadoop needs a cost effective and easy to manage solution to meet this demand for storage. Current solution is: - Delete the old unused data. This comes at operational cost of identifying unnecessary data and deleting them manually. - Add more nodes to the clusters. This adds along with storage capacity unnecessary compute capacity to the cluster. Hadoop needs a solution to decouple growing storage capacity from compute capacity. Nodes with higher density and less expensive storage with low compute power are becoming available and can be used as cold storage in the clusters. Based on policy the data from hot storage can be moved to cold storage. Adding more nodes to the cold storage can grow the storage independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6584) Support Archival Storage
[ https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-6584: -- Component/s: (was: datanode) balancer Summary: Support Archival Storage (was: Support archival storage) Support Archival Storage Key: HDFS-6584 URL: https://issues.apache.org/jira/browse/HDFS-6584 Project: Hadoop HDFS Issue Type: New Feature Components: balancer, namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Attachments: HDFSArchivalStorageDesign20140623.pdf, HDFSArchivalStorageDesign20140715.pdf In most of the Hadoop clusters, as more and more data is stored for longer time, the demand for storage is outstripping the compute. Hadoop needs a cost effective and easy to manage solution to meet this demand for storage. Current solution is: - Delete the old unused data. This comes at operational cost of identifying unnecessary data and deleting them manually. - Add more nodes to the clusters. This adds along with storage capacity unnecessary compute capacity to the cluster. Hadoop needs a solution to decouple growing storage capacity from compute capacity. Nodes with higher density and less expensive storage with low compute power are becoming available and can be used as cold storage in the clusters. Based on policy the data from hot storage can be moved to cold storage. Adding more nodes to the cold storage can grow the storage independent of the compute capacity in the cluster. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HDFS-6679) Archival Storage: Bump NameNodeLayoutVersion and update editsStored test files
[ https://issues.apache.org/jira/browse/HDFS-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze reassigned HDFS-6679: - Assignee: (was: Tsz Wo Nicholas Sze) Archival Storage: Bump NameNodeLayoutVersion and update editsStored test files -- Key: HDFS-6679 URL: https://issues.apache.org/jira/browse/HDFS-6679 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo Nicholas Sze HDFS-6677 changed fsimage for storing storage policy IDs. We should bump the NameNodeLayoutVersion and as well fix the tests. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6685) Archival Storage: Update Balancer to preserve storage type of replicas
Tsz Wo Nicholas Sze created HDFS-6685: - Summary: Archival Storage: Update Balancer to preserve storage type of replicas Key: HDFS-6685 URL: https://issues.apache.org/jira/browse/HDFS-6685 Project: Hadoop HDFS Issue Type: Sub-task Components: balancer Reporter: Tsz Wo Nicholas Sze In order to maintain storage policy requirements, the Balancer always moves a replica from a storage of one type to another storage of the same type, i.e. it preserves the storage type of replicas. This way, the Balancer does not need to know storage policy information. -- This message was sent by Atlassian JIRA (v6.2#6252)
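A minimal sketch of the matching rule described above; the enum here is a stand-in for illustration only, not the HDFS StorageType class, and the real Balancer pairing logic is more involved:
{code}
// Stand-in enum for illustration; the actual storage types live in HDFS itself.
enum StorageType { DISK, SSD, ARCHIVE, RAM_DISK }

// A move is acceptable exactly when the source and target storage types match,
// so the Balancer preserves storage types and never needs to consult the storage
// policy of the file a block belongs to.
static boolean isMoveAllowed(StorageType source, StorageType target) {
  return source == target;
}
{code}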
[jira] [Updated] (HDFS-5809) BlockPoolSliceScanner and high speed hdfs appending make datanode to drop into infinite loop
[ https://issues.apache.org/jira/browse/HDFS-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HDFS-5809: --- Resolution: Fixed Fix Version/s: 2.6.0 Status: Resolved (was: Patch Available) BlockPoolSliceScanner and high speed hdfs appending make datanode to drop into infinite loop Key: HDFS-5809 URL: https://issues.apache.org/jira/browse/HDFS-5809 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.0.0-alpha Environment: jdk1.6, centos6.4, 2.0.0-cdh4.5.0 Reporter: ikweesung Assignee: Colin Patrick McCabe Priority: Critical Labels: blockpoolslicescanner, datanode, infinite-loop Fix For: 2.6.0 Attachments: HDFS-5809.001.patch {{BlockPoolSliceScanner#scan}} contains a while loop that continues to verify (i.e. scan) blocks until the {{blockInfoSet}} is empty (or some other conditions like a timeout have occurred.) In order to do this, it calls {{BlockPoolSliceScanner#verifyFirstBlock}}. This is intended to grab the first block in the {{blockInfoSet}}, verify it, and remove it from that set. ({{blockInfoSet}} is sorted by last scan time.) Unfortunately, if we hit a certain bug in {{updateScanStatus}}, the block may never be removed from {{blockInfoSet}}. When this happens, we keep rescanning the exact same block until the timeout hits. The bug is triggered when a block winds up in {{blockInfoSet}} but not in {{blockMap}}. You can see it clearly in this code: {code} private synchronized void updateScanStatus(Block block, ScanType type, boolean scanOk) { BlockScanInfo info = blockMap.get(block); if ( info != null ) { delBlockInfo(info); } else { // It might already be removed. Thats ok, it will be caught next time. info = new BlockScanInfo(block); } {code} If {{info == null}}, we never call {{delBlockInfo}}, the function which is intended to remove the {{blockInfoSet}} entry. Luckily, there is a simple fix here... the variable that {{updateScanStatus}} is being passed is actually a BlockInfo object, so we can simply call {{delBlockInfo}} on it directly, without doing a lookup in the {{blockMap}}. This is both faster and more robust. -- This message was sent by Atlassian JIRA (v6.2#6252)
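Going by the description, the fix is to let updateScanStatus operate on the BlockScanInfo the caller already holds and delete that entry directly; a rough sketch of that shape inside BlockPoolSliceScanner (field and helper names are illustrative, and the committed HDFS-5809.001.patch may differ):
{code}
// Sketch of the fix described above: remove the entry from blockInfoSet directly
// instead of looking the block up in blockMap, which can miss and leave the same
// block at the head of the set until the timeout hits.
private synchronized void updateScanStatus(BlockScanInfo info, ScanType type,
    boolean scanOk) {
  delBlockInfo(info);                       // always removes the blockInfoSet entry
  info.lastScanType = type;                 // illustrative field names
  info.lastScanTime = Time.monotonicNow();
  info.lastScanOk = scanOk;
  addBlockInfo(info);                       // re-insert, ordered by the new scan time
}
{code}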
[jira] [Created] (HDFS-6686) Archival Storage: Use fallback storage types
Tsz Wo Nicholas Sze created HDFS-6686: - Summary: Archival Storage: Use fallback storage types Key: HDFS-6686 URL: https://issues.apache.org/jira/browse/HDFS-6686 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze HDFS-6671 changes replication monitor to use block storage policy for replication. It should also use the fallback storage types when a particular type of storage is full. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-5809) BlockPoolSliceScanner and high speed hdfs appending make datanode to drop into infinite loop
[ https://issues.apache.org/jira/browse/HDFS-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062449#comment-14062449 ] Hudson commented on HDFS-5809: -- SUCCESS: Integrated in Hadoop-trunk-Commit #5883 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5883/]) HDFS-5809. BlockPoolSliceScanner and high speed hdfs appending make datanode to drop into infinite loop (cmccabe) (cmccabe: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1610790) * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java BlockPoolSliceScanner and high speed hdfs appending make datanode to drop into infinite loop Key: HDFS-5809 URL: https://issues.apache.org/jira/browse/HDFS-5809 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.0.0-alpha Environment: jdk1.6, centos6.4, 2.0.0-cdh4.5.0 Reporter: ikweesung Assignee: Colin Patrick McCabe Priority: Critical Labels: blockpoolslicescanner, datanode, infinite-loop Fix For: 2.6.0 Attachments: HDFS-5809.001.patch {{BlockPoolSliceScanner#scan}} contains a while loop that continues to verify (i.e. scan) blocks until the {{blockInfoSet}} is empty (or some other conditions like a timeout have occurred.) In order to do this, it calls {{BlockPoolSliceScanner#verifyFirstBlock}}. This is intended to grab the first block in the {{blockInfoSet}}, verify it, and remove it from that set. ({{blockInfoSet}} is sorted by last scan time.) Unfortunately, if we hit a certain bug in {{updateScanStatus}}, the block may never be removed from {{blockInfoSet}}. When this happens, we keep rescanning the exact same block until the timeout hits. The bug is triggered when a block winds up in {{blockInfoSet}} but not in {{blockMap}}. You can see it clearly in this code: {code} private synchronized void updateScanStatus(Block block, ScanType type, boolean scanOk) { BlockScanInfo info = blockMap.get(block); if ( info != null ) { delBlockInfo(info); } else { // It might already be removed. Thats ok, it will be caught next time. info = new BlockScanInfo(block); } {code} If {{info == null}}, we never call {{delBlockInfo}}, the function which is intended to remove the {{blockInfoSet}} entry. Luckily, there is a simple fix here... the variable that {{updateScanStatus}} is being passed is actually a BlockInfo object, so we can simply call {{delBlockInfo}} on it directly, without doing a lookup in the {{blockMap}}. This is both faster and more robust. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6340) DN can't finalize upgrade
[ https://issues.apache.org/jira/browse/HDFS-6340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062470#comment-14062470 ] Vitaliy Fuks commented on HDFS-6340: Is anyone aware of any way to work around this issue without upgrading? DN can't finalize upgrade - Key: HDFS-6340 URL: https://issues.apache.org/jira/browse/HDFS-6340 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Reporter: Rahul Singhal Assignee: Rahul Singhal Priority: Blocker Fix For: 3.0.0, 2.4.1 Attachments: HDFS-6340-branch-2.4.0.patch, HDFS-6340.02.patch, HDFS-6340.patch I upgraded a (NN) HA cluster from 2.2.0 to 2.4.0. After I issued the '-finalizeUpgrade' command, NN was able to finalize the upgrade but DN couldn't (I waited for the next block report). I think I have found the problem to be due to HDFS-5153. I will attach a proposed fix. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed
[ https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062471#comment-14062471 ] Hadoop QA commented on HDFS-6597: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655785/HDFS-6597.04.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7349//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7349//console This message is automatically generated. Add a new option to NN upgrade to terminate the process after upgrade on NN is completed Key: HDFS-6597 URL: https://issues.apache.org/jira/browse/HDFS-6597 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Danilo Vunjak Attachments: HDFS-6597.04.patch, JIRA-HDFS-30.patch, JIRA-HDFS-6597.02.patch, JIRA-HDFS-6597.03.patch, JIRA-HDFS-6597.patch Currently when namenode is started for upgrade (hadoop namenode -upgrade command), after finishing upgrade of metadata, namenode starts working normally and wait for datanodes to upgrade itself and connect to to NN. We need to have option for upgrading only NN metadata, so after upgrade is finished on NN, process should terminate. I have tested it by changing in file: hdfs.server.namenode.NameNode.java, method: public static NameNode createNameNode(String argv[], Configuration conf): in switch added case UPGRADE: case UPGRADE: { DefaultMetricsSystem.initialize(NameNode); NameNode nameNode = new NameNode(conf); if (startOpt.getForceUpgrade()) { terminate(0); return null; } return nameNode; } This did upgrade of metadata, closed process after finished, and later when all services were started, upgrade of datanodes finished sucessfully and system run . What I'm suggesting right now is to add new startup parameter -force, so namenode can be started like this hadoop namenode -upgrade -force, so we can indicate that we want to terminate process after upgrade metadata on NN is finished. Old functionality should be preserved, so users can run hadoop namenode -upgrade on same way and with same behaviour as it was previous. Thanks, Danilo -- This message was sent by Atlassian JIRA (v6.2#6252)
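Reformatted, the change Danilo describes to NameNode.createNameNode(..) looks roughly like this (a cleaned-up rendering of the snippet quoted in the description, with the metrics prefix restored as a string literal; the attached patches may differ):
{code}
case UPGRADE: {
  DefaultMetricsSystem.initialize("NameNode");
  NameNode nameNode = new NameNode(conf);
  // With the proposed "-upgrade -force" usage, stop once the NN metadata upgrade
  // completes instead of waiting for datanodes to upgrade and reconnect.
  if (startOpt.getForceUpgrade()) {
    terminate(0);
    return null;
  }
  return nameNode;
}
{code}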
[jira] [Commented] (HDFS-6340) DN can't finalize upgrade
[ https://issues.apache.org/jira/browse/HDFS-6340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062480#comment-14062480 ] Arpit Agarwal commented on HDFS-6340: - You can manually delete the 'previous' directory on each DN and also 'blocksBeingWritten', if it exists. This will effectively finalize the DN upgrade. DN can't finalize upgrade - Key: HDFS-6340 URL: https://issues.apache.org/jira/browse/HDFS-6340 Project: Hadoop HDFS Issue Type: Bug Components: datanode Affects Versions: 2.4.0 Reporter: Rahul Singhal Assignee: Rahul Singhal Priority: Blocker Fix For: 3.0.0, 2.4.1 Attachments: HDFS-6340-branch-2.4.0.patch, HDFS-6340.02.patch, HDFS-6340.patch I upgraded a (NN) HA cluster from 2.2.0 to 2.4.0. After I issued the '-finalizeUpgrade' command, NN was able to finalize the upgrade but DN couldn't (I waited for the next block report). I think I have found the problem to be due to HDFS-5153. I will attach a proposed fix. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6616) bestNode shouldn't always return the first DataNode
[ https://issues.apache.org/jira/browse/HDFS-6616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062482#comment-14062482 ] Tsz Wo Nicholas Sze commented on HDFS-6616: --- @ zhaoyunjiong, are you going to post a new patch? bestNode shouldn't always return the first DataNode --- Key: HDFS-6616 URL: https://issues.apache.org/jira/browse/HDFS-6616 Project: Hadoop HDFS Issue Type: Bug Components: webhdfs Reporter: zhaoyunjiong Assignee: zhaoyunjiong Priority: Minor Attachments: HDFS-6616.patch When we are doing distcp between clusters, job failed: 014-06-30 20:56:28,430 INFO org.apache.hadoop.tools.DistCp: FAIL part-r-00101.avro : java.net.NoRouteToHostException: No route to host at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491) at java.security.AccessController.doPrivileged(Native Method) at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485) at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139) at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379) at org.apache.hadoop.hdfs.HftpFileSystem.open(HftpFileSystem.java:322) at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427) at org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:419) at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:547) at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:314) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:365) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232) at org.apache.hadoop.mapred.Child.main(Child.java:249) The root reason is one of the DataNode can't access from outside, but inside cluster, it's health. In NamenodeWebHdfsMethods.java:bestNode, it always return the first DataNode, so even after the distcp retries, it still failed. -- This message was sent by Atlassian JIRA (v6.2#6252)
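One way to address the summary, picking something other than the first datanode so a retry has a chance of landing on a reachable node, is sketched below; this is only an illustration, not the attached HDFS-6616.patch:
{code}
import java.io.IOException;
import java.util.Random;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

// Illustrative sketch: return a random location for the block instead of always
// the first one, so a datanode that is unreachable from outside the cluster does
// not get chosen on every attempt.
static DatanodeInfo chooseDatanode(DatanodeInfo[] locations, Random rand)
    throws IOException {
  if (locations == null || locations.length == 0) {
    throw new IOException("No datanodes contain this block");
  }
  return locations[rand.nextInt(locations.length)];
}
{code}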
[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list
[ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062498#comment-14062498 ] Amir Langer commented on HDFS-6658: --- We explored the idea of off-heap memory for the Namenode. It makes sense for Inodes and there was already some work on that done at Hortonworks. For the blocks however there is a problem - Blocks data has two very different access patterns. Clients will typically access a few blocks (from same or similar files) and mostly the recent ones, while block reports can scan the entire block space. This means there is no locality of reference and caching is not going to work. If we don't have caching, we need to cope with the added latency of off-heap memory - It is after all backed up by a file. From our measurements - this cost seems too high with some block reports seem to never be able to finish. (Just think of the cost of the off-heap management keep needing to load pages from the file into its memory and its page caching not having any effect). Namenode memory optimization - Block replicas list --- Key: HDFS-6658 URL: https://issues.apache.org/jira/browse/HDFS-6658 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.1 Reporter: Amir Langer Assignee: Amir Langer Attachments: Namenode Memory Optimizations - Block replicas list.docx Part of the memory consumed by every BlockInfo object in the Namenode is a linked list of block references for every DatanodeStorageInfo (called triplets). We propose to change the way we store the list in memory. Using primitive integer indexes instead of object references will reduce the memory needed for every block replica (when compressed oops is disabled) and in our new design the list overhead will be per DatanodeStorageInfo and not per block replica. see attached design doc. for details and evaluation results. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6469) Coordinated replication of the namespace using ConsensusNode
[ https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062502#comment-14062502 ] Sanjay Radia commented on HDFS-6469: bq. I meant that if you use QJM then every update on the NameNode results in writing into two journals: first into edits log and then into QJM journal. Konstantin, HDFS has long supported parallel journals (i.e. multiple editlogs) that are written in parallel. A customer can use just QJM (which gives at least 3 replicas) and can optionally have a local parallel editlog if they want additional redundancy. What you are proposing is dual *serial* journals. Coordinated replication of the namespace using ConsensusNode Key: HDFS-6469 URL: https://issues.apache.org/jira/browse/HDFS-6469 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Attachments: CNodeDesign.pdf This is a proposal to introduce ConsensusNode - an evolution of the NameNode, which enables replication of the namespace on multiple nodes of an HDFS cluster by means of a Coordination Engine. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6687) nn.getNamesystem() may return NPE from JspHelper
Mit Desai created HDFS-6687: --- Summary: nn.getNamesystem() may return NPE from JspHelper Key: HDFS-6687 URL: https://issues.apache.org/jira/browse/HDFS-6687 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Mit Desai Assignee: Mit Desai In hadoop-2, the http server is started in the very early stage to show the progress. If the user tries to get the name system, it may not be completely up and the NN logs will have this kind of error. {noformat} 2014-07-14 15:49:03,521 [***] WARN resources.ExceptionHandler: INTERNAL_SERVER_ERROR java.lang.NullPointerException at org.apache.hadoop.hdfs.server.common.JspHelper.getTokenUGI(JspHelper.java:661) at org.apache.hadoop.hdfs.server.common.JspHelper.getUGI(JspHelper.java:604) at org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:53) at org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:41) at com.sun.jersey.server.impl.inject.InjectableValuesProvider.getInjectableValues(InjectableValuesProvider.java:46) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$EntityParamInInvoker.getParams(AbstractResourceMethodDispatchProvider.java:153) at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:203) at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339) at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221) at org.apache.hadoop.hdfs.web.AuthFilter.doFilter(AuthFilter.java:84) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:78) at com.yahoo.hadoop.GzipFilter.doFilter(GzipFilter.java:220) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1223) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at
[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list
[ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062514#comment-14062514 ] Colin Patrick McCabe commented on HDFS-6658: bq. If we don't have caching, we need to cope with the added latency of off-heap memory - It is after all backed up by a file. Amir, there's no file involved. See my comment here: https://issues.apache.org/jira/browse/HDFS-6658?focusedCommentId=14061374page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14061374 I'm talking about memory. Memory, not disk. It is simply RAM that is not managed by the JVM. There's more information here: http://stackoverflow.com/questions/6091615/difference-between-on-heap-and-off-heap. Namenode memory optimization - Block replicas list --- Key: HDFS-6658 URL: https://issues.apache.org/jira/browse/HDFS-6658 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.1 Reporter: Amir Langer Assignee: Amir Langer Attachments: Namenode Memory Optimizations - Block replicas list.docx Part of the memory consumed by every BlockInfo object in the Namenode is a linked list of block references for every DatanodeStorageInfo (called triplets). We propose to change the way we store the list in memory. Using primitive integer indexes instead of object references will reduce the memory needed for every block replica (when compressed oops is disabled) and in our new design the list overhead will be per DatanodeStorageInfo and not per block replica. see attached design doc. for details and evaluation results. -- This message was sent by Atlassian JIRA (v6.2#6252)
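For readers new to the term, the off-heap memory discussed here is ordinary RAM allocated outside the Java heap, for example via a direct ByteBuffer; nothing in this snippet is specific to the Namenode, it only illustrates the concept:
{code}
import java.nio.ByteBuffer;

// Plain RAM outside the Java heap: the GC neither scans nor relocates it, and
// there is no backing file involved.
ByteBuffer offHeap = ByteBuffer.allocateDirect(64 * 1024 * 1024); // 64 MB
offHeap.putInt(0, 42);                 // e.g. store a block index at offset 0
int blockIndex = offHeap.getInt(0);    // read it back
{code}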
[jira] [Created] (HDFS-6688) Hadoop JMX stats are not refreshed
Biju Nair created HDFS-6688: --- Summary: Hadoop JMX stats are not refreshed Key: HDFS-6688 URL: https://issues.apache.org/jira/browse/HDFS-6688 Project: Hadoop HDFS Issue Type: Bug Environment: Ubuntu Reporter: Biju Nair Even when an HDFS datanode process is stopped, the JMX attribute values Hadoop.NameNode.FSNamesystemState.NumLiveDataNodes/NumDeadDataNodes don't change. Also, Hadoop.NameNode.NameNodeInfo.Attributes.LiveNodes still shows the stopped datanode's details. If these attributes reflected the actual state of the datanodes, they could be used to monitor the health of the HDFS cluster; currently they can't be. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6469) Coordinated replication of the namespace using ConsensusNode
[ https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062559#comment-14062559 ] Sanjay Radia commented on HDFS-6469: Todd said: bq. a fully usable solution would be available to the community at large, whereas the design you're proposing seems like it will only be usably implemented by a proprietary extension (I don't consider the ZK reference implementation likely to actually work in a usable fashion). Konstanine I had mentioned exactly the above point to you at the Hadoop summit Europe. ZK is a coordination service and for this to be practical it needs to be an inline Paxos protocol. We had also discussed 2 potential paxos libraries that could come into open source: I believe Facebook has one that they may contribute and CMU has one called E-Paxos; if either of these become available then it addresses this particular issue. I have no objections to a customer going to Wandisco for the enterprise supported version, but if the community is going to maintain such an extension then there needs to a practical, in-production-usable free solution; sending offline messages to a coordinator service for each transaction is not usable. Lets discuss the performance part in a separate comment. Let me comment on your comparisons to the topology and windows examples that the community supported in the past: * Topology - these changes allowed Hadoop to be used on containers such as VMs. ** Both KVM and VirtualBox offer free VM solutions - the customer does not need to buy ESX. ** The topology solution would will also help with a Docker container deployment which is freely available and offers even better performance than VMs. ** Hadoop is commonly used in cloud environment (e.g. AWS, or Azure, or Altiscale) which all use VMs or containers ** Further, it was recognized that while, in the past, we had considered racks to be a failure zone, that there could be other failure zones: nodes (for the case of VMs or containers on a host) and also groups of machines. * Windows - this was done for platform support which is very different than what we are talking about here; many open source solutions support multiple platforms to enable the widest adoption. BTW Hadoop supported windows via cygwin but we made it first class since the initial support via cygwin was messy. Coordinated replication of the namespace using ConsensusNode Key: HDFS-6469 URL: https://issues.apache.org/jira/browse/HDFS-6469 Project: Hadoop HDFS Issue Type: New Feature Components: namenode Affects Versions: 3.0.0 Reporter: Konstantin Shvachko Assignee: Konstantin Shvachko Attachments: CNodeDesign.pdf This is a proposal to introduce ConsensusNode - an evolution of the NameNode, which enables replication of the namespace on multiple nodes of an HDFS cluster by means of a Coordination Engine. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6560) Byte array native checksumming on DN side
[ https://issues.apache.org/jira/browse/HDFS-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HDFS-6560: -- Issue Type: Improvement (was: Sub-task) Parent: (was: HDFS-3528) Byte array native checksumming on DN side - Key: HDFS-6560 URL: https://issues.apache.org/jira/browse/HDFS-6560 Project: Hadoop HDFS Issue Type: Improvement Components: performance Reporter: James Thomas Assignee: James Thomas Attachments: HDFS-3528.patch, HDFS-6560.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6560) Byte array native checksumming on DN side
[ https://issues.apache.org/jira/browse/HDFS-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062561#comment-14062561 ] Todd Lipcon commented on HDFS-6560: --- A few notes on the patch: {code} + sums_addr = (*env)-GetPrimitiveArrayCritical(env, j_sums, NULL); + data_addr = (*env)-GetPrimitiveArrayCritical(env, j_data, NULL); + + if (unlikely(!sums_addr || !data_addr)) { +(*env)-ReleasePrimitiveArrayCritical(env, j_data, data_addr, 0); +(*env)-ReleasePrimitiveArrayCritical(env, j_sums, sums_addr, 0); {code} Here it seems like you might call Release() on a NULL address. I can't tell from reading the spec whether that's safe or not, but maybe best to guard the Release calls and only release non-NULL addresses. {code} + ret = bulk_verify_crc(data, MIN(numChecksumsInMB * bytes_per_checksum, +data_len - checksumNum * bytes_per_checksum), sums, +crc_type, bytes_per_checksum, error_data); {code} style nit: given that the second line here is an argument to MIN, it should probably wrap more like: {code} ret = bulk_verify_crc(data, MIN(numChecksumsInMB * bytes_per_checksum, data_len - checksumNum * bytes_per_checksum), sums, crc_type, bytes_per_checksum, error_data); {code} or assign the MIN result to a temporary value like 'len' {code} +long pos = base_pos + (error_data.bad_data - data) + checksumNum * +bytes_per_checksum; {code} style: indentation is off a bit here (continuation line should indent) Also, I'll move this to the HADOOP project since it only affects code in common/ Byte array native checksumming on DN side - Key: HDFS-6560 URL: https://issues.apache.org/jira/browse/HDFS-6560 Project: Hadoop HDFS Issue Type: Sub-task Components: performance Reporter: James Thomas Assignee: James Thomas Attachments: HDFS-3528.patch, HDFS-6560.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6455) NFS: Exception should be added in NFS log for invalid separator in allowed.hosts
[ https://issues.apache.org/jira/browse/HDFS-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062574#comment-14062574 ] Brandon Li commented on HDFS-6455: -- Sorry for the late reply. The time out is possibly due to this code in the patch: {noformat} + if (hostsMatcher != null) { +hostsMatchers.add(hostsMatcher); +out = MountResponse.writeExportList(out, xid, exports, hostsMatchers); + } {noformat} If hostMatcher is null, it doesn't send response back. I would suggest fixing HDFS-6456 first since it will fix part of the problem. After patch to HDFS-6456 is committed, this problem will be easier to fix. NFS: Exception should be added in NFS log for invalid separator in allowed.hosts Key: HDFS-6455 URL: https://issues.apache.org/jira/browse/HDFS-6455 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Yesha Vora Attachments: HDFS-6455.patch The error for invalid separator in dfs.nfs.exports.allowed.hosts property should be added in nfs log file instead nfs.out file. Steps to reproduce: 1. Pass invalid separator in dfs.nfs.exports.allowed.hosts {noformat} propertynamedfs.nfs.exports.allowed.hosts/namevaluehost1 ro:host2 rw/value/property {noformat} 2. restart NFS server. NFS server fails to start and print exception console. {noformat} [hrt_qa@host1 hwqe]$ ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null host1 sudo su - -c \/usr/lib/hadoop/sbin/hadoop-daemon.sh start nfs3\ hdfs starting nfs3, logging to /tmp/log/hadoop/hdfs/hadoop-hdfs-nfs3-horst1.out DEPRECATED: Use of this script to execute hdfs command is deprecated. Instead use the hdfs command for it. Exception in thread main java.lang.IllegalArgumentException: Incorrectly formatted line 'host1 ro:host2 rw' at org.apache.hadoop.nfs.NfsExports.getMatch(NfsExports.java:356) at org.apache.hadoop.nfs.NfsExports.init(NfsExports.java:151) at org.apache.hadoop.nfs.NfsExports.getInstance(NfsExports.java:54) at org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.init(RpcProgramNfs3.java:176) at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.init(Nfs3.java:43) at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.main(Nfs3.java:59) {noformat} NFS log does not print any error message. It directly shuts down. {noformat} STARTUP_MSG: java = 1.6.0_31 / 2014-05-27 18:47:13,972 INFO nfs3.Nfs3Base (SignalLogger.java:register(91)) - registered UNIX signal handlers for [TERM, HUP, INT] 2014-05-27 18:47:14,169 INFO nfs3.IdUserGroup (IdUserGroup.java:updateMapInternal(159)) - Updated user map size:259 2014-05-27 18:47:14,179 INFO nfs3.IdUserGroup (IdUserGroup.java:updateMapInternal(159)) - Updated group map size:73 2014-05-27 18:47:14,192 INFO nfs3.Nfs3Base (StringUtils.java:run(640)) - SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down Nfs3 at {noformat} NFS.out file has exception. {noformat} EPRECATED: Use of this script to execute hdfs command is deprecated. Instead use the hdfs command for it. 
Exception in thread main java.lang.IllegalArgumentException: Incorrectly formatted line 'host1 ro:host2 rw' at org.apache.hadoop.nfs.NfsExports.getMatch(NfsExports.java:356) at org.apache.hadoop.nfs.NfsExports.init(NfsExports.java:151) at org.apache.hadoop.nfs.NfsExports.getInstance(NfsExports.java:54) at org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.init(RpcProgramNfs3.java:176) at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.init(Nfs3.java:43) at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.main(Nfs3.java:59) ulimit -a for user hdfs core file size (blocks, -c) 409600 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 188893 max locked memory (kbytes, -l) unlimited max memory size (kbytes, -m) unlimited open files (-n) 32768 pipe size(512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 10240 cpu time (seconds, -t) unlimited max user processes (-u) 65536 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
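Brandon's point above is that the MOUNT handler only replies inside the non-null branch; the shape of a fix would be to answer the client in both branches, roughly as sketched here (the error-reply helper is a hypothetical stand-in, and the real handler may expose a different call):
{code}
if (hostsMatcher != null) {
  hostsMatchers.add(hostsMatcher);
  out = MountResponse.writeExportList(out, xid, exports, hostsMatchers);
} else {
  // No entry in dfs.nfs.exports.allowed.hosts matched this client; reply with an
  // access error instead of dropping the request so the client no longer times out.
  out = writeAccessDeniedResponse(out, xid);   // hypothetical helper
}
{code}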
[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list
[ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062593#comment-14062593 ] Tsz Wo Nicholas Sze commented on HDFS-6658: --- Nice idea! For processing block reports using a BitSet, do the bits correspond to the block indices in DatanodeStorageInfo? I think the BitSet can be eliminated by overwriting the length, say setting it to (-length - 1), and setting it back when computing the toRemove list. Namenode memory optimization - Block replicas list --- Key: HDFS-6658 URL: https://issues.apache.org/jira/browse/HDFS-6658 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.1 Reporter: Amir Langer Assignee: Amir Langer Attachments: Namenode Memory Optimizations - Block replicas list.docx Part of the memory consumed by every BlockInfo object in the Namenode is a linked list of block references for every DatanodeStorageInfo (called triplets). We propose to change the way we store the list in memory. Using primitive integer indexes instead of object references will reduce the memory needed for every block replica (when compressed oops is disabled) and in our new design the list overhead will be per DatanodeStorageInfo and not per block replica. see attached design doc. for details and evaluation results. -- This message was sent by Atlassian JIRA (v6.2#6252)
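The trick mentioned above (and tried in the HDFS-5464 patch below) folds a visited flag into the stored length itself; a small self-contained sketch of the encoding, assuming the length is a non-negative int:
{code}
// Encode "visited" as (-n - 1), which is negative for every n >= 0, and invert it
// to recover the original length when building the toRemove list.
static int markVisited(int length)   { return -length - 1; }
static boolean isVisited(int length) { return length < 0; }
static int restoreLength(int length) { return length < 0 ? -length - 1 : length; }
{code}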
[jira] [Updated] (HDFS-6456) NFS should throw error for invalid entry in dfs.nfs.exports.allowed.hosts
[ https://issues.apache.org/jira/browse/HDFS-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6456: - Summary: NFS should throw error for invalid entry in dfs.nfs.exports.allowed.hosts (was: NFS: NFS server should throw error for invalid entry in dfs.nfs.exports.allowed.hosts) NFS should throw error for invalid entry in dfs.nfs.exports.allowed.hosts - Key: HDFS-6456 URL: https://issues.apache.org/jira/browse/HDFS-6456 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Yesha Vora Assignee: Abhiraj Butala Attachments: HDFS-6456.patch Pass invalid entry in dfs.nfs.exports.allowed.hosts. Use - as separator between hostname and access permission {noformat} propertynamedfs.nfs.exports.allowed.hosts/namevaluehost1-rw/value/property {noformat} This misconfiguration is not detected by NFS server. It does not print any error message. The host passed in this configuration is also not able to mount nfs. In conclusion, no node can mount the nfs with this value. A format check is required for this property. If the value of this property does not follow the format, an error should be thrown. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6456) NFS: NFS server should throw error for invalid entry in dfs.nfs.exports.allowed.hosts
[ https://issues.apache.org/jira/browse/HDFS-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062600#comment-14062600 ] Brandon Li commented on HDFS-6456: -- +1. Patch looks good to me. NFS: NFS server should throw error for invalid entry in dfs.nfs.exports.allowed.hosts - Key: HDFS-6456 URL: https://issues.apache.org/jira/browse/HDFS-6456 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Yesha Vora Assignee: Abhiraj Butala Attachments: HDFS-6456.patch Pass invalid entry in dfs.nfs.exports.allowed.hosts. Use - as separator between hostname and access permission {noformat} propertynamedfs.nfs.exports.allowed.hosts/namevaluehost1-rw/value/property {noformat} This misconfiguration is not detected by NFS server. It does not print any error message. The host passed in this configuration is also not able to mount nfs. In conclusion, no node can mount the nfs with this value. A format check is required for this property. If the value of this property does not follow the format, an error should be thrown. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6456) NFS should throw error for invalid entry in dfs.nfs.exports.allowed.hosts
[ https://issues.apache.org/jira/browse/HDFS-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6456: - Fix Version/s: 2.6.0 NFS should throw error for invalid entry in dfs.nfs.exports.allowed.hosts - Key: HDFS-6456 URL: https://issues.apache.org/jira/browse/HDFS-6456 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Yesha Vora Assignee: Abhiraj Butala Fix For: 2.6.0 Attachments: HDFS-6456.patch Pass invalid entry in dfs.nfs.exports.allowed.hosts. Use - as separator between hostname and access permission {noformat} propertynamedfs.nfs.exports.allowed.hosts/namevaluehost1-rw/value/property {noformat} This misconfiguration is not detected by NFS server. It does not print any error message. The host passed in this configuration is also not able to mount nfs. In conclusion, no node can mount the nfs with this value. A format check is required for this property. If the value of this property does not follow the format, an error should be thrown. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6456) NFS should throw error for invalid entry in dfs.nfs.exports.allowed.hosts
[ https://issues.apache.org/jira/browse/HDFS-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6456: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) NFS should throw error for invalid entry in dfs.nfs.exports.allowed.hosts - Key: HDFS-6456 URL: https://issues.apache.org/jira/browse/HDFS-6456 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Yesha Vora Assignee: Abhiraj Butala Fix For: 2.6.0 Attachments: HDFS-6456.patch Pass invalid entry in dfs.nfs.exports.allowed.hosts. Use - as separator between hostname and access permission {noformat} propertynamedfs.nfs.exports.allowed.hosts/namevaluehost1-rw/value/property {noformat} This misconfiguration is not detected by NFS server. It does not print any error message. The host passed in this configuration is also not able to mount nfs. In conclusion, no node can mount the nfs with this value. A format check is required for this property. If the value of this property does not follow the format, an error should be thrown. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6456) NFS should throw error for invalid entry in dfs.nfs.exports.allowed.hosts
[ https://issues.apache.org/jira/browse/HDFS-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062622#comment-14062622 ] Hudson commented on HDFS-6456: -- FAILURE: Integrated in Hadoop-trunk-Commit #5886 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/5886/]) HDFS-6456. NFS should throw error for invalid entry in dfs.nfs.exports.allowed.hosts. Contributed by Abhiraj Butala (brandonli: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1610840) * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/NfsExports.java * /hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/test/java/org/apache/hadoop/nfs/TestNfsExports.java * /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt NFS should throw error for invalid entry in dfs.nfs.exports.allowed.hosts - Key: HDFS-6456 URL: https://issues.apache.org/jira/browse/HDFS-6456 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Yesha Vora Assignee: Abhiraj Butala Fix For: 2.6.0 Attachments: HDFS-6456.patch Pass invalid entry in dfs.nfs.exports.allowed.hosts. Use - as separator between hostname and access permission {noformat} propertynamedfs.nfs.exports.allowed.hosts/namevaluehost1-rw/value/property {noformat} This misconfiguration is not detected by NFS server. It does not print any error message. The host passed in this configuration is also not able to mount nfs. In conclusion, no node can mount the nfs with this value. A format check is required for this property. If the value of this property does not follow the format, an error should be thrown. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6455) NFS: Exception should be added in NFS log for invalid separator in allowed.hosts
[ https://issues.apache.org/jira/browse/HDFS-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062621#comment-14062621 ] Brandon Li commented on HDFS-6455: -- [~abutala], HDFS-6456 has been fixed. Please rebased the current patch. I think it should be a smaller change now. :-) NFS: Exception should be added in NFS log for invalid separator in allowed.hosts Key: HDFS-6455 URL: https://issues.apache.org/jira/browse/HDFS-6455 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Yesha Vora Attachments: HDFS-6455.patch The error for invalid separator in dfs.nfs.exports.allowed.hosts property should be added in nfs log file instead nfs.out file. Steps to reproduce: 1. Pass invalid separator in dfs.nfs.exports.allowed.hosts {noformat} propertynamedfs.nfs.exports.allowed.hosts/namevaluehost1 ro:host2 rw/value/property {noformat} 2. restart NFS server. NFS server fails to start and print exception console. {noformat} [hrt_qa@host1 hwqe]$ ssh -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null host1 sudo su - -c \/usr/lib/hadoop/sbin/hadoop-daemon.sh start nfs3\ hdfs starting nfs3, logging to /tmp/log/hadoop/hdfs/hadoop-hdfs-nfs3-horst1.out DEPRECATED: Use of this script to execute hdfs command is deprecated. Instead use the hdfs command for it. Exception in thread main java.lang.IllegalArgumentException: Incorrectly formatted line 'host1 ro:host2 rw' at org.apache.hadoop.nfs.NfsExports.getMatch(NfsExports.java:356) at org.apache.hadoop.nfs.NfsExports.init(NfsExports.java:151) at org.apache.hadoop.nfs.NfsExports.getInstance(NfsExports.java:54) at org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.init(RpcProgramNfs3.java:176) at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.init(Nfs3.java:43) at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.main(Nfs3.java:59) {noformat} NFS log does not print any error message. It directly shuts down. {noformat} STARTUP_MSG: java = 1.6.0_31 / 2014-05-27 18:47:13,972 INFO nfs3.Nfs3Base (SignalLogger.java:register(91)) - registered UNIX signal handlers for [TERM, HUP, INT] 2014-05-27 18:47:14,169 INFO nfs3.IdUserGroup (IdUserGroup.java:updateMapInternal(159)) - Updated user map size:259 2014-05-27 18:47:14,179 INFO nfs3.IdUserGroup (IdUserGroup.java:updateMapInternal(159)) - Updated group map size:73 2014-05-27 18:47:14,192 INFO nfs3.Nfs3Base (StringUtils.java:run(640)) - SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down Nfs3 at {noformat} NFS.out file has exception. {noformat} EPRECATED: Use of this script to execute hdfs command is deprecated. Instead use the hdfs command for it. 
Exception in thread main java.lang.IllegalArgumentException: Incorrectly formatted line 'host1 ro:host2 rw' at org.apache.hadoop.nfs.NfsExports.getMatch(NfsExports.java:356) at org.apache.hadoop.nfs.NfsExports.init(NfsExports.java:151) at org.apache.hadoop.nfs.NfsExports.getInstance(NfsExports.java:54) at org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.init(RpcProgramNfs3.java:176) at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.init(Nfs3.java:43) at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.main(Nfs3.java:59) ulimit -a for user hdfs core file size (blocks, -c) 409600 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 188893 max locked memory (kbytes, -l) unlimited max memory size (kbytes, -m) unlimited open files (-n) 32768 pipe size(512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 10240 cpu time (seconds, -t) unlimited max user processes (-u) 65536 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Reopened] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze reopened HDFS-5464: --- Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch, h5464_20131105c.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6509) distcp vs Data At Rest Encryption
[ https://issues.apache.org/jira/browse/HDFS-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-6509: --- Attachment: HDFS-6509distcpandDataatRestEncryption-2.pdf I've made some revisions to the doc: . Fixed some typos . Added an alternative proposal made by [~andrew.wang] to have a raw.* extended attribute namespace. . Made the wording about the raw namespace only being accessible by the HDFS super user. distcp vs Data At Rest Encryption - Key: HDFS-6509 URL: https://issues.apache.org/jira/browse/HDFS-6509 Project: Hadoop HDFS Issue Type: Sub-task Components: security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Charles Lamb Assignee: Charles Lamb Attachments: HDFS-6509distcpandDataatRestEncryption-2.pdf, HDFS-6509distcpandDataatRestEncryption.pdf distcp needs to work with Data At Rest Encryption -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HDFS-6619) Clean up encryption-related tests
[ https://issues.apache.org/jira/browse/HDFS-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Wang resolved HDFS-6619. --- Resolution: Fixed Fix Version/s: fs-encryption (HADOOP-10150 and HDFS-6134) Thanks for the reviews guys, committed to fs-encryption branch. Clean up encryption-related tests - Key: HDFS-6619 URL: https://issues.apache.org/jira/browse/HDFS-6619 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Andrew Wang Assignee: Andrew Wang Priority: Minor Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) Attachments: hdfs-6619.001.patch Would be good to clean up TestHDFSEncryption and TestEncryptionZonesAPI. These tests could be renamed, test timeouts added/adjusted, reduced number of minicluster start/stops, whitespace, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6688) Hadoop JMX stats are not refreshed
[ https://issues.apache.org/jira/browse/HDFS-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062640#comment-14062640 ] Andrew Wang commented on HDFS-6688: --- Hi Biju, did you wait 10.5 minutes for the default dead nodes timeout before checking these stats? Did you also compare the JMX stats with the display on the webui? I'd expect these to be the same, but we have had some issues here in the past. Hadoop JMX stats are not refreshed -- Key: HDFS-6688 URL: https://issues.apache.org/jira/browse/HDFS-6688 Project: Hadoop HDFS Issue Type: Bug Environment: Ubuntu Reporter: Biju Nair Even when the HDFS datanode process is stopped, the values of the JMX attributes Hadoop.NameNode.FSNamesystemState.NumLiveDataNodes/NumDeadDataNodes do not change. Also, Hadoop.NameNode.NameNodeInfo.Attributes.LiveNodes still shows the stopped datanode's details. If these attributes reflected the actual state of the datanodes, they could be used to monitor the health of the HDFS cluster; as it stands they can't be used for that purpose. -- This message was sent by Atlassian JIRA (v6.2#6252)
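For anyone reproducing this, a quick way to compare the JMX values with the web UI is to query the NameNode's /jmx servlet directly (a sketch of standard usage, not taken from this report; 50070 is assumed to be the default NameNode HTTP port for this release line):
{noformat}
# NumLiveDataNodes / NumDeadDataNodes
curl 'http://<namenode-host>:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState'
# LiveNodes / DeadNodes details
curl 'http://<namenode-host>:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo'
{noformat}
If the servlet and the web UI still disagree after the dead-node timeout has elapsed, that would point at a stats-refresh bug rather than a monitoring-setup issue.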
[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list
[ https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062648#comment-14062648 ] Tsz Wo Nicholas Sze commented on HDFS-6658: --- Let me try it in HDFS-5464. Namenode memory optimization - Block replicas list --- Key: HDFS-6658 URL: https://issues.apache.org/jira/browse/HDFS-6658 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.4.1 Reporter: Amir Langer Assignee: Amir Langer Attachments: Namenode Memory Optimizations - Block replicas list.docx Part of the memory consumed by every BlockInfo object in the Namenode is a linked list of block references for every DatanodeStorageInfo (called triplets). We propose to change the way we store the list in memory. Using primitive integer indexes instead of object references will reduce the memory needed for every block replica (when compressed oops is disabled) and in our new design the list overhead will be per DatanodeStorageInfo and not per block replica. see attached design doc. for details and evaluation results. -- This message was sent by Atlassian JIRA (v6.2#6252)
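To make the proposal concrete, here is a rough Java sketch (illustrative only; it is not the attached design doc or a patch, and all names are made up) of keeping block references per storage as primitive IDs instead of a per-replica linked list of object references:
{noformat}
/**
 * Illustrative sketch: each (hypothetical) storage keeps its replicas as a
 * growable array of primitive block IDs, so the list overhead is paid once
 * per storage rather than once per block replica.
 */
class StorageReplicaList {
  private long[] blockIds = new long[64];
  private int size = 0;

  void add(long blockId) {
    if (size == blockIds.length) {
      blockIds = java.util.Arrays.copyOf(blockIds, size * 2);
    }
    blockIds[size++] = blockId;
  }

  long get(int i) {
    return blockIds[i];
  }

  int size() {
    return size;
  }
}
{noformat}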
[jira] [Updated] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-5464: -- Attachment: h5464_20140715.patch Some new idea inspired by HDFS-6658. h5464_20140715.patch: marks visited blocks by setting length n to (-n-1). Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch, h5464_20131105c.patch, h5464_20140715.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.2#6252)
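The (-n-1) marking is just the bitwise complement, so visited blocks can be flagged in place with no extra memory and the original value is recovered by applying the same operation again. A minimal sketch of the idea (illustrative, not the patch itself):
{noformat}
// Mark a non-negative length as "visited"; the result is always negative.
static long markVisited(long n) {
  return -n - 1;              // equivalently ~n
}

// Recover the original length: -(-n - 1) - 1 == n.
static long originalLength(long marked) {
  return -marked - 1;
}

// Valid lengths are non-negative, so a negative value means "visited".
static boolean isVisited(long value) {
  return value < 0;
}
{noformat}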
[jira] [Updated] (HDFS-5464) Simplify block report diff calculation
[ https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz Wo Nicholas Sze updated HDFS-5464: -- Status: Patch Available (was: Reopened) Simplify block report diff calculation -- Key: HDFS-5464 URL: https://issues.apache.org/jira/browse/HDFS-5464 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Tsz Wo Nicholas Sze Priority: Minor Attachments: h5464_20131105.patch, h5464_20131105b.patch, h5464_20131105c.patch, h5464_20140715.patch The current calculation in BlockManager.reportDiff(..) is unnecessarily complicated. We could simplify the calculation. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HDFS-6689) NFS: can't access file under directory with 711 access right as other user
Yesha Vora created HDFS-6689: Summary: NFS: can't access file under directory with 711 access right as other user Key: HDFS-6689 URL: https://issues.apache.org/jira/browse/HDFS-6689 Project: Hadoop HDFS Issue Type: Bug Reporter: Yesha Vora NFS does not allow another user to access a file that has 644 permissions when its parent directory has 711 permissions. Steps to reproduce: 1. Create a directory /user/userX with 711 permissions 2. Upload a file at /user/userX/TestFile with 644 permissions as userX 3. Try to access TestFile as userY. HDFS allows userY to read TestFile. {noformat} bash-4.1$ id uid=661(userY) gid=100(users) groups=100(users),13016(groupY) bash-4.1$ hdfs dfs -cat /user/userX/TestFile create a file with some content {noformat} NFS does not allow userY to read TestFile. {noformat} bash-4.1$ cat /tmp/tmp_mnt/user/userX/TestFile cat: /tmp/tmp_mnt/user/userX/TestFile: Permission denied {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6689) NFS: can't access file under directory with 711 access right as other user
[ https://issues.apache.org/jira/browse/HDFS-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6689: - Affects Version/s: 2.2.0 NFS: can't access file under directory with 711 access right as other user -- Key: HDFS-6689 URL: https://issues.apache.org/jira/browse/HDFS-6689 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Yesha Vora NFS does not allow another user to access a file that has 644 permissions when its parent directory has 711 permissions. Steps to reproduce: 1. Create a directory /user/userX with 711 permissions 2. Upload a file at /user/userX/TestFile with 644 permissions as userX 3. Try to access TestFile as userY. HDFS allows userY to read TestFile. {noformat} bash-4.1$ id uid=661(userY) gid=100(users) groups=100(users),13016(groupY) bash-4.1$ hdfs dfs -cat /user/userX/TestFile create a file with some content {noformat} NFS does not allow userY to read TestFile. {noformat} bash-4.1$ cat /tmp/tmp_mnt/user/userX/TestFile cat: /tmp/tmp_mnt/user/userX/TestFile: Permission denied {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6689) NFS: can't access file under directory with 711 access right as other user
[ https://issues.apache.org/jira/browse/HDFS-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6689: - Labels: (was: nfs) NFS: can't access file under directory with 711 access right as other user -- Key: HDFS-6689 URL: https://issues.apache.org/jira/browse/HDFS-6689 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Yesha Vora NFS does not allow another user to access a file that has 644 permissions when its parent directory has 711 permissions. Steps to reproduce: 1. Create a directory /user/userX with 711 permissions 2. Upload a file at /user/userX/TestFile with 644 permissions as userX 3. Try to access TestFile as userY. HDFS allows userY to read TestFile. {noformat} bash-4.1$ id uid=661(userY) gid=100(users) groups=100(users),13016(groupY) bash-4.1$ hdfs dfs -cat /user/userX/TestFile create a file with some content {noformat} NFS does not allow userY to read TestFile. {noformat} bash-4.1$ cat /tmp/tmp_mnt/user/userX/TestFile cat: /tmp/tmp_mnt/user/userX/TestFile: Permission denied {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6689) NFS: can't access file under directory with 711 access right as other user
[ https://issues.apache.org/jira/browse/HDFS-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Li updated HDFS-6689: - Component/s: nfs NFS: can't access file under directory with 711 access right as other user -- Key: HDFS-6689 URL: https://issues.apache.org/jira/browse/HDFS-6689 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Yesha Vora NFS does not allow another user to access a file that has 644 permissions when its parent directory has 711 permissions. Steps to reproduce: 1. Create a directory /user/userX with 711 permissions 2. Upload a file at /user/userX/TestFile with 644 permissions as userX 3. Try to access TestFile as userY. HDFS allows userY to read TestFile. {noformat} bash-4.1$ id uid=661(userY) gid=100(users) groups=100(users),13016(groupY) bash-4.1$ hdfs dfs -cat /user/userX/TestFile create a file with some content {noformat} NFS does not allow userY to read TestFile. {noformat} bash-4.1$ cat /tmp/tmp_mnt/user/userX/TestFile cat: /tmp/tmp_mnt/user/userX/TestFile: Permission denied {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HDFS-6422) getfattr in CLI doesn't throw exception or return non-0 return code when xattr doesn't exist
[ https://issues.apache.org/jira/browse/HDFS-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb updated HDFS-6422: --- Attachment: HDFS-6474.4.patch This patch is not completely done yet. I am submitting it to see what the jenkins run looks like so please don't review it yet. getfattr in CLI doesn't throw exception or return non-0 return code when xattr doesn't exist Key: HDFS-6422 URL: https://issues.apache.org/jira/browse/HDFS-6422 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 3.0.0, 2.5.0 Reporter: Charles Lamb Assignee: Charles Lamb Attachments: HDFS-6422.1.patch, HDFS-6422.2.patch, HDFS-6422.3.patch, HDFS-6474.4.patch If you do hdfs dfs -getfattr -n user.blah /foo and user.blah doesn't exist, the command prints # file: /foo and a 0 return code. It should print an exception and return a non-0 return code instead. -- This message was sent by Atlassian JIRA (v6.2#6252)
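For reference, the reported behavior can be checked from the shell like this (a sketch of the symptom described above; the path and attribute name are just the examples from the description):
{noformat}
$ hdfs dfs -getfattr -n user.blah /foo    # user.blah was never set on /foo
# file: /foo
$ echo $?
0                                         # expected: an error message and a non-zero exit code
{noformat}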
[jira] [Created] (HDFS-6690) Deduplicate xattr names in memory
Andrew Wang created HDFS-6690: - Summary: Deduplicate xattr names in memory Key: HDFS-6690 URL: https://issues.apache.org/jira/browse/HDFS-6690 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.5.0 Reporter: Andrew Wang Assignee: Andrew Wang When the same string is used repeatedly for an xattr name, we could potentially save some NN memory by deduplicating the strings. -- This message was sent by Atlassian JIRA (v6.2#6252)
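A rough illustration of the idea (not necessarily how the eventual patch would do it; the class name is made up): intern each xattr name through a map so repeated names share a single String instance instead of each inode keeping its own copy:
{noformat}
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch: one shared String instance per distinct xattr name. */
class XAttrNameInterner {
  private final ConcurrentHashMap<String, String> names =
      new ConcurrentHashMap<String, String>();

  String intern(String name) {
    String existing = names.putIfAbsent(name, name);
    return existing != null ? existing : name;
  }
}
{noformat}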
[jira] [Commented] (HDFS-6588) Investigating removing getTrueCause method in Server.java
[ https://issues.apache.org/jira/browse/HDFS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062668#comment-14062668 ] Hadoop QA commented on HDFS-6588: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12655819/HDFS-6588.001.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs: org.apache.hadoop.fs.TestSymlinkLocalFSFileSystem org.apache.hadoop.fs.TestSymlinkLocalFSFileContext org.apache.hadoop.ipc.TestIPC org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HDFS-Build/7350//testReport/ Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7350//console This message is automatically generated. Investigating removing getTrueCause method in Server.java - Key: HDFS-6588 URL: https://issues.apache.org/jira/browse/HDFS-6588 Project: Hadoop HDFS Issue Type: Bug Components: security, webhdfs Affects Versions: 2.5.0 Reporter: Yongjun Zhang Assignee: Yongjun Zhang Attachments: HDFS-6588.001.patch, HDFS-6588.001.patch When addressing Daryn Sharp's comment for HDFS-6475 quoted below: {quote} What I'm saying is I think the patch adds too much unnecessary code. Filing an improvement to delete all but a few lines of the code changed in this patch seems a bit odd. I think you just need to: - Delete getTrueCause entirely instead of moving it elsewhere - In saslProcess, just throw the exception instead of running it through getTrueCause since it's not a InvalidToken wrapping another exception anymore. - Keep your 3-line change to unwrap SecurityException in toResponse {quote} There are multiple test failures after making the suggested changes. This jira is filed to investigate removing the getTrueCause method. More detail will be put in the first comment. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6689) NFS: can't access file under directory with 711 access right as other user
[ https://issues.apache.org/jira/browse/HDFS-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062670#comment-14062670 ] Brandon Li commented on HDFS-6689: -- This is due to a bug in Nfs3Utils#getAccessRights(), which doesn't grant execute permission on directories. {noformat} if (isSet(mode, Nfs3Constant.ACCESS_MODE_EXECUTE)) { if (type == NfsFileType.NFSREG.toValue()) { rtn |= Nfs3Constant.ACCESS3_EXECUTE; } } {noformat} NFS: can't access file under directory with 711 access right as other user -- Key: HDFS-6689 URL: https://issues.apache.org/jira/browse/HDFS-6689 Project: Hadoop HDFS Issue Type: Bug Components: nfs Affects Versions: 2.2.0 Reporter: Yesha Vora NFS does not allow another user to access a file that has 644 permissions when its parent directory has 711 permissions. Steps to reproduce: 1. Create a directory /user/userX with 711 permissions 2. Upload a file at /user/userX/TestFile with 644 permissions as userX 3. Try to access TestFile as userY. HDFS allows userY to read TestFile. {noformat} bash-4.1$ id uid=661(userY) gid=100(users) groups=100(users),13016(groupY) bash-4.1$ hdfs dfs -cat /user/userX/TestFile create a file with some content {noformat} NFS does not allow userY to read TestFile. {noformat} bash-4.1$ cat /tmp/tmp_mnt/user/userX/TestFile cat: /tmp/tmp_mnt/user/userX/TestFile: Permission denied {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
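A sketch of the kind of fix the comment above implies (illustrative; the actual patch may differ): when the execute mode bit is set on a directory, grant lookup access instead of silently dropping the bit:
{noformat}
if (isSet(mode, Nfs3Constant.ACCESS_MODE_EXECUTE)) {
  if (type == NfsFileType.NFSREG.toValue()) {
    rtn |= Nfs3Constant.ACCESS3_EXECUTE;
  } else {
    // Assumption: for directories the execute bit means "can look up
    // children", so map it to ACCESS3_LOOKUP rather than ignoring it.
    rtn |= Nfs3Constant.ACCESS3_LOOKUP;
  }
}
{noformat}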
[jira] [Updated] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed
[ https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated HDFS-6597: Assignee: Danilo Vunjak Add a new option to NN upgrade to terminate the process after upgrade on NN is completed Key: HDFS-6597 URL: https://issues.apache.org/jira/browse/HDFS-6597 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Danilo Vunjak Assignee: Danilo Vunjak Attachments: HDFS-6597.04.patch, JIRA-HDFS-30.patch, JIRA-HDFS-6597.02.patch, JIRA-HDFS-6597.03.patch, JIRA-HDFS-6597.patch Currently, when the namenode is started for an upgrade (the hadoop namenode -upgrade command), after finishing the metadata upgrade the namenode starts working normally and waits for datanodes to upgrade themselves and connect to the NN. We need an option for upgrading only the NN metadata, so that after the upgrade finishes on the NN the process terminates. I have tested it by changing file hdfs.server.namenode.NameNode.java, method public static NameNode createNameNode(String argv[], Configuration conf), adding this case to the switch: case UPGRADE: { DefaultMetricsSystem.initialize("NameNode"); NameNode nameNode = new NameNode(conf); if (startOpt.getForceUpgrade()) { terminate(0); return null; } return nameNode; } This upgraded the metadata and terminated the process when finished; later, when all services were started, the datanode upgrades finished successfully and the system ran. What I'm suggesting is to add a new startup parameter, -force, so the namenode can be started as hadoop namenode -upgrade -force to indicate that the process should terminate after the metadata upgrade on the NN is finished. The old functionality should be preserved, so users can run hadoop namenode -upgrade the same way and with the same behaviour as before. Thanks, Danilo -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed
[ https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062687#comment-14062687 ] Chris Nauroth commented on HDFS-6597: - Hi, [~dvunjak]. This mostly looks good. I have 2 comments: # I built the distro after adding a bogus new feature to bump the version to -58 in {{NameNodeLayoutVersion}}. I tried running -upgradeOnly, but it didn't actually upgrade the metadata files. It looks like you'll need another change in {{FSImage#recoverTransitionRead}}. There is a switch statement that looks for the {{UPGRADE}} option, but not the new {{UPGRADEONLY}} option. # The new test suite is a copy of the existing {{TestStartupOptionUpgrade}} with the option changed to {{UPGRADEONLY}}. Instead of cloning the code, this looks like a good opportunity for a JUnit {{Parameterized}} test. See {{TestNameNodeHttpServer}} for an existing example of a {{Parameterized}} test. I think you can make a fairly small change in the existing {{TestStartupOptionUpgrade}} so that it's parameterized to run on both options: {{UPGRADE}} and {{UPGRADEONLY}}. Add a new option to NN upgrade to terminate the process after upgrade on NN is completed Key: HDFS-6597 URL: https://issues.apache.org/jira/browse/HDFS-6597 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Danilo Vunjak Assignee: Danilo Vunjak Attachments: HDFS-6597.04.patch, JIRA-HDFS-30.patch, JIRA-HDFS-6597.02.patch, JIRA-HDFS-6597.03.patch, JIRA-HDFS-6597.patch Currently, when the namenode is started for an upgrade (the hadoop namenode -upgrade command), after finishing the metadata upgrade the namenode starts working normally and waits for datanodes to upgrade themselves and connect to the NN. We need an option for upgrading only the NN metadata, so that after the upgrade finishes on the NN the process terminates. I have tested it by changing file hdfs.server.namenode.NameNode.java, method public static NameNode createNameNode(String argv[], Configuration conf), adding this case to the switch: case UPGRADE: { DefaultMetricsSystem.initialize("NameNode"); NameNode nameNode = new NameNode(conf); if (startOpt.getForceUpgrade()) { terminate(0); return null; } return nameNode; } This upgraded the metadata and terminated the process when finished; later, when all services were started, the datanode upgrades finished successfully and the system ran. What I'm suggesting is to add a new startup parameter, -force, so the namenode can be started as hadoop namenode -upgrade -force to indicate that the process should terminate after the metadata upgrade on the NN is finished. The old functionality should be preserved, so users can run hadoop namenode -upgrade the same way and with the same behaviour as before. Thanks, Danilo -- This message was sent by Atlassian JIRA (v6.2#6252)
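A rough sketch of the parameterization suggested above (illustrative; the option and class names follow the existing code under discussion, but the exact wiring is an assumption):
{noformat}
import java.util.Arrays;
import java.util.Collection;
import org.apache.hadoop.hdfs.server.common.HdfsServerConstants.StartupOption;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class TestStartupOptionUpgrade {

  @Parameters
  public static Collection<Object[]> startOptions() {
    // Run every test method once with -upgrade and once with -upgradeOnly.
    return Arrays.asList(new Object[][] {
        { StartupOption.UPGRADE },
        { StartupOption.UPGRADEONLY }
    });
  }

  private final StartupOption startOpt;

  public TestStartupOptionUpgrade(StartupOption startOpt) {
    this.startOpt = startOpt;
  }

  // ... existing test bodies use startOpt instead of a hard-coded option ...
}
{noformat}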
[jira] [Created] (HDFS-6691) The message on NN UI can be confusing during a rolling upgrade
Mit Desai created HDFS-6691: --- Summary: The message on NN UI can be confusing during a rolling upgrade Key: HDFS-6691 URL: https://issues.apache.org/jira/browse/HDFS-6691 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.5.0 Reporter: Mit Desai Assignee: Mit Desai Attachments: ha1.png On ANN, it says rollback image was created. On SBN, it says otherwise. -- This message was sent by Atlassian JIRA (v6.2#6252)