[jira] [Updated] (HDFS-6681) TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN is flaky and sometimes gets stuck in infinite loops

2014-07-15 Thread Ratandeep Ratti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ratandeep Ratti updated HDFS-6681:
--

Attachment: HDFS-6681.patch

 TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN is 
 flaky and sometimes gets stuck in infinite loops
 --

 Key: HDFS-6681
 URL: https://issues.apache.org/jira/browse/HDFS-6681
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.1
 Environment: Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
 Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
 Linux [hostname] 2.6.32-279.14.1.el6.x86_64 #1 SMP Mon Oct 15 13:44:51 EDT 
 2012 x86_64 x86_64 x86_64 GNU/Linux
Reporter: Ratandeep Ratti
 Attachments: HDFS-6681.patch


 This testcase has 3 infinite loops which break only when certain conditions 
 are satisfied.
 1st loop: checks that there is a single live replica. It assumes this will 
 become true because the test has just corrupted a block on one of the 
 datanodes (the testcase uses a replication factor of 2). One scenario in 
 which this loop will never break is if the Namenode invalidates the corrupt 
 replica, schedules a replication command, and the newly copied replica is 
 added, all before the testcase gets a chance to check the live-replica count.
 2nd loop: checks that there are 2 live replicas. It assumes this will become 
 true (in some time) because the first loop has broken, implying there is a 
 single live replica, so it is only a matter of time before the Namenode 
 schedules a replication command to copy a replica to another datanode. One 
 scenario in which this loop will never break is when the Namenode tries to 
 schedule the new replica on the same node on which we corrupted the block. 
 That destination datanode will not copy the block, complaining that it 
 already has the (corrupted) replica in the create (RBW) state. The result is 
 that the Namenode has scheduled a copy to a datanode and the block is now in 
 the Namenode's pending replication queue, but the block will never be removed 
 from that queue because the Namenode will never receive a report from the 
 datanodes that the block was added.
 Note: the block can be moved from the 'pending replication' queue to the 
 'needed replication' queue once the pending timeout (5 minutes) expires. The 
 Namenode then actively tries to schedule a replication for blocks in the 
 'needed replication' queue. This can cause the 2nd loop to break, but it 
 takes more than 5 minutes for this to kick in.
 3rd loop: checks that there are no corrupt replicas. I don't see a scenario 
 in which this loop can go on forever, since once the live replica count goes 
 back to normal (2), the corrupted block will be removed.
 Increasing the heartbeat interval, so that the testcase has enough time to 
 check the condition in loop 1 before a datanode reports a successful copy, 
 should help avoid the race condition in loop 1. For loop 2, we can reduce the 
 timeout after which a block is moved from the pending replication queue back 
 to the needed replication queue.
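 A minimal sketch of the two suggested knobs expressed as test configuration; 
 the class name and the exact values are illustrative assumptions, not the 
 attached patch:
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hdfs.DFSConfigKeys;
 import org.apache.hadoop.hdfs.HdfsConfiguration;
 import org.apache.hadoop.hdfs.MiniDFSCluster;
 import org.junit.Test;

 public class TestRBWInvalidationTimingSketch {
   @Test(timeout = 600000)
   public void testWithSlowHeartbeatAndShortPendingTimeout() throws Exception {
     Configuration conf = new HdfsConfiguration();
     // Slow heartbeats down so loop 1 can observe the single live replica
     // before the re-replicated copy is reported back.
     conf.setLong(DFSConfigKeys.DFS_HEARTBEAT_INTERVAL_KEY, 300);
     // Shrink the pending-replication timeout so loop 2 recovers quickly when
     // the scheduled copy goes to the corrupted datanode and never completes.
     conf.setInt(
         DFSConfigKeys.DFS_NAMENODE_REPLICATION_PENDING_TIMEOUT_SEC_KEY, 5);
     MiniDFSCluster cluster =
         new MiniDFSCluster.Builder(conf).numDataNodes(2).build();
     try {
       cluster.waitActive();
       // ... corrupt the RBW replica and run the three bounded waits here ...
     } finally {
       cluster.shutdown();
     }
   }
 }
 {code}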



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6681) TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN is flaky and sometimes gets stuck in infinite loops

2014-07-15 Thread Ratandeep Ratti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ratandeep Ratti updated HDFS-6681:
--

Status: Patch Available  (was: Open)

 TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN is 
 flaky and sometimes gets stuck in infinite loops
 --

 Key: HDFS-6681
 URL: https://issues.apache.org/jira/browse/HDFS-6681
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.1
 Environment: Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
 Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
 Linux [hostname] 2.6.32-279.14.1.el6.x86_64 #1 SMP Mon Oct 15 13:44:51 EDT 
 2012 x86_64 x86_64 x86_64 GNU/Linux
Reporter: Ratandeep Ratti
 Attachments: HDFS-6681.patch


 This testcase has 3 infinite loops which break only when certain conditions 
 are satisfied.
 1st loop: checks that there is a single live replica. It assumes this will 
 become true because the test has just corrupted a block on one of the 
 datanodes (the testcase uses a replication factor of 2). One scenario in 
 which this loop will never break is if the Namenode invalidates the corrupt 
 replica, schedules a replication command, and the newly copied replica is 
 added, all before the testcase gets a chance to check the live-replica count.
 2nd loop: checks that there are 2 live replicas. It assumes this will become 
 true (in some time) because the first loop has broken, implying there is a 
 single live replica, so it is only a matter of time before the Namenode 
 schedules a replication command to copy a replica to another datanode. One 
 scenario in which this loop will never break is when the Namenode tries to 
 schedule the new replica on the same node on which we corrupted the block. 
 That destination datanode will not copy the block, complaining that it 
 already has the (corrupted) replica in the create (RBW) state. The result is 
 that the Namenode has scheduled a copy to a datanode and the block is now in 
 the Namenode's pending replication queue, but the block will never be removed 
 from that queue because the Namenode will never receive a report from the 
 datanodes that the block was added.
 Note: the block can be moved from the 'pending replication' queue to the 
 'needed replication' queue once the pending timeout (5 minutes) expires. The 
 Namenode then actively tries to schedule a replication for blocks in the 
 'needed replication' queue. This can cause the 2nd loop to break, but it 
 takes more than 5 minutes for this to kick in.
 3rd loop: checks that there are no corrupt replicas. I don't see a scenario 
 in which this loop can go on forever, since once the live replica count goes 
 back to normal (2), the corrupted block will be removed.
 Increasing the heartbeat interval, so that the testcase has enough time to 
 check the condition in loop 1 before a datanode reports a successful copy, 
 should help avoid the race condition in loop 1. For loop 2, we can reduce the 
 timeout after which a block is moved from the pending replication queue back 
 to the needed replication queue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6641) [ HDFS- File Concat ] Concat will fail when target file is having one block which is not full

2014-07-15 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula updated HDFS-6641:
---

Summary: [ HDFS- File Concat ] Concat will fail when target file is having 
one block which is not full   (was: [ HDFS- File Concat ] Concat will fail when 
Src/target file is having one block which is not full )

 [ HDFS- File Concat ] Concat will fail when target file is having one block 
 which is not full 
 --

 Key: HDFS-6641
 URL: https://issues.apache.org/jira/browse/HDFS-6641
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.4.1
Reporter: Brahma Reddy Battula

 Usually we can't ensure the last block is always full... please let me know 
 the purpose of the following check:
 long blockSize = trgInode.getPreferredBlockSize();
 // check the end block to be full
 final BlockInfo last = trgInode.getLastBlock();
 if (blockSize != last.getNumBytes()) {
   throw new HadoopIllegalArgumentException("The last block in " + target
       + " is not full; last block size = " + last.getNumBytes()
       + " but file block size = " + blockSize);
 }
 If this is an issue, I'll file a jira.
 Following is the trace:
 Exception in thread "main" 
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.HadoopIllegalArgumentException):
  The last block in /Test.txt is not full; last block size = 14 but file block 
 size = 134217728
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.concatInternal(FSNamesystem.java:1887)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.concatInt(FSNamesystem.java:1833)
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.concat(FSNamesystem.java:1795)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.concat(NameNodeRpcServer.java:704)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.concat(ClientNamenodeProtocolServerSideTranslatorPB.java:512)
 at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
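 For reference, a minimal sketch of how the check above is tripped from the 
 client side; the paths and file sizes are illustrative assumptions, not taken 
 from this report:
 {code}
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hdfs.DistributedFileSystem;

 public class ConcatRepro {
   public static void main(String[] args) throws Exception {
     Configuration conf = new Configuration();
     // Assumes fs.defaultFS points at the HDFS cluster.
     DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
     Path target = new Path("/Test.txt");        // last block only 14 bytes
     Path[] srcs = { new Path("/src-part-0") };  // hypothetical source file
     // Fails the "last block full" check quoted above; the
     // HadoopIllegalArgumentException comes back wrapped in a RemoteException.
     dfs.concat(target, srcs);
   }
 }
 {code}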



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6667) In HDFS HA mode, Distcp/SLive with webhdfs on secure cluster fails with Client cannot authenticate via:[TOKEN, KERBEROS] error

2014-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061738#comment-14061738
 ] 

Hadoop QA commented on HDFS-6667:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12655621/HDFS-6667.000.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  
org.apache.hadoop.hdfs.server.namenode.TestProcessCorruptBlocks

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7344//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7344//console

This message is automatically generated.

 In HDFS HA mode, Distcp/SLive with webhdfs on secure cluster fails with 
 Client cannot authenticate via:[TOKEN, KERBEROS] error
 --

 Key: HDFS-6667
 URL: https://issues.apache.org/jira/browse/HDFS-6667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Reporter: Jian He
Assignee: Jing Zhao
 Attachments: HDFS-6667.000.patch


 Opening on [~arpitgupta]'s behalf.
 We observed that, in HDFS HA mode, running Distcp/SLive with webhdfs will 
 fail on YARN. In non-HA mode, it passes.
 The reason is that in HA mode only the webhdfs delegation token is generated 
 for the job, but YARN also requires the regular hdfs token for localization, 
 log aggregation, etc.
 In non-HA mode, both tokens are generated for the job.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6619) Clean up encryption-related tests

2014-07-15 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-6619:
--

Attachment: hdfs-6619.001.patch

Patch attached. High level sketch of changes:

- TestHDFSEncryption wasn't testing anything beyond normal stream operations, 
which are already tested in a number of other HDFS tests. I removed this file 
entirely.
- Renamed TestEncryptionZonesAPI to TestEncryptionZones
- The FileContext test extending TestEncryptionZonesAPI was running all the 
inherited tests again, when all it wanted was to run that one rename test. I 
folded that one test into TestEncryptionZones.
- I combined a bunch of small test cases into a single test case to save on 
minicluster invocations.

I'd like to see us extend some of the existing stream tests to operate on 
encryption zones to capture the intent of TestHDFSEncryption, but let's do that 
in a different JIRA. There are probably also some more tests that could be 
written for HDFS-6474.
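As a side note on the "fewer minicluster invocations" point: the usual shape is 
to share one cluster across the test methods in a class. A hedged sketch (class 
and test names here are assumptions, not the attached patch):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

public class TestEncryptionZonesSketch {
  private static MiniDFSCluster cluster;

  @BeforeClass
  public static void setUpCluster() throws Exception {
    // One cluster for the whole class instead of one per test case.
    cluster = new MiniDFSCluster.Builder(new Configuration())
        .numDataNodes(1).build();
    cluster.waitActive();
  }

  @AfterClass
  public static void tearDownCluster() {
    if (cluster != null) {
      cluster.shutdown();
    }
  }

  @Test(timeout = 120000)
  public void testCombinedZoneOperations() throws Exception {
    // several formerly separate cases can run against the shared cluster here
  }
}
{code}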

 Clean up encryption-related tests
 -

 Key: HDFS-6619
 URL: https://issues.apache.org/jira/browse/HDFS-6619
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Minor
 Attachments: hdfs-6619.001.patch


 Would be good to clean up TestHDFSEncryption and TestEncryptionZonesAPI. 
 These tests could be renamed, test timeouts added/adjusted, the number of 
 minicluster start/stops reduced, whitespace fixed, etc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Work started] (HDFS-6619) Clean up encryption-related tests

2014-07-15 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HDFS-6619 started by Andrew Wang.

 Clean up encryption-related tests
 -

 Key: HDFS-6619
 URL: https://issues.apache.org/jira/browse/HDFS-6619
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Minor
 Attachments: hdfs-6619.001.patch


 Would be good to clean up TestHDFSEncryption and TestEncryptionZonesAPI. 
 These tests could be renamed, test timeouts added/adjusted, the number of 
 minicluster start/stops reduced, whitespace fixed, etc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6509) distcp vs Data At Rest Encryption

2014-07-15 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated HDFS-6509:
--

Affects Version/s: fs-encryption (HADOOP-10150 and HDFS-6134)

 distcp vs Data At Rest Encryption
 -

 Key: HDFS-6509
 URL: https://issues.apache.org/jira/browse/HDFS-6509
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-6509distcpandDataatRestEncryption.pdf


 distcp needs to work with Data At Rest Encryption



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6134) Transparent data at rest encryption

2014-07-15 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061756#comment-14061756
 ] 

Andrew Wang commented on HDFS-6134:
---

Charles posted a design doc for how distcp will work with encryption at 
HDFS-6509. [~sanjay.radia] and [~owen.omalley], I think this is essentially the 
raw directory discussed earlier, but it'd be appreciated if you gave it a 
once over. Thanks!

 Transparent data at rest encryption
 ---

 Key: HDFS-6134
 URL: https://issues.apache.org/jira/browse/HDFS-6134
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: security
Affects Versions: 2.3.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur
 Attachments: HDFSDataatRestEncryptionProposal_obsolete.pdf, 
 HDFSEncryptionConceptualDesignProposal-2014-06-20.pdf


 Because of privacy and security regulations, for many industries, sensitive 
 data at rest must be in encrypted form. For example: the healthcare industry 
 (HIPAA regulations), the card payment industry (PCI DSS regulations) or the 
 US government (FISMA regulations).
 This JIRA aims to provide a mechanism to encrypt HDFS data at rest that can 
 be used transparently by any application accessing HDFS via Hadoop Filesystem 
 Java API, Hadoop libhdfs C library, or WebHDFS REST API.
 The resulting implementation should be able to be used in compliance with 
 different regulation requirements.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6450) Support non-positional hedged reads in HDFS

2014-07-15 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061760#comment-14061760
 ] 

Liang Xie commented on HDFS-6450:
-

After a deeper look, it's kind of hard to reuse/maintain the block reader as 
before.
In pread() we don't have this trouble, because we always create a new block 
reader.
In read(), if we want to support hedged reads, in general:
1) issue the first read (r1) using the old block reader if possible, then wait 
for the hedged-read timeout
2) the second read (r2) must create a new block reader and be submitted to the 
thread pool
3) wait for the first completed task, and return its result to the client side 
(see the sketch below). Here we need to set (remember) the winning task's block 
reader in DFSInputStream's block reader variable and keep it open, but we also 
need to close the other block reader to avoid a leak.

Another thing to note: if we remember the faster block reader and it is a 
remote block reader, then the following read() calls will bypass local reads 
in the following r1 operations...
Any thoughts? [~cmccabe], [~saint@gmail.com] ...
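A minimal sketch of the wait-for-first-completed pattern in steps 1-3; the 
types and method names are simplified assumptions, not the DFSInputStream 
internals:
{code}
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class HedgedReadSketch {
  interface BlockReader {                       // stand-in for the real reader
    byte[] read(long offset, int len) throws Exception;
    void close();
  }

  private final ExecutorService pool = Executors.newCachedThreadPool();
  private volatile BlockReader currentReader;   // the "remembered" reader

  byte[] hedgedRead(final BlockReader r1, final BlockReader r2,
      final long off, final int len, long hedgeTimeoutMs) throws Exception {
    ExecutorCompletionService<byte[]> ecs =
        new ExecutorCompletionService<byte[]>(pool);
    // 1) issue the first read on the existing reader
    final Future<byte[]> f1 = ecs.submit(new Callable<byte[]>() {
      public byte[] call() throws Exception { return r1.read(off, len); }
    });
    Future<byte[]> winner = ecs.poll(hedgeTimeoutMs, TimeUnit.MILLISECONDS);
    if (winner == null) {
      // 2) r1 was too slow: hedge with a second read on a new reader
      ecs.submit(new Callable<byte[]>() {
        public byte[] call() throws Exception { return r2.read(off, len); }
      });
      // 3) whichever read completes first wins
      winner = ecs.take();
    }
    boolean r1Won = (winner == f1);
    currentReader = r1Won ? r1 : r2;   // remember the faster reader
    (r1Won ? r2 : r1).close();         // close the other to avoid a leak;
                                       // a real implementation must also
                                       // cancel/drain the still-running loser
    return winner.get();
  }
}
{code}
This is exactly where the caveat above bites: once a remote reader wins and is 
remembered, subsequent read() calls keep bypassing the local reader until 
something switches back.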

 Support non-positional hedged reads in HDFS
 ---

 Key: HDFS-6450
 URL: https://issues.apache.org/jira/browse/HDFS-6450
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.4.0
Reporter: Colin Patrick McCabe
Assignee: Liang Xie
 Attachments: HDFS-6450-like-pread.txt


 HDFS-5776 added support for hedged positional reads.  We should also support 
 hedged non-position reads (aka regular reads).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6619) Clean up encryption-related tests

2014-07-15 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061765#comment-14061765
 ] 

Yi Liu commented on HDFS-6619:
--

LGTM, +1, thanks [~andrew.wang] for refining the tests.

 Clean up encryption-related tests
 -

 Key: HDFS-6619
 URL: https://issues.apache.org/jira/browse/HDFS-6619
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Minor
 Attachments: hdfs-6619.001.patch


 Would be good to clean up TestHDFSEncryption and TestEncryptionZonesAPI. 
 These tests could be renamed, test timeouts added/adjusted, the number of 
 minicluster start/stops reduced, whitespace fixed, etc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6676) KMS throws AuthenticationException when enabling kerberos authentication

2014-07-15 Thread liyunzhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang updated HDFS-6676:
-

Description: 
When I made a request to http://server-1941.novalocal:16000/kms/v1/names in 
Firefox (beforehand, I had configured Firefox according to 
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Deployment_Guide/sso-config-firefox.html),
 the following info was found in logs/kms.log.
2014-07-14 19:18:30,461 WARN  AuthenticationFilter - Authentication exception: 
GSSException: Failure unspecified at GSS-API level (Mechanism level: 
EncryptedData is encrypted using keytype DES CBC mode with CRC-32 but 
decryption key is of type NULL)
org.apache.hadoop.security.authentication.client.AuthenticationException: 
GSSException: Failure unspecified at GSS-API level (Mechanism level: 
EncryptedData is encrypted using keytype DES CBC mode with CRC-32 but 
decryption key is of type NULL)
at 
org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:380)
at 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:357)
at 
org.apache.hadoop.crypto.key.kms.server.KMSAuthenticationFilter.doFilter(KMSAuthenticationFilter.java:100)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:745)
Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism level: 
EncryptedData is encrypted using keytype DES CBC mode with CRC-32 but 
decryption key is of type NULL)
at 
sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:788)
at 
sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342)
at 
sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285)
at 
sun.security.jgss.spnego.SpNegoContext.GSS_acceptSecContext(SpNegoContext.java:875)
at 
sun.security.jgss.spnego.SpNegoContext.acceptSecContext(SpNegoContext.java:548)
at 
sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:342)
at 
sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:285)
at 
org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler$2.run(KerberosAuthenticationHandler.java:347)
at 
org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler$2.run(KerberosAuthenticationHandler.java:329)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.authentication.server.KerberosAuthenticationHandler.authenticate(KerberosAuthenticationHandler.java:329)
... 14 more
Caused by: KrbException: EncryptedData is encrypted using keytype DES CBC mode 
with CRC-32 but decryption key is of type NULL
at sun.security.krb5.EncryptedData.decrypt(EncryptedData.java:169)
at sun.security.krb5.KrbCred.init(KrbCred.java:131)
at 
sun.security.jgss.krb5.InitialToken$OverloadedChecksum.init(InitialToken.java:282)
at 
sun.security.jgss.krb5.InitSecContextToken.init(InitSecContextToken.java:130)
at 
sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:771)
... 25 more

Kerberos is enabled successfully in my environment:
klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: HTTP/server-1941.novalocal@NOVALOCAL

Valid starting       Expires            Service principal
07/14/14 19:18:10  07/15/14 19:18:09  krbtgt/NOVALOCAL@NOVALOCAL
renew until 07/14/14 19:18:10
07/14/14 19:18:30  07/15/14 19:18:09  HTTP/server-1941.novalocal@NOVALOCAL
renew until 07/14/14 19:18:10

Following are kdc configs:
cat /etc/krb5.conf
[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 default_realm = NOVALOCAL
 dns_lookup_realm = false
 

[jira] [Commented] (HDFS-6667) In HDFS HA mode, Distcp/SLive with webhdfs on secure cluster fails with Client cannot authenticate via:[TOKEN, KERBEROS] error

2014-07-15 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061783#comment-14061783
 ] 

Jing Zhao commented on HDFS-6667:
-

The unit test failures should be unrelated. TestDFSAdminWithHA and 
TestPipelinesFailover were also seen in recent Jenkins runs, such as 
[here|https://issues.apache.org/jira/browse/HDFS-2856?focusedCommentId=14059617&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14059617].
 TestProcessCorruptBlocks has been reported in HDFS-6656.

 In HDFS HA mode, Distcp/SLive with webhdfs on secure cluster fails with 
 Client cannot authenticate via:[TOKEN, KERBEROS] error
 --

 Key: HDFS-6667
 URL: https://issues.apache.org/jira/browse/HDFS-6667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Reporter: Jian He
Assignee: Jing Zhao
 Attachments: HDFS-6667.000.patch


 Opening on [~arpitgupta]'s behalf.
 We observed that, in HDFS HA mode, running Distcp/SLive with webhdfs will 
 fail on YARN. In non-HA mode, it passes.
 The reason is that in HA mode only the webhdfs delegation token is generated 
 for the job, but YARN also requires the regular hdfs token for localization, 
 log aggregation, etc.
 In non-HA mode, both tokens are generated for the job.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-2892) Some of property descriptions are not given(hdfs-default.xml)

2014-07-15 Thread Chunjun Xiao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chunjun Xiao updated HDFS-2892:
---

Description: 
dfs.datanode.https.address Hi. I took the 0.23.0 release from 
http://hadoop.apache.org/common/releases.html#11+Nov%2C+2011%3A+release+0.23.0+available

I just went through all the properties provided in hdfs-default.xml. Some of 
the property descriptions are not given. It would be better to give a 
description of each property and its usage (how to configure it); also, only 
MapReduce-related jars are provided. Please check the following two 
configurations.


 *No Description*

{noformat}
<property>
  <name>dfs.datanode.https.address</name>
  <value>0.0.0.0:50475</value>
</property>

<property>
  <name>dfs.namenode.https-address</name>
  <value>0.0.0.0:50470</value>
</property>
{noformat}


 Better to mention example usage (what to configure, and in what format/syntax) 
in the description. Here I did not get what "default" means, whether it is the 
name of a n/w interface or something else:

 <property>
  <name>dfs.datanode.dns.interface</name>
  <value>default</value>
  <description>The name of the Network Interface from which a data node should 
  report its IP address.
  </description>
 </property>


The following property is commented out. If it is not supported, better to 
remove it.

<property>
   <name>dfs.cluster.administrators</name>
   <value>ACL for the admins</value>
   <description>This configuration is used to control who can access the
default servlets in the namenode, etc.
   </description>
</property>




 Small clarification for the following property: if some value is configured, 
will the NN be in safe mode up to that much time?
May I know the usage of the following property?
<property>
  <name>dfs.blockreport.initialDelay</name>  <value>0</value>
  <description>Delay for first block report in seconds.</description>
</property>


  was:
Hi. I took the 0.23.0 release from 
http://hadoop.apache.org/common/releases.html#11+Nov%2C+2011%3A+release+0.23.0+available

I just went through all the properties provided in hdfs-default.xml. Some of 
the property descriptions are not given. It would be better to give a 
description of each property and its usage (how to configure it); also, only 
MapReduce-related jars are provided. Please check the following two 
configurations.


 *No Description*

{noformat}
<property>
  <name>dfs.datanode.https.address</name>
  <value>0.0.0.0:50475</value>
</property>

<property>
  <name>dfs.namenode.https-address</name>
  <value>0.0.0.0:50470</value>
</property>
{noformat}


 Better to mention example usage (what to configure, and in what format/syntax) 
in the description. Here I did not get what "default" means, whether it is the 
name of a n/w interface or something else:

 <property>
  <name>dfs.datanode.dns.interface</name>
  <value>default</value>
  <description>The name of the Network Interface from which a data node should 
  report its IP address.
  </description>
 </property>


The following property is commented out. If it is not supported, better to 
remove it.

<property>
   <name>dfs.cluster.administrators</name>
   <value>ACL for the admins</value>
   <description>This configuration is used to control who can access the
default servlets in the namenode, etc.
   </description>
</property>




 Small clarification for the following property: if some value is configured, 
will the NN be in safe mode up to that much time?
May I know the usage of the following property?
<property>
  <name>dfs.blockreport.initialDelay</name>  <value>0</value>
  <description>Delay for first block report in seconds.</description>
</property>



 Some of property descriptions are not given(hdfs-default.xml) 
 --

 Key: HDFS-2892
 URL: https://issues.apache.org/jira/browse/HDFS-2892
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 0.23.0
Reporter: Brahma Reddy Battula
Priority: Trivial

 dfs.datanode.https.address Hi. I took the 0.23.0 release from 
 http://hadoop.apache.org/common/releases.html#11+Nov%2C+2011%3A+release+0.23.0+available
 I just went through all the properties provided in hdfs-default.xml. Some of 
 the property descriptions are not given. It would be better to give a 
 description of each property and its usage (how to configure it); also, only 
 MapReduce-related jars are provided. Please check the following two 
 configurations.
  *No Description*
 {noformat}
 <property>
   <name>dfs.datanode.https.address</name>
   <value>0.0.0.0:50475</value>
 </property>
 <property>
   <name>dfs.namenode.https-address</name>
   <value>0.0.0.0:50470</value>
 </property>
 {noformat}
  Better to mention example usage (what to configure, and in what 
 format/syntax) in the description. Here I did not get what "default" means, 
 whether it is the name of a n/w interface or something else:
  <property>
   <name>dfs.datanode.dns.interface</name>
   <value>default</value>
   <description>The name of the Network Interface from which a data node 
 should 
   report its IP address.
   </description>
  </property>
 The following property is commented out. If it is not supported, better 

[jira] [Created] (HDFS-6682) Add a metric to expose the timestamp of the oldest under-replicated block

2014-07-15 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created HDFS-6682:
---

 Summary: Add a metric to expose the timestamp of the oldest 
under-replicated block
 Key: HDFS-6682
 URL: https://issues.apache.org/jira/browse/HDFS-6682
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Akira AJISAKA
Assignee: Akira AJISAKA


In the following case, data in HDFS is lost and a client needs to put the 
same file again:
# A client puts a file to HDFS
# A DataNode crashes before replicating a block of the file to other DataNodes

I propose a metric to expose the timestamp of the oldest 
under-replicated/corrupt block. That way a client can know which files to 
retain for the retry.
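A hedged sketch of what such a metric might look like with the metrics2 API; 
the class, metric, and method names are assumptions, not the eventual patch:
{code}
import org.apache.hadoop.metrics2.annotation.Metric;
import org.apache.hadoop.metrics2.annotation.Metrics;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.MutableGaugeLong;

// Sketch only: a gauge holding the timestamp (ms) of the oldest block still
// waiting for re-replication. Registering the source is what instantiates
// the @Metric field.
@Metrics(name = "OldestUnderReplicatedBlock", context = "dfs")
public class OldestUnderReplicatedBlockMetrics {
  @Metric("Timestamp of the oldest under-replicated block")
  MutableGaugeLong oldestUnderReplicatedBlockTimestamp;

  public static OldestUnderReplicatedBlockMetrics create() {
    return DefaultMetricsSystem.instance().register(
        "OldestUnderReplicatedBlock",
        "Timestamp of the oldest under-replicated block",
        new OldestUnderReplicatedBlockMetrics());
  }

  // Hypothetical hook: called by the replication monitor when it observes
  // the head of the under-replicated queue.
  public void setOldestTimestamp(long timestampMs) {
    oldestUnderReplicatedBlockTimestamp.set(timestampMs);
  }
}
{code}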



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6588) Investigating removing getTrueCause method in Server.java

2014-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061836#comment-14061836
 ] 

Hadoop QA commented on HDFS-6588:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12655689/HDFS-6588.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.fs.shell.TestCopyPreserveFlag
  org.apache.hadoop.fs.TestSymlinkLocalFSFileContext
  org.apache.hadoop.fs.shell.TestTextCommand
  org.apache.hadoop.ipc.TestIPC
  org.apache.hadoop.fs.TestSymlinkLocalFSFileSystem
  org.apache.hadoop.fs.shell.TestPathData
  org.apache.hadoop.fs.TestDFVariations
  
org.apache.hadoop.hdfs.server.blockmanagement.TestBlocksWithNotEnoughRacks
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7345//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7345//console

This message is automatically generated.

 Investigating removing getTrueCause method in Server.java
 -

 Key: HDFS-6588
 URL: https://issues.apache.org/jira/browse/HDFS-6588
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security, webhdfs
Affects Versions: 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6588.001.patch


 When addressing Daryn Sharp's comment for HDFS-6475 quoted below:
 {quote}
 What I'm saying is I think the patch adds too much unnecessary code. Filing 
 an improvement to delete all but a few lines of the code changed in this 
 patch seems a bit odd. I think you just need to:
 - Delete getTrueCause entirely instead of moving it elsewhere
 - In saslProcess, just throw the exception instead of running it through 
 getTrueCause since it's not a InvalidToken wrapping another exception 
 anymore.
 - Keep your 3-line change to unwrap SecurityException in toResponse
 {quote}
 There are multiple test failures, after making the suggested changes, Filing 
 this jira to dedicate to the investigation of removing getTrueCause method.
 More detail will be put in the first comment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6681) TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN is flaky and sometimes gets stuck in infinite loops

2014-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061854#comment-14061854
 ] 

Hadoop QA commented on HDFS-6681:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12655697/HDFS-6681.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover
  
org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
  org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7346//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7346//console

This message is automatically generated.

 TestRBWBlockInvalidation#testBlockInvalidationWhenRBWReplicaMissedInDN is 
 flaky and sometimes gets stuck in infinite loops
 --

 Key: HDFS-6681
 URL: https://issues.apache.org/jira/browse/HDFS-6681
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.1
 Environment: Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
 Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
 Linux [hostname] 2.6.32-279.14.1.el6.x86_64 #1 SMP Mon Oct 15 13:44:51 EDT 
 2012 x86_64 x86_64 x86_64 GNU/Linux
Reporter: Ratandeep Ratti
 Attachments: HDFS-6681.patch


 This testcase has 3 infinite loops which break only when certain conditions 
 are satisfied.
 1st loop: checks that there is a single live replica. It assumes this will 
 become true because the test has just corrupted a block on one of the 
 datanodes (the testcase uses a replication factor of 2). One scenario in 
 which this loop will never break is if the Namenode invalidates the corrupt 
 replica, schedules a replication command, and the newly copied replica is 
 added, all before the testcase gets a chance to check the live-replica count.
 2nd loop: checks that there are 2 live replicas. It assumes this will become 
 true (in some time) because the first loop has broken, implying there is a 
 single live replica, so it is only a matter of time before the Namenode 
 schedules a replication command to copy a replica to another datanode. One 
 scenario in which this loop will never break is when the Namenode tries to 
 schedule the new replica on the same node on which we corrupted the block. 
 That destination datanode will not copy the block, complaining that it 
 already has the (corrupted) replica in the create (RBW) state. The result is 
 that the Namenode has scheduled a copy to a datanode and the block is now in 
 the Namenode's pending replication queue, but the block will never be removed 
 from that queue because the Namenode will never receive a report from the 
 datanodes that the block was added.
 Note: the block can be moved from the 'pending replication' queue to the 
 'needed replication' queue once the pending timeout (5 minutes) expires. The 
 Namenode then actively tries to schedule a replication for blocks in the 
 'needed replication' queue. This can cause the 2nd loop to break, but it 
 takes more than 5 minutes for this to kick in.
 3rd loop: checks that there are no corrupt replicas. I don't see a scenario 
 in which this loop can go on forever, since once the live replica count goes 
 back to normal (2), the corrupted block will be removed.
 Increasing the heartbeat interval, so that the testcase has enough time to 
 check the condition in loop 1 before a datanode reports a successful copy, 
 should help avoid the race condition in loop 1. For loop 2, we can reduce the 
 timeout after which a block is moved from the pending replication queue back 
 to the needed replication queue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6590) NullPointerException was generated in getBlockLocalPathInfo when datanode restarts

2014-07-15 Thread Guo Ruijing (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061863#comment-14061863
 ] 

Guo Ruijing commented on HDFS-6590:
---

Root cause: data is not initialized yet when it is dereferenced in 
data.getBlockLocalPathInfo(block).

Fix solution 1:

existing, in getBlockLocalPathInfo():

    BlockLocalPathInfo info = data.getBlockLocalPathInfo(block);

new, in getBlockLocalPathInfo():

    // data (the dataset) may still be null while the datanode is starting
    // up, so guard the dereference.
    BlockLocalPathInfo info = null;
    if (data != null) {
      info = data.getBlockLocalPathInfo(block);
    }

 NullPointerException was generated in getBlockLocalPathInfo when datanode 
 restarts
 --

 Key: HDFS-6590
 URL: https://issues.apache.org/jira/browse/HDFS-6590
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.2.0
Reporter: Guo Ruijing

 2014-06-11 20:34:40.240119, p43949, th140725562181728, ERROR cannot setup 
 block reader for Block: [block pool ID: 
 BP-1901161041-172.28.1.251-1402542341112 block ID 1073741926_1102] on 
 Datanode: sdw3(172.28.1.3).
 RpcHelper.h: 74: HdfsIOException: Unexpected exception: when unwrap the rpc 
 remote exception java.lang.NullPointerException, 
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.getBlockLocalPathInfo(DataNode.java:1014)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolServerSideTranslatorPB.getBlockLocalPathInfo(ClientDatanodeProtocolServerSideTranslatorPB.java:112)
 at 
 org.apache.hadoop.hdfs.protocol.proto.ClientDatanodeProtocolProtos$ClientDatanodeProtocolService$2.callBlockingMethod(ClientDatanodeProtocolProtos.java:6373)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6590) NullPointerException was generated in getBlockLocalPathInfo when datanode restarts

2014-07-15 Thread Guo Ruijing (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061869#comment-14061869
 ] 

Guo Ruijing commented on HDFS-6590:
---

Fix solution 2:

Move initIpcServer after initStorage, like:

    void initBlockPool(...) {
      initStorage(nsInfo);
      initPeriodicScanners(conf);
      // initIpcServer moved after initStorage: this way data is initialized
      // before getBlockLocalPathInfo can be called over IPC.
      initIpcServer(conf);
    }



 NullPointerException was generated in getBlockLocalPathInfo when datanode 
 restarts
 --

 Key: HDFS-6590
 URL: https://issues.apache.org/jira/browse/HDFS-6590
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.2.0
Reporter: Guo Ruijing

 2014-06-11 20:34:40.240119, p43949, th140725562181728, ERROR cannot setup 
 block reader for Block: [block pool ID: 
 BP-1901161041-172.28.1.251-1402542341112 block ID 1073741926_1102] on 
 Datanode: sdw3(172.28.1.3).
 RpcHelper.h: 74: HdfsIOException: Unexpected exception: when unwrap the rpc 
 remote exception java.lang.NullPointerException, 
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hdfs.server.datanode.DataNode.getBlockLocalPathInfo(DataNode.java:1014)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolServerSideTranslatorPB.getBlockLocalPathInfo(ClientDatanodeProtocolServerSideTranslatorPB.java:112)
 at 
 org.apache.hadoop.hdfs.protocol.proto.ClientDatanodeProtocolProtos$ClientDatanodeProtocolService$2.callBlockingMethod(ClientDatanodeProtocolProtos.java:6373)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6667) In HDFS HA mode, Distcp/SLive with webhdfs on secure cluster fails with Client cannot authenticate via:[TOKEN, KERBEROS] error

2014-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061882#comment-14061882
 ] 

Hadoop QA commented on HDFS-6667:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12655621/HDFS-6667.000.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7347//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7347//console

This message is automatically generated.

 In HDFS HA mode, Distcp/SLive with webhdfs on secure cluster fails with 
 Client cannot authenticate via:[TOKEN, KERBEROS] error
 --

 Key: HDFS-6667
 URL: https://issues.apache.org/jira/browse/HDFS-6667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Reporter: Jian He
Assignee: Jing Zhao
 Attachments: HDFS-6667.000.patch


 Opening on [~arpitgupta]'s behalf.
 We observed that, in HDFS HA mode, running Distcp/SLive with webhdfs will 
 fail on YARN. In non-HA mode, it passes.
 The reason is that in HA mode only the webhdfs delegation token is generated 
 for the job, but YARN also requires the regular hdfs token for localization, 
 log aggregation, etc.
 In non-HA mode, both tokens are generated for the job.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6114) Block Scan log rolling will never happen if blocks written continuously leading to huge size of dncp_block_verification.log.curr

2014-07-15 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061895#comment-14061895
 ] 

Vinayakumar B commented on HDFS-6114:
-

bq. I don't really see a good reason to separate delBlockInfo and 
delNewBlockInfo. It seems like this could just lead to scenarios where we think 
we're deleting a block but it pops back up (because we deleted, but did not 
delete new)
Here, both work on different sets. {{delBlockInfo}} is used in some other 
places as well, while updating the scan time and re-sorting the blockInfoSet.
{{delNewBlockInfo}} only needs to be called while deleting the block itself, 
as intermediate updates will not happen on this set's data.
So {{delBlockInfo}} and {{delNewBlockInfo}} serve separate purposes and both 
are required.

bq. I guess maybe it makes sense to separate addBlockInfo from addNewBlockInfo, 
just because there are places in the setup code where we're willing to add 
stuff directly to blockInfoSet. Even in that case, I would argue it might be 
easier to call addNewBlockInfo and then later roll all the newBlockInfoSet 
items into blockInfoSet. The problem is that having both functions creates 
confusion and increase the chance that someone will add an incorrect call to 
the wrong one later on in another change.
As I see it, both these methods are private and act on different sets, and the 
method name itself suggests {{addNewBlockInfo}} is only for new blocks. I am 
not seeing any confusion here.

bq. It seems like a bad idea to use BlockScanInfo.LAST_SCAN_TIME_COMPARATOR for 
blockInfoSet, but BlockScanInfo#hashCode (i.e. the HashSet strategy) for 
newBlockInfoSet. Let's just use a SortedSet for both so we don't have to ponder 
any possible discrepancies between the comparator and the hash function.
{{blockInfoSet}} is required to be sorted based on the lastScanTime, as the 
oldest-scanned block will be picked for scanning, and it will always be the 
first element in this set. BlockScanInfo.LAST_SCAN_TIME_COMPARATOR is used 
because {{BlockScanInfo#hashCode()}} is the default, which would order based 
on the blockId rather than the scan time.
Are you suggesting I update {{hashCode()}} itself?

bq. Another problem with HashSet (compared with TreeSet) is that it never 
shrinks down after enlarging... a bad property for a temporary holding area
Yes, I agree with this; I will update it in the next patch.
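For readers following along, a minimal sketch of the two-set arrangement being 
discussed; the class and field names mirror the discussion, but the code is 
illustrative, not the patch:
{code}
import java.util.Comparator;
import java.util.TreeSet;

class BlockScanSetsSketch {
  static class BlockScanInfo {
    final long blockId;
    long lastScanTime;
    BlockScanInfo(long blockId, long lastScanTime) {
      this.blockId = blockId;
      this.lastScanTime = lastScanTime;
    }
  }

  // Ordered by lastScanTime, with blockId as a tie-breaker so distinct blocks
  // never collide in the TreeSet.
  private static final Comparator<BlockScanInfo> LAST_SCAN_TIME_COMPARATOR =
      new Comparator<BlockScanInfo>() {
        @Override
        public int compare(BlockScanInfo a, BlockScanInfo b) {
          int c = Long.compare(a.lastScanTime, b.lastScanTime);
          return c != 0 ? c : Long.compare(a.blockId, b.blockId);
        }
      };

  // Main set: stays sorted by last scan time, so the oldest-scanned block is
  // always first().
  private final TreeSet<BlockScanInfo> blockInfoSet =
      new TreeSet<BlockScanInfo>(LAST_SCAN_TIME_COMPARATOR);
  // Holding area for newly added blocks; also a TreeSet (rather than a
  // HashSet) so it shrinks back down after a burst, per the last point above.
  private final TreeSet<BlockScanInfo> newBlockInfoSet =
      new TreeSet<BlockScanInfo>(LAST_SCAN_TIME_COMPARATOR);

  synchronized void addNewBlockInfo(BlockScanInfo info) {
    newBlockInfoSet.add(info);
  }

  // Periodically roll the newly added blocks into the scan-ordered set.
  synchronized void rollNewBlocksIn() {
    blockInfoSet.addAll(newBlockInfoSet);
    newBlockInfoSet.clear();
  }

  // The next block to scan is the one with the oldest lastScanTime.
  synchronized BlockScanInfo nextToScan() {
    return blockInfoSet.isEmpty() ? null : blockInfoSet.first();
  }
}
{code}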

 Block Scan log rolling will never happen if blocks written continuously 
 leading to huge size of dncp_block_verification.log.curr
 

 Key: HDFS-6114
 URL: https://issues.apache.org/jira/browse/HDFS-6114
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.3.0, 2.4.0
Reporter: Vinayakumar B
Assignee: Vinayakumar B
Priority: Critical
 Attachments: HDFS-6114.patch, HDFS-6114.patch


 1. {{BlockPoolSliceScanner#scan()}} will not return until all the blocks are 
 scanned.
 2. If blocks (several MBs in size) are written to the datanode continuously, 
 then one iteration of {{BlockPoolSliceScanner#scan()}} will be continuously 
 scanning the blocks.
 3. These blocks will be deleted after some time (enough to get the block 
 scanned).
 4. As block scanning is throttled, verification of all blocks will take a 
 long time.
 5. Log rolling will never happen, so even though the total number of blocks 
 in the datanode doesn't increase, the entries (which include stale entries of 
 deleted blocks) in *dncp_block_verification.log.curr* continuously increase, 
 leading to a huge file.
 In one of our environments, it grew to more than 1 TB while the total number 
 of blocks was only ~45k.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6678) MiniDFSCluster may still be partially running after initialization fails.

2014-07-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061929#comment-14061929
 ] 

Hudson commented on HDFS-6678:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #613 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/613/])
HDFS-6678. MiniDFSCluster may still be partially running after initialization 
fails. Contributed by Chris Nauroth. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1610549)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java


 MiniDFSCluster may still be partially running after initialization fails.
 -

 Key: HDFS-6678
 URL: https://issues.apache.org/jira/browse/HDFS-6678
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0, 2.5.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Fix For: 3.0.0, 2.6.0

 Attachments: HDFS-6678.1.patch


 {{MiniDFSCluster}} initializes the daemons (NameNodes, DataNodes) as part of 
 object construction.  If initialization fails, then the constructor throws an 
 exception.  When this happens, it's possible that daemons are left running in 
 the background.  There is effectively no way to clean up after this state, 
 because the constructor failed, and therefore the caller has no way to 
 trigger a shutdown.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6378) NFS registration should timeout instead of hanging when portmap/rpcbind is not available

2014-07-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061928#comment-14061928
 ] 

Hudson commented on HDFS-6378:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #613 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/613/])
HDFS-6378. NFS registration should timeout instead of hanging when 
portmap/rpcbind is not available. Contributed by Abhiraj Butala (brandonli: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1610543)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/mount/MountdBase.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3Base.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/RpcProgram.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/SimpleUdpClient.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 NFS registration should timeout instead of hanging when portmap/rpcbind is 
 not available
 

 Key: HDFS-6378
 URL: https://issues.apache.org/jira/browse/HDFS-6378
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Reporter: Brandon Li
Assignee: Abhiraj Butala
 Fix For: 2.5.0

 Attachments: HDFS-6378.002.patch, HDFS-6378.003.patch, HDFS-6378.patch


 When portmap/rpcbind is not available, NFS could be stuck at registration. 
 Instead, NFS gateway should shut down automatically with proper error message.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-2856) Fix block protocol so that Datanodes don't require root or jsvc

2014-07-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061926#comment-14061926
 ] 

Hudson commented on HDFS-2856:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #613 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/613/])
HDFS-2856. Fix block protocol so that Datanodes don't require root or jsvc. 
Contributed by Chris Nauroth. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1610474)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemotePeerFactory.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/EncryptedPeer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/TcpPeerServer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferEncryptor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/DataEncryptionKeyFactory.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/DataTransferSaslUtil.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/InvalidMagicNumberException.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferServer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslParticipant.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferTestCase.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/TestSaslDataTransfer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancerWithSaslDataTransfer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockTokenWithDFS.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java


 Fix block protocol so that Datanodes don't require root or jsvc
 ---

 Key: HDFS-2856
 URL: https://issues.apache.org/jira/browse/HDFS-2856
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, security
Affects Versions: 3.0.0, 2.4.0
Reporter: Owen O'Malley
Assignee: Chris Nauroth
 

[jira] [Commented] (HDFS-6671) Archival Storage: Consider block storage policy in replication

2014-07-15 Thread Vinayakumar B (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14061933#comment-14061933
 ] 

Vinayakumar B commented on HDFS-6671:
-

Thanks [~szetszwo],
I have found the following things:
1. {code}
+return new Iterator<StorageType>() {
+  final Iterator<DatanodeStorageInfo> i = chosen.iterator();
+  @Override
+  public boolean hasNext() {return i.hasNext();}
+  @Override
+  public StorageType next() {return i.next().getStorageType();}
+};{code}
Here one more method, remove(), needs to be implemented to fix the compilation 
errors.

2. typo in TestBlockStoragePolicy.DEFAULT_STORAGE_POICY

3. As of now, BlockPlacementPolicyDefault#getStorageType() will result in an NPE, 
since the storagePolicyId is set to 0 in INodeFile, and this will return a null 
storagePolicy. Would it be better if the default policy is returned when the policy 
with that id is null?
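
For illustration, a compilable sketch of the two code-level points above (the iterator's 
missing remove() and a default-policy fallback); the surrounding types are simplified 
stand-ins, not the real HDFS classes:
{code}
import java.util.Iterator;
import java.util.List;

class StoragePolicySketch {
  // Simplified stand-ins for the real StorageType / DatanodeStorageInfo / policy types.
  enum StorageType { DISK, SSD, ARCHIVE }
  interface DatanodeStorageInfo { StorageType getStorageType(); }
  interface BlockStoragePolicy { }

  // Point 1: a pre-Java-8 anonymous Iterator must also implement remove().
  static Iterator<StorageType> storageTypes(final List<DatanodeStorageInfo> chosen) {
    return new Iterator<StorageType>() {
      final Iterator<DatanodeStorageInfo> i = chosen.iterator();
      @Override public boolean hasNext() { return i.hasNext(); }
      @Override public StorageType next() { return i.next().getStorageType(); }
      @Override public void remove() { throw new UnsupportedOperationException(); }
    };
  }

  // Point 3: fall back to the default policy instead of returning null when the
  // file's storagePolicyId (0) never selected a policy.
  static BlockStoragePolicy policyOrDefault(BlockStoragePolicy byId,
                                            BlockStoragePolicy defaultPolicy) {
    return byId != null ? byId : defaultPolicy;
  }
}
{code}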

 Archival Storage: Consider block storage policy in replication
 --

 Key: HDFS-6671
 URL: https://issues.apache.org/jira/browse/HDFS-6671
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h6671_20140714.patch


 In order to satisfy the storage policy requirement, the replication monitor 
 additionally reads storage policy information from the INodeFile when performing 
 replication.  As before, it only adds replicas if a block is under-replicated, 
 and deletes replicas if a block is over-replicated.  It will NOT 
 move replicas around to satisfy the storage policy requirement.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-2856) Fix block protocol so that Datanodes don't require root or jsvc

2014-07-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062069#comment-14062069
 ] 

Hudson commented on HDFS-2856:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1805 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1805/])
HDFS-2856. Fix block protocol so that Datanodes don't require root or jsvc. 
Contributed by Chris Nauroth. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1610474)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemotePeerFactory.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/EncryptedPeer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/TcpPeerServer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferEncryptor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/DataEncryptionKeyFactory.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/DataTransferSaslUtil.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/InvalidMagicNumberException.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferServer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslParticipant.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferTestCase.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/TestSaslDataTransfer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancerWithSaslDataTransfer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockTokenWithDFS.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java


 Fix block protocol so that Datanodes don't require root or jsvc
 ---

 Key: HDFS-2856
 URL: https://issues.apache.org/jira/browse/HDFS-2856
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, security
Affects Versions: 3.0.0, 2.4.0
Reporter: Owen O'Malley
Assignee: Chris Nauroth
 

[jira] [Commented] (HDFS-6678) MiniDFSCluster may still be partially running after initialization fails.

2014-07-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062072#comment-14062072
 ] 

Hudson commented on HDFS-6678:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1805 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1805/])
HDFS-6678. MiniDFSCluster may still be partially running after initialization 
fails. Contributed by Chris Nauroth. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1610549)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java


 MiniDFSCluster may still be partially running after initialization fails.
 -

 Key: HDFS-6678
 URL: https://issues.apache.org/jira/browse/HDFS-6678
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0, 2.5.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Fix For: 3.0.0, 2.6.0

 Attachments: HDFS-6678.1.patch


 {{MiniDFSCluster}} initializes the daemons (NameNodes, DataNodes) as part of 
 object construction.  If initialization fails, then the constructor throws an 
 exception.  When this happens, it's possible that daemons are left running in 
 the background.  There is effectively no way to clean up after this state, 
 because the constructor failed, and therefore the caller has no way to 
 trigger a shutdown.
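 One common shape for such a fix, sketched here under the assumption that the 
 constructor can guard its own initialization (names are illustrative, not the 
 committed patch):
 {code}
 import java.io.IOException;

 // Hedged sketch: if initialization throws, shut down whatever daemons already started
 // before re-throwing, because the caller never receives a reference it could shut down.
 class GuardedClusterSketch {
   GuardedClusterSketch() throws IOException {
     try {
       start();      // may start some daemons and then fail part-way through
     } catch (IOException | RuntimeException e) {
       shutdown();   // best-effort cleanup of partially started daemons
       throw e;
     }
   }
   private void start() throws IOException { /* start NameNode/DataNode daemons */ }
   private void shutdown() { /* stop any daemons that are running */ }
 }
 {code}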



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6378) NFS registration should timeout instead of hanging when portmap/rpcbind is not available

2014-07-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062071#comment-14062071
 ] 

Hudson commented on HDFS-6378:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1805 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1805/])
HDFS-6378. NFS registration should timeout instead of hanging when 
portmap/rpcbind is not available. Contributed by Abhiraj Butala (brandonli: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1610543)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/mount/MountdBase.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3Base.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/RpcProgram.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/SimpleUdpClient.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 NFS registration should timeout instead of hanging when portmap/rpcbind is 
 not available
 

 Key: HDFS-6378
 URL: https://issues.apache.org/jira/browse/HDFS-6378
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Reporter: Brandon Li
Assignee: Abhiraj Butala
 Fix For: 2.5.0

 Attachments: HDFS-6378.002.patch, HDFS-6378.003.patch, HDFS-6378.patch


 When portmap/rpcbind is not available, NFS could be stuck at registration. 
 Instead, the NFS gateway should shut down automatically with a proper error message.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6619) Clean up encryption-related tests

2014-07-15 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062078#comment-14062078
 ] 

Charles Lamb commented on HDFS-6619:


Piling on... +1


 Clean up encryption-related tests
 -

 Key: HDFS-6619
 URL: https://issues.apache.org/jira/browse/HDFS-6619
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Minor
 Attachments: hdfs-6619.001.patch


 Would be good to clean up TestHDFSEncryption and TestEncryptionZonesAPI. 
 These tests could be renamed, test timeouts added/adjusted, the number of 
 minicluster starts/stops reduced, whitespace fixed, etc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6422) getfattr in CLI doesn't throw exception or return non-0 return code when xattr doesn't exist

2014-07-15 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-6422:
---

Status: In Progress  (was: Patch Available)

 getfattr in CLI doesn't throw exception or return non-0 return code when 
 xattr doesn't exist
 

 Key: HDFS-6422
 URL: https://issues.apache.org/jira/browse/HDFS-6422
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-6422.1.patch, HDFS-6422.2.patch, HDFS-6422.3.patch


 If you do
 hdfs dfs -getfattr -n user.blah /foo
 and user.blah doesn't exist, the command prints
 # file: /foo
 and a 0 return code.
 It should print an exception and return a non-0 return code instead.
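 As a hedged sketch of the requested behaviour (an illustrative helper, not the actual 
 patch to the getfattr command):
 {code}
 import java.io.IOException;
 import java.util.Map;

 // After fetching the xattrs for the path, fail loudly when the requested name is
 // absent so the shell exits non-zero instead of silently printing "# file: ...".
 class GetfattrCheckSketch {
   static byte[] requireXAttr(Map<String, byte[]> xattrs, String name, String path)
       throws IOException {
     byte[] value = xattrs.get(name);
     if (value == null) {
       throw new IOException("getfattr: " + name + ": No such attribute on " + path);
     }
     return value;
   }
 }
 {code}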



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed

2014-07-15 Thread Danilo Vunjak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danilo Vunjak updated HDFS-6597:


Attachment: JIRA-HDFS-6597.02.patch

 Add a new option to NN upgrade to terminate the process after upgrade on NN 
 is completed
 

 Key: HDFS-6597
 URL: https://issues.apache.org/jira/browse/HDFS-6597
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Danilo Vunjak
 Attachments: JIRA-HDFS-30.patch, JIRA-HDFS-6597.02.patch, 
 JIRA-HDFS-6597.patch


 Currently, when the namenode is started for upgrade (the "hadoop namenode -upgrade" 
 command), after finishing the metadata upgrade the namenode starts working 
 normally and waits for the datanodes to upgrade themselves and connect to the NN. We 
 need an option to upgrade only the NN metadata, so that after the upgrade is 
 finished on the NN, the process terminates.
 I have tested it by changing file hdfs.server.namenode.NameNode.java, 
 method public static NameNode createNameNode(String argv[], Configuration 
 conf), adding in the switch:
 case UPGRADE:
   {
     DefaultMetricsSystem.initialize("NameNode");
     NameNode nameNode = new NameNode(conf);
     if (startOpt.getForceUpgrade()) {
       terminate(0);
       return null;
     }
     return nameNode;
   }
 This did the metadata upgrade and closed the process after it finished; later, when 
 all services were started, the upgrade of the datanodes finished successfully and 
 the system ran.
 What I'm suggesting right now is to add a new startup parameter -force, so the 
 namenode can be started like "hadoop namenode -upgrade -force" to indicate that we 
 want to terminate the process after the NN metadata upgrade is finished. The old 
 functionality should be preserved, so users can run "hadoop namenode -upgrade" the 
 same way and with the same behaviour as before.
  Thanks,
  Danilo
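 For illustration, a small sketch of how such a "-upgrade -force" flag could be parsed 
 and consumed (the getForceUpgrade() name comes from the snippet above; everything else 
 here is an assumption, not the attached patch):
 {code}
 // Hedged sketch of the proposed startup-option handling.
 class UpgradeOptionSketch {
   enum StartupOption {
     REGULAR, UPGRADE;
     private boolean forceUpgrade;
     boolean getForceUpgrade() { return forceUpgrade; }
     void setForceUpgrade(boolean force) { forceUpgrade = force; }
   }

   // "hadoop namenode -upgrade"        -> UPGRADE, old behaviour preserved
   // "hadoop namenode -upgrade -force" -> UPGRADE with forceUpgrade set, so the
   //                                      process terminates after the metadata upgrade
   static StartupOption parse(String[] args) {
     StartupOption opt = StartupOption.REGULAR;
     for (String arg : args) {
       if ("-upgrade".equalsIgnoreCase(arg)) {
         opt = StartupOption.UPGRADE;
       } else if ("-force".equalsIgnoreCase(arg) && opt == StartupOption.UPGRADE) {
         opt.setForceUpgrade(true);
       }
     }
     return opt;
   }
 }
 {code}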



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed

2014-07-15 Thread Danilo Vunjak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danilo Vunjak updated HDFS-6597:


Status: Patch Available  (was: Open)

 Add a new option to NN upgrade to terminate the process after upgrade on NN 
 is completed
 

 Key: HDFS-6597
 URL: https://issues.apache.org/jira/browse/HDFS-6597
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Danilo Vunjak
 Attachments: JIRA-HDFS-30.patch, JIRA-HDFS-6597.02.patch, 
 JIRA-HDFS-6597.patch


 Currently, when the namenode is started for upgrade (the "hadoop namenode -upgrade" 
 command), after finishing the metadata upgrade the namenode starts working 
 normally and waits for the datanodes to upgrade themselves and connect to the NN. We 
 need an option to upgrade only the NN metadata, so that after the upgrade is 
 finished on the NN, the process terminates.
 I have tested it by changing file hdfs.server.namenode.NameNode.java, 
 method public static NameNode createNameNode(String argv[], Configuration 
 conf), adding in the switch:
 case UPGRADE:
   {
     DefaultMetricsSystem.initialize("NameNode");
     NameNode nameNode = new NameNode(conf);
     if (startOpt.getForceUpgrade()) {
       terminate(0);
       return null;
     }
     return nameNode;
   }
 This did the metadata upgrade and closed the process after it finished; later, when 
 all services were started, the upgrade of the datanodes finished successfully and 
 the system ran.
 What I'm suggesting right now is to add a new startup parameter -force, so the 
 namenode can be started like "hadoop namenode -upgrade -force" to indicate that we 
 want to terminate the process after the NN metadata upgrade is finished. The old 
 functionality should be preserved, so users can run "hadoop namenode -upgrade" the 
 same way and with the same behaviour as before.
  Thanks,
  Danilo



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed

2014-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062145#comment-14062145
 ] 

Hadoop QA commented on HDFS-6597:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12655777/JIRA-HDFS-6597.02.patch
  against trunk revision .

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7348//console

This message is automatically generated.

 Add a new option to NN upgrade to terminate the process after upgrade on NN 
 is completed
 

 Key: HDFS-6597
 URL: https://issues.apache.org/jira/browse/HDFS-6597
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Danilo Vunjak
 Attachments: JIRA-HDFS-30.patch, JIRA-HDFS-6597.02.patch, 
 JIRA-HDFS-6597.patch


 Currently, when the namenode is started for upgrade (the "hadoop namenode -upgrade" 
 command), after finishing the metadata upgrade the namenode starts working 
 normally and waits for the datanodes to upgrade themselves and connect to the NN. We 
 need an option to upgrade only the NN metadata, so that after the upgrade is 
 finished on the NN, the process terminates.
 I have tested it by changing file hdfs.server.namenode.NameNode.java, 
 method public static NameNode createNameNode(String argv[], Configuration 
 conf), adding in the switch:
 case UPGRADE:
   {
     DefaultMetricsSystem.initialize("NameNode");
     NameNode nameNode = new NameNode(conf);
     if (startOpt.getForceUpgrade()) {
       terminate(0);
       return null;
     }
     return nameNode;
   }
 This did the metadata upgrade and closed the process after it finished; later, when 
 all services were started, the upgrade of the datanodes finished successfully and 
 the system ran.
 What I'm suggesting right now is to add a new startup parameter -force, so the 
 namenode can be started like "hadoop namenode -upgrade -force" to indicate that we 
 want to terminate the process after the NN metadata upgrade is finished. The old 
 functionality should be preserved, so users can run "hadoop namenode -upgrade" the 
 same way and with the same behaviour as before.
  Thanks,
  Danilo



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6378) NFS registration should timeout instead of hanging when portmap/rpcbind is not available

2014-07-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062169#comment-14062169
 ] 

Hudson commented on HDFS-6378:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1832 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1832/])
HDFS-6378. NFS registration should timeout instead of hanging when 
portmap/rpcbind is not available. Contributed by Abhiraj Butala (brandonli: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1610543)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/mount/MountdBase.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/nfs3/Nfs3Base.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/RpcProgram.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/oncrpc/SimpleUdpClient.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 NFS registration should timeout instead of hanging when portmap/rpcbind is 
 not available
 

 Key: HDFS-6378
 URL: https://issues.apache.org/jira/browse/HDFS-6378
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Reporter: Brandon Li
Assignee: Abhiraj Butala
 Fix For: 2.5.0

 Attachments: HDFS-6378.002.patch, HDFS-6378.003.patch, HDFS-6378.patch


 When portmap/rpcbind is not available, NFS could be stuck at registration. 
 Instead, the NFS gateway should shut down automatically with a proper error message.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6678) MiniDFSCluster may still be partially running after initialization fails.

2014-07-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062170#comment-14062170
 ] 

Hudson commented on HDFS-6678:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1832 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1832/])
HDFS-6678. MiniDFSCluster may still be partially running after initialization 
fails. Contributed by Chris Nauroth. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1610549)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java


 MiniDFSCluster may still be partially running after initialization fails.
 -

 Key: HDFS-6678
 URL: https://issues.apache.org/jira/browse/HDFS-6678
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: test
Affects Versions: 3.0.0, 2.5.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
 Fix For: 3.0.0, 2.6.0

 Attachments: HDFS-6678.1.patch


 {{MiniDFSCluster}} initializes the daemons (NameNodes, DataNodes) as part of 
 object construction.  If initialization fails, then the constructor throws an 
 exception.  When this happens, it's possible that daemons are left running in 
 the background.  There is effectively no way to clean up after this state, 
 because the constructor failed, and therefore the caller has no way to 
 trigger a shutdown.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-2856) Fix block protocol so that Datanodes don't require root or jsvc

2014-07-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062167#comment-14062167
 ] 

Hudson commented on HDFS-2856:
--

SUCCESS: Integrated in Hadoop-Mapreduce-trunk #1832 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1832/])
HDFS-2856. Fix block protocol so that Datanodes don't require root or jsvc. 
Contributed by Chris Nauroth. (cnauroth: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1610474)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/pom.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemotePeerFactory.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/EncryptedPeer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/net/TcpPeerServer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/DataTransferEncryptor.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/DataEncryptionKeyFactory.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/DataTransferSaslUtil.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/InvalidMagicNumberException.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferClient.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferServer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslParticipant.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/NameNodeConnector.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DNConf.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferTestCase.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/TestSaslDataTransfer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/balancer/TestBalancerWithSaslDataTransfer.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/blockmanagement/TestBlockTokenWithDFS.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestDataNodeVolumeFailure.java


 Fix block protocol so that Datanodes don't require root or jsvc
 ---

 Key: HDFS-2856
 URL: https://issues.apache.org/jira/browse/HDFS-2856
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: datanode, security
Affects Versions: 3.0.0, 2.4.0
Reporter: Owen O'Malley
Assignee: Chris Nauroth

[jira] [Updated] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed

2014-07-15 Thread Danilo Vunjak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danilo Vunjak updated HDFS-6597:


Status: Open  (was: Patch Available)

 Add a new option to NN upgrade to terminate the process after upgrade on NN 
 is completed
 

 Key: HDFS-6597
 URL: https://issues.apache.org/jira/browse/HDFS-6597
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Danilo Vunjak
 Attachments: JIRA-HDFS-30.patch, JIRA-HDFS-6597.02.patch, 
 JIRA-HDFS-6597.patch


 Currently, when the namenode is started for upgrade (the "hadoop namenode -upgrade" 
 command), after finishing the metadata upgrade the namenode starts working 
 normally and waits for the datanodes to upgrade themselves and connect to the NN. We 
 need an option to upgrade only the NN metadata, so that after the upgrade is 
 finished on the NN, the process terminates.
 I have tested it by changing file hdfs.server.namenode.NameNode.java, 
 method public static NameNode createNameNode(String argv[], Configuration 
 conf), adding in the switch:
 case UPGRADE:
   {
     DefaultMetricsSystem.initialize("NameNode");
     NameNode nameNode = new NameNode(conf);
     if (startOpt.getForceUpgrade()) {
       terminate(0);
       return null;
     }
     return nameNode;
   }
 This did the metadata upgrade and closed the process after it finished; later, when 
 all services were started, the upgrade of the datanodes finished successfully and 
 the system ran.
 What I'm suggesting right now is to add a new startup parameter -force, so the 
 namenode can be started like "hadoop namenode -upgrade -force" to indicate that we 
 want to terminate the process after the NN metadata upgrade is finished. The old 
 functionality should be preserved, so users can run "hadoop namenode -upgrade" the 
 same way and with the same behaviour as before.
  Thanks,
  Danilo



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6671) Archival Storage: Consider block storage policy in replication

2014-07-15 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-6671:
--

Attachment: h6671_20140715.patch

Thanks Vinay for reviewing the patch.
# Good catch.  Fixed.  My IDE somehow does not show this error.
# Fixed.
# You are right that it should return the default, since storagePolicyId == 0 means 
the policy is not specified.  Fixed.

Here is a new patch: h6671_20140715.patch


 Archival Storage: Consider block storage policy in replication
 --

 Key: HDFS-6671
 URL: https://issues.apache.org/jira/browse/HDFS-6671
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h6671_20140714.patch, h6671_20140715.patch


 In order to satisfy the storage policy requirement, the replication monitor 
 additionally reads storage policy information from the INodeFile when performing 
 replication.  As before, it only adds replicas if a block is under-replicated, 
 and deletes replicas if a block is over-replicated.  It will NOT 
 move replicas around to satisfy the storage policy requirement.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed

2014-07-15 Thread Danilo Vunjak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danilo Vunjak updated HDFS-6597:


Attachment: JIRA-HDFS-6597.03.patch

 Add a new option to NN upgrade to terminate the process after upgrade on NN 
 is completed
 

 Key: HDFS-6597
 URL: https://issues.apache.org/jira/browse/HDFS-6597
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Danilo Vunjak
 Attachments: JIRA-HDFS-30.patch, JIRA-HDFS-6597.02.patch, 
 JIRA-HDFS-6597.03.patch, JIRA-HDFS-6597.patch


 Currently, when the namenode is started for upgrade (the "hadoop namenode -upgrade" 
 command), after finishing the metadata upgrade the namenode starts working 
 normally and waits for the datanodes to upgrade themselves and connect to the NN. We 
 need an option to upgrade only the NN metadata, so that after the upgrade is 
 finished on the NN, the process terminates.
 I have tested it by changing file hdfs.server.namenode.NameNode.java, 
 method public static NameNode createNameNode(String argv[], Configuration 
 conf), adding in the switch:
 case UPGRADE:
   {
     DefaultMetricsSystem.initialize("NameNode");
     NameNode nameNode = new NameNode(conf);
     if (startOpt.getForceUpgrade()) {
       terminate(0);
       return null;
     }
     return nameNode;
   }
 This did the metadata upgrade and closed the process after it finished; later, when 
 all services were started, the upgrade of the datanodes finished successfully and 
 the system ran.
 What I'm suggesting right now is to add a new startup parameter -force, so the 
 namenode can be started like "hadoop namenode -upgrade -force" to indicate that we 
 want to terminate the process after the NN metadata upgrade is finished. The old 
 functionality should be preserved, so users can run "hadoop namenode -upgrade" the 
 same way and with the same behaviour as before.
  Thanks,
  Danilo



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed

2014-07-15 Thread Danilo Vunjak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danilo Vunjak updated HDFS-6597:


Status: Patch Available  (was: Open)

 Add a new option to NN upgrade to terminate the process after upgrade on NN 
 is completed
 

 Key: HDFS-6597
 URL: https://issues.apache.org/jira/browse/HDFS-6597
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Danilo Vunjak
 Attachments: JIRA-HDFS-30.patch, JIRA-HDFS-6597.02.patch, 
 JIRA-HDFS-6597.03.patch, JIRA-HDFS-6597.patch


 Currently, when the namenode is started for upgrade (the "hadoop namenode -upgrade" 
 command), after finishing the metadata upgrade the namenode starts working 
 normally and waits for the datanodes to upgrade themselves and connect to the NN. We 
 need an option to upgrade only the NN metadata, so that after the upgrade is 
 finished on the NN, the process terminates.
 I have tested it by changing file hdfs.server.namenode.NameNode.java, 
 method public static NameNode createNameNode(String argv[], Configuration 
 conf), adding in the switch:
 case UPGRADE:
   {
     DefaultMetricsSystem.initialize("NameNode");
     NameNode nameNode = new NameNode(conf);
     if (startOpt.getForceUpgrade()) {
       terminate(0);
       return null;
     }
     return nameNode;
   }
 This did the metadata upgrade and closed the process after it finished; later, when 
 all services were started, the upgrade of the datanodes finished successfully and 
 the system ran.
 What I'm suggesting right now is to add a new startup parameter -force, so the 
 namenode can be started like "hadoop namenode -upgrade -force" to indicate that we 
 want to terminate the process after the NN metadata upgrade is finished. The old 
 functionality should be preserved, so users can run "hadoop namenode -upgrade" the 
 same way and with the same behaviour as before.
  Thanks,
  Danilo



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed

2014-07-15 Thread Danilo Vunjak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danilo Vunjak updated HDFS-6597:


Status: Patch Available  (was: Open)

 Add a new option to NN upgrade to terminate the process after upgrade on NN 
 is completed
 

 Key: HDFS-6597
 URL: https://issues.apache.org/jira/browse/HDFS-6597
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Danilo Vunjak
 Attachments: HDFS-6597.04.patch, JIRA-HDFS-30.patch, 
 JIRA-HDFS-6597.02.patch, JIRA-HDFS-6597.03.patch, JIRA-HDFS-6597.patch


 Currently, when the namenode is started for upgrade (the "hadoop namenode -upgrade" 
 command), after finishing the metadata upgrade the namenode starts working 
 normally and waits for the datanodes to upgrade themselves and connect to the NN. We 
 need an option to upgrade only the NN metadata, so that after the upgrade is 
 finished on the NN, the process terminates.
 I have tested it by changing file hdfs.server.namenode.NameNode.java, 
 method public static NameNode createNameNode(String argv[], Configuration 
 conf), adding in the switch:
 case UPGRADE:
   {
     DefaultMetricsSystem.initialize("NameNode");
     NameNode nameNode = new NameNode(conf);
     if (startOpt.getForceUpgrade()) {
       terminate(0);
       return null;
     }
     return nameNode;
   }
 This did the metadata upgrade and closed the process after it finished; later, when 
 all services were started, the upgrade of the datanodes finished successfully and 
 the system ran.
 What I'm suggesting right now is to add a new startup parameter -force, so the 
 namenode can be started like "hadoop namenode -upgrade -force" to indicate that we 
 want to terminate the process after the NN metadata upgrade is finished. The old 
 functionality should be preserved, so users can run "hadoop namenode -upgrade" the 
 same way and with the same behaviour as before.
  Thanks,
  Danilo



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed

2014-07-15 Thread Danilo Vunjak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danilo Vunjak updated HDFS-6597:


Attachment: HDFS-6597.04.patch

 Add a new option to NN upgrade to terminate the process after upgrade on NN 
 is completed
 

 Key: HDFS-6597
 URL: https://issues.apache.org/jira/browse/HDFS-6597
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Danilo Vunjak
 Attachments: HDFS-6597.04.patch, JIRA-HDFS-30.patch, 
 JIRA-HDFS-6597.02.patch, JIRA-HDFS-6597.03.patch, JIRA-HDFS-6597.patch


 Currently, when the namenode is started for upgrade (the "hadoop namenode -upgrade" 
 command), after finishing the metadata upgrade the namenode starts working 
 normally and waits for the datanodes to upgrade themselves and connect to the NN. We 
 need an option to upgrade only the NN metadata, so that after the upgrade is 
 finished on the NN, the process terminates.
 I have tested it by changing file hdfs.server.namenode.NameNode.java, 
 method public static NameNode createNameNode(String argv[], Configuration 
 conf), adding in the switch:
 case UPGRADE:
   {
     DefaultMetricsSystem.initialize("NameNode");
     NameNode nameNode = new NameNode(conf);
     if (startOpt.getForceUpgrade()) {
       terminate(0);
       return null;
     }
     return nameNode;
   }
 This did the metadata upgrade and closed the process after it finished; later, when 
 all services were started, the upgrade of the datanodes finished successfully and 
 the system ran.
 What I'm suggesting right now is to add a new startup parameter -force, so the 
 namenode can be started like "hadoop namenode -upgrade -force" to indicate that we 
 want to terminate the process after the NN metadata upgrade is finished. The old 
 functionality should be preserved, so users can run "hadoop namenode -upgrade" the 
 same way and with the same behaviour as before.
  Thanks,
  Danilo



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed

2014-07-15 Thread Danilo Vunjak (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danilo Vunjak updated HDFS-6597:


Status: Open  (was: Patch Available)

 Add a new option to NN upgrade to terminate the process after upgrade on NN 
 is completed
 

 Key: HDFS-6597
 URL: https://issues.apache.org/jira/browse/HDFS-6597
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Danilo Vunjak
 Attachments: HDFS-6597.04.patch, JIRA-HDFS-30.patch, 
 JIRA-HDFS-6597.02.patch, JIRA-HDFS-6597.03.patch, JIRA-HDFS-6597.patch


 Currently, when the namenode is started for upgrade (the "hadoop namenode -upgrade" 
 command), after finishing the metadata upgrade the namenode starts working 
 normally and waits for the datanodes to upgrade themselves and connect to the NN. We 
 need an option to upgrade only the NN metadata, so that after the upgrade is 
 finished on the NN, the process terminates.
 I have tested it by changing file hdfs.server.namenode.NameNode.java, 
 method public static NameNode createNameNode(String argv[], Configuration 
 conf), adding in the switch:
 case UPGRADE:
   {
     DefaultMetricsSystem.initialize("NameNode");
     NameNode nameNode = new NameNode(conf);
     if (startOpt.getForceUpgrade()) {
       terminate(0);
       return null;
     }
     return nameNode;
   }
 This did the metadata upgrade and closed the process after it finished; later, when 
 all services were started, the upgrade of the datanodes finished successfully and 
 the system ran.
 What I'm suggesting right now is to add a new startup parameter -force, so the 
 namenode can be started like "hadoop namenode -upgrade -force" to indicate that we 
 want to terminate the process after the NN metadata upgrade is finished. The old 
 functionality should be preserved, so users can run "hadoop namenode -upgrade" the 
 same way and with the same behaviour as before.
  Thanks,
  Danilo



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6490) Fix the keyid format for generated keys in FSNamesystem.createEncryptionZone

2014-07-15 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062224#comment-14062224
 ] 

Uma Maheswara Rao G commented on HDFS-6490:
---

Hi [~clamb], I have reviewed the patch. Please find the comments below.
The patch needs to be updated against the latest code.

I think we are now passing the keyid from outside to createNewKey.

In the case where nameserviceID is null, can we assume a non-federated cluster and use 
DFS_NAMENODE_RPC_ADDRESS_KEY?

It seems that when the path ends with '/', you want to append the last char, which 
is again '/'. So can we use '/' directly instead of substring?
sb.append(src.endsWith("/") ? "/" : src);  -- sb.append(src.endsWith("/") ? 
'/' : src);

sb.append("/");  -- sb.append('/');
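
For illustration, a hedged sketch of composing a key id in the hdfs://HOST:PORT/pathOfEZ 
format this issue asks for, using the char-append suggestion above (the authority 
argument and method name are assumptions, not the patch under review):
{code}
// Hedged sketch only.
class EncryptionZoneKeyIdSketch {
  static String keyIdFor(String authority, String pathOfEZ) {
    StringBuilder sb = new StringBuilder("hdfs://");
    sb.append(authority);              // e.g. "namenode.example.com:8020"
    if (!pathOfEZ.startsWith("/")) {
      sb.append('/');                  // char overload, as suggested in the review
    }
    sb.append(pathOfEZ);
    return sb.toString();
  }
}
{code}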

 Fix the keyid format for generated keys in FSNamesystem.createEncryptionZone 
 -

 Key: HDFS-6490
 URL: https://issues.apache.org/jira/browse/HDFS-6490
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode, security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-6490.001.patch


 FSNamesystem.createEncryptionZone needs to create key ids with the format 
 hdfs://HOST:PORT/pathOfEZ



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6671) Archival Storage: Consider block storage policy in replication

2014-07-15 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-6671:
--

Attachment: h6671_20140715b.patch

h6671_20140715b.patch: adds more checks on parsing storage policies.

 Archival Storage: Consider block storage policy in replication
 --

 Key: HDFS-6671
 URL: https://issues.apache.org/jira/browse/HDFS-6671
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: h6671_20140714.patch, h6671_20140715.patch, 
 h6671_20140715b.patch


 In order to satisfy the storage policy requirement, the replication monitor 
 additionally reads storage policy information from the INodeFile when performing 
 replication.  As before, it only adds replicas if a block is under-replicated, 
 and deletes replicas if a block is over-replicated.  It will NOT 
 move replicas around to satisfy the storage policy requirement.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-5809) BlockPoolSliceScanner and high speed hdfs appending make datanode to drop into infinite loop

2014-07-15 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5809:
---

Description: 
{{BlockPoolSliceScanner#scan}} contains a while loop that continues to verify 
(i.e. scan) blocks until the {{blockInfoSet}} is empty (or some other 
conditions like a timeout have occurred.)  In order to do this, it calls 
{{BlockPoolSliceScanner#verifyFirstBlock}}.  This is intended to grab the first 
block in the {{blockInfoSet}}, verify it, and remove it from that set.  
({{blockInfoSet}} is sorted by last scan time.) Unfortunately, if we hit a 
certain bug in {{updateScanStatus}}, the block may never be removed from 
{{blockInfoSet}}.  When this happens, we keep rescanning the exact same block 
until the timeout hits.

The bug is triggered when a block winds up in {{blockInfoSet}} but not in 
{{blockMap}}.  You can see it clearly in this code:
{code}
  private synchronized void updateScanStatus(Block block,
                                             ScanType type,
                                             boolean scanOk) {
    BlockScanInfo info = blockMap.get(block);

    if ( info != null ) {
      delBlockInfo(info);
    } else {
      // It might already be removed. Thats ok, it will be caught next time.
      info = new BlockScanInfo(block);
    }
{code}

If {{info == null}}, we never call {{delBlockInfo}}, the function which is 
intended to remove the {{blockInfoSet}} entry.

Luckily, there is a simple fix here... the variable that {{updateScanStatus}} 
is being passed is actually a BlockInfo object, so we can simply call 
{{delBlockInfo}} on it directly, without doing a lookup in the {{blockMap}}.  
This is both faster and more robust.
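
A hedged sketch of that shape of fix, with heavily simplified types (the real method 
lives in BlockPoolSliceScanner; the TreeSet comparator here is illustrative and ignores 
scan-time ties):
{code}
import java.util.Comparator;
import java.util.TreeSet;

class ScanStatusSketch {
  static class BlockScanInfo {
    long lastScanTime;
  }

  // Sorted by last scan time, like blockInfoSet.
  private final TreeSet<BlockScanInfo> blockInfoSet =
      new TreeSet<BlockScanInfo>(new Comparator<BlockScanInfo>() {
        @Override public int compare(BlockScanInfo a, BlockScanInfo b) {
          return Long.compare(a.lastScanTime, b.lastScanTime);
        }
      });

  // Take the caller's BlockScanInfo directly, so the entry is always removed from the
  // sorted set even when a blockMap lookup would have missed.
  synchronized void updateScanStatus(BlockScanInfo info, boolean scanOk, long now) {
    blockInfoSet.remove(info);   // the delBlockInfo step that was being skipped
    info.lastScanTime = now;
    blockInfoSet.add(info);      // re-insert so it sorts to the back of the scan order
  }
}
{code}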

  was:
Hello, everyone.

When the hadoop cluster starts, BlockPoolSliceScanner starts scanning the blocks in 
my cluster.
Then, randomly, one datanode drops into an infinite loop as the log shows, and 
finally all datanodes drop into the infinite loop.
Every datanode just fails verification on one block.
When I check the failing block like this: hadoop fsck / -files -blocks | grep 
blk_1223474551535936089_4702249, no hdfs file contains the block.

It seems that the while loop in BlockPoolSliceScanner's scan method drops into an 
infinite loop.
BlockPoolSliceScanner: 650

while (datanode.shouldRun
    && !datanode.blockScanner.blockScannerThread.isInterrupted()
    && datanode.isBPServiceAlive(blockPoolId)) {

The log below is finally printed in method verifyBlock (BlockPoolSliceScanner:453).

Please excuse my poor English.
-
LOG: 
2014-01-21 18:36:50,582 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
failed for 
BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may 
be due to race with write
2014-01-21 18:36:50,582 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
failed for 
BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may 
be due to race with write
2014-01-21 18:36:50,582 INFO 
org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification 
failed for 
BP-1040548460-58.229.158.13-1385606058039:blk_6833233229840997944_4702634 - may 
be due to race with write


 BlockPoolSliceScanner and high speed hdfs appending make datanode to drop 
 into infinite loop
 

 Key: HDFS-5809
 URL: https://issues.apache.org/jira/browse/HDFS-5809
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.0.0-alpha
 Environment: jdk1.6, centos6.4, 2.0.0-cdh4.5.0
Reporter: ikweesung
Assignee: Colin Patrick McCabe
Priority: Critical
  Labels: blockpoolslicescanner, datanode, infinite-loop
 Attachments: HDFS-5809.001.patch


 {{BlockPoolSliceScanner#scan}} contains a while loop that continues to 
 verify (i.e. scan) blocks until the {{blockInfoSet}} is empty (or some other 
 conditions like a timeout have occurred.)  In order to do this, it calls 
 {{BlockPoolSliceScanner#verifyFirstBlock}}.  This is intended to grab the 
 first block in the {{blockInfoSet}}, verify it, and remove it from that set.  
 ({{blockInfoSet}} is sorted by last scan time.) Unfortunately, if we hit a 
 certain bug in {{updateScanStatus}}, the block may never be removed from 
 {{blockInfoSet}}.  When this happens, we keep rescanning the exact same block 
 until the timeout hits.
 The 

[jira] [Commented] (HDFS-6114) Block Scan log rolling will never happen if blocks written continuously leading to huge size of dncp_block_verification.log.curr

2014-07-15 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062322#comment-14062322
 ] 

Colin Patrick McCabe commented on HDFS-6114:


bq. blockInfoSet is required to be sorted based on the lastScanTime, as oldest 
scanned block will be picked for scanning, which will be the first element in 
this set always. BlockScanInfo.LAST_SCAN_TIME_COMPARATOR is used because 
BlockScanInfo#hashCode() is default which will sort based on the blockId rather 
than scan time.  Do you suggest me to update this hashCode() itself?

I was suggesting that you use a {{TreeSet}} or {{TreeMap}} with the same 
comparator as {{blockInfoSet}}.  All the hash sets that I'm aware of do not 
shrink down after enlarging.

bq. So delBlockInfo and delNewBlockInfo serves separate purposes and both are 
required.

I can write a version of the patch that only has one del function and only one 
add function.  I am really reluctant to put in another set of add/del functions 
on top of what's already there, since I think it will make things hard to 
understand for people trying to modify this code later or backport this patch 
to other branches.
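
To make the suggestion concrete, a hedged sketch of a single add/del pair keeping a 
sorted structure and the lookup map in step (simplified types, ignoring scan-time ties; 
not the actual patch):
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class BlockScanBookkeepingSketch {
  static final class BlockScanInfo {
    final long blockId;
    long lastScanTime;
    BlockScanInfo(long blockId) { this.blockId = blockId; }
  }

  // Sorted by last scan time so the oldest-scanned block is always first; a TreeMap
  // releases memory as entries are deleted, unlike a hash table that never shrinks.
  private final TreeMap<Long, BlockScanInfo> byLastScanTime =
      new TreeMap<Long, BlockScanInfo>();
  private final Map<Long, BlockScanInfo> blockMap = new HashMap<Long, BlockScanInfo>();

  synchronized void addBlockInfo(BlockScanInfo info) {
    byLastScanTime.put(info.lastScanTime, info);
    blockMap.put(info.blockId, info);
  }

  synchronized void delBlockInfo(BlockScanInfo info) {
    byLastScanTime.remove(info.lastScanTime);
    blockMap.remove(info.blockId);
  }
}
{code}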

 Block Scan log rolling will never happen if blocks written continuously 
 leading to huge size of dncp_block_verification.log.curr
 

 Key: HDFS-6114
 URL: https://issues.apache.org/jira/browse/HDFS-6114
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.3.0, 2.4.0
Reporter: Vinayakumar B
Assignee: Vinayakumar B
Priority: Critical
 Attachments: HDFS-6114.patch, HDFS-6114.patch


 1. {{BlockPoolSliceScanner#scan()}} will not return until all the blocks are 
 scanned.
 2. If blocks (several MBs in size) are written to the datanode continuously, 
 then one iteration of {{BlockPoolSliceScanner#scan()}} will continuously be 
 scanning the blocks.
 3. These blocks will be deleted after some time (enough for them to get scanned).
 4. As block scanning is throttled, verification of all blocks takes a long time.
 5. Rolling will never happen, so even though the total number of blocks in the 
 datanode doesn't increase, entries (including stale entries for deleted 
 blocks) in *dncp_block_verification.log.curr* continuously accumulate, leading 
 to a huge file size.
 In one of our environments it grew to more than 1TB while the total number of 
 blocks was only ~45k.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5809) BlockPoolSliceScanner and high speed hdfs appending make datanode to drop into infinite loop

2014-07-15 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062385#comment-14062385
 ] 

Aaron T. Myers commented on HDFS-5809:
--

+1, the patch looks good to me. I agree that writing a unit test for this would 
be fairly difficult, and the fix is really quite clear, so I'm OK committing it 
without a test.

Thanks a lot for taking care of this, Colin, and thanks much to ikweesung for 
reporting this issue.

 BlockPoolSliceScanner and high speed hdfs appending make datanode to drop 
 into infinite loop
 

 Key: HDFS-5809
 URL: https://issues.apache.org/jira/browse/HDFS-5809
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.0.0-alpha
 Environment: jdk1.6, centos6.4, 2.0.0-cdh4.5.0
Reporter: ikweesung
Assignee: Colin Patrick McCabe
Priority: Critical
  Labels: blockpoolslicescanner, datanode, infinite-loop
 Attachments: HDFS-5809.001.patch


 {{BlockPoolSliceScanner#scan}} contains a while loop that continues to 
 verify (i.e. scan) blocks until the {{blockInfoSet}} is empty (or some other 
 conditions like a timeout have occurred.)  In order to do this, it calls 
 {{BlockPoolSliceScanner#verifyFirstBlock}}.  This is intended to grab the 
 first block in the {{blockInfoSet}}, verify it, and remove it from that set.  
 ({{blockInfoSet}} is sorted by last scan time.) Unfortunately, if we hit a 
 certain bug in {{updateScanStatus}}, the block may never be removed from 
 {{blockInfoSet}}.  When this happens, we keep rescanning the exact same block 
 until the timeout hits.
 The bug is triggered when a block winds up in {{blockInfoSet}} but not in 
 {{blockMap}}.  You can see it clearly in this code:
 {code}
   private synchronized void updateScanStatus(Block block,
                                              ScanType type,
                                              boolean scanOk) {
     BlockScanInfo info = blockMap.get(block);

     if ( info != null ) {
       delBlockInfo(info);
     } else {
       // It might already be removed. Thats ok, it will be caught next time.
       info = new BlockScanInfo(block);
     }
 {code}
 If {{info == null}}, we never call {{delBlockInfo}}, the function which is 
 intended to remove the {{blockInfoSet}} entry.
 Luckily, there is a simple fix here... the variable that {{updateScanStatus}} 
 is being passed is actually a BlockInfo object, so we can simply call 
 {{delBlockInfo}} on it directly, without doing a lookup in the {{blockMap}}.  
 This is both faster and more robust.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6667) In HDFS HA mode, Distcp/SLive with webhdfs on secure cluster fails with Client cannot authenticate via:[TOKEN, KERBEROS] error

2014-07-15 Thread Haohui Mai (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062340#comment-14062340
 ] 

Haohui Mai commented on HDFS-6667:
--

Looks good to me. +1. I think that this patch implements the approach proposed 
by Daryn.

[~daryn], do you have any comments?

 In HDFS HA mode, Distcp/SLive with webhdfs on secure cluster fails with 
 Client cannot authenticate via:[TOKEN, KERBEROS] error
 --

 Key: HDFS-6667
 URL: https://issues.apache.org/jira/browse/HDFS-6667
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security
Reporter: Jian He
Assignee: Jing Zhao
 Attachments: HDFS-6667.000.patch


 Opening on [~arpitgupta]'s behalf.
 We observed that, in HDFS HA mode, running Distcp/SLive with webhdfs will 
 fail on YARN.  In non-HA mode, it'll pass. 
 The reason is that in HA mode, only the webhdfs delegation token is generated for the 
 job, but YARN also requires the regular hdfs token for localization, 
 log-aggregation, etc.
 In non-HA mode, both tokens are generated for the job.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6683) TestDFSAdminWithHA.testSaveNamespace failed with Timed out waiting for Mini HDFS Cluster to start

2014-07-15 Thread Yongjun Zhang (JIRA)
Yongjun Zhang created HDFS-6683:
---

 Summary: TestDFSAdminWithHA.testSaveNamespace failed with Timed 
out waiting for Mini HDFS Cluster to start
 Key: HDFS-6683
 URL: https://issues.apache.org/jira/browse/HDFS-6683
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, tools
Affects Versions: 3.0.0
Reporter: Yongjun Zhang


This test has been failing in quite a few recent test runs:

{code}

org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testSaveNamespace

Failing for the past 9 builds (Since Failed#7337 )
Took 12 sec.
Error Message

Timed out waiting for Mini HDFS Cluster to start
Stacktrace

java.io.IOException: Timed out waiting for Mini HDFS Cluster to start
at 
org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1097)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:732)
at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:378)
at 
org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:359)
at 
org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.init(MiniQJMHACluster.java:102)
at 
org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.init(MiniQJMHACluster.java:40)
at 
org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster$Builder.build(MiniQJMHACluster.java:67)
at 
org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.setUpHaCluster(TestDFSAdminWithHA.java:82)
at 
org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testSaveNamespace(TestDFSAdminWithHA.java:134)
{code}




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6588) Investigating removing getTrueCause method in Server.java

2014-07-15 Thread Yongjun Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062400#comment-14062400
 ] 

Yongjun Zhang commented on HDFS-6588:
-

It appeared to be a glitch in the testing, so I'm re-uploading the same patch to 
trigger a new run. However, there seems to be a real problem with 
org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testSaveNamespace, which has 
failed many times in other runs; filed HDFS-6683.




 Investigating removing getTrueCause method in Server.java
 -

 Key: HDFS-6588
 URL: https://issues.apache.org/jira/browse/HDFS-6588
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security, webhdfs
Affects Versions: 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6588.001.patch, HDFS-6588.001.patch


 When addressing Daryn Sharp's comment for HDFS-6475 quoted below:
 {quote}
 What I'm saying is I think the patch adds too much unnecessary code. Filing 
 an improvement to delete all but a few lines of the code changed in this 
 patch seems a bit odd. I think you just need to:
 - Delete getTrueCause entirely instead of moving it elsewhere
 - In saslProcess, just throw the exception instead of running it through 
 getTrueCause since it's not a InvalidToken wrapping another exception 
 anymore.
 - Keep your 3-line change to unwrap SecurityException in toResponse
 {quote}
 There are multiple test failures after making the suggested changes. Filing 
 this jira dedicated to investigating the removal of the getTrueCause method.
 More detail will be put in the first comment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6588) Investigating removing getTrueCause method in Server.java

2014-07-15 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-6588:


Attachment: HDFS-6588.001.patch

 Investigating removing getTrueCause method in Server.java
 -

 Key: HDFS-6588
 URL: https://issues.apache.org/jira/browse/HDFS-6588
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security, webhdfs
Affects Versions: 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6588.001.patch, HDFS-6588.001.patch


 When addressing Daryn Sharp's comment for HDFS-6475 quoted below:
 {quote}
 What I'm saying is I think the patch adds too much unnecessary code. Filing 
 an improvement to delete all but a few lines of the code changed in this 
 patch seems a bit odd. I think you just need to:
 - Delete getTrueCause entirely instead of moving it elsewhere
 - In saslProcess, just throw the exception instead of running it through 
 getTrueCause since it's not a InvalidToken wrapping another exception 
 anymore.
 - Keep your 3-line change to unwrap SecurityException in toResponse
 {quote}
 There are multiple test failures after making the suggested changes. Filing 
 this jira dedicated to investigating the removal of the getTrueCause method.
 More detail will be put in the first comment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6683) TestDFSAdminWithHA.testSaveNamespace failed with Timed out waiting for Mini HDFS Cluster to start

2014-07-15 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-6683:


Description: 
This test has been failing in quite a few recent test runs:
...
https://builds.apache.org/job/PreCommit-HDFS-Build/7344/
https://builds.apache.org/job/PreCommit-HDFS-Build/7345/
https://builds.apache.org/job/PreCommit-HDFS-Build/7346/
...
{code}

org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testSaveNamespace

Failing for the past 9 builds (Since Failed#7337 )
Took 12 sec.
Error Message

Timed out waiting for Mini HDFS Cluster to start
Stacktrace

java.io.IOException: Timed out waiting for Mini HDFS Cluster to start
at 
org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1097)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:732)
at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:378)
at 
org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:359)
at 
org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.init(MiniQJMHACluster.java:102)
at 
org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.init(MiniQJMHACluster.java:40)
at 
org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster$Builder.build(MiniQJMHACluster.java:67)
at 
org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.setUpHaCluster(TestDFSAdminWithHA.java:82)
at 
org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testSaveNamespace(TestDFSAdminWithHA.java:134)
{code}


  was:
Test failure in quite some recent test runs:

{code}

org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testSaveNamespace

Failing for the past 9 builds (Since Failed#7337 )
Took 12 sec.
Error Message

Timed out waiting for Mini HDFS Cluster to start
Stacktrace

java.io.IOException: Timed out waiting for Mini HDFS Cluster to start
at 
org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1097)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:732)
at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:378)
at 
org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:359)
at 
org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.init(MiniQJMHACluster.java:102)
at 
org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.init(MiniQJMHACluster.java:40)
at 
org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster$Builder.build(MiniQJMHACluster.java:67)
at 
org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.setUpHaCluster(TestDFSAdminWithHA.java:82)
at 
org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testSaveNamespace(TestDFSAdminWithHA.java:134)
{code}



 TestDFSAdminWithHA.testSaveNamespace failed with Timed out waiting for Mini 
 HDFS Cluster to start
 ---

 Key: HDFS-6683
 URL: https://issues.apache.org/jira/browse/HDFS-6683
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha, tools
Affects Versions: 3.0.0
Reporter: Yongjun Zhang

 This test has been failing in quite a few recent test runs:
 ...
 https://builds.apache.org/job/PreCommit-HDFS-Build/7344/
 https://builds.apache.org/job/PreCommit-HDFS-Build/7345/
 https://builds.apache.org/job/PreCommit-HDFS-Build/7346/
 ...
 {code}
 org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testSaveNamespace
 Failing for the past 9 builds (Since Failed#7337 )
 Took 12 sec.
 Error Message
 Timed out waiting for Mini HDFS Cluster to start
 Stacktrace
 java.io.IOException: Timed out waiting for Mini HDFS Cluster to start
   at 
 org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1097)
   at 
 org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:732)
   at org.apache.hadoop.hdfs.MiniDFSCluster.init(MiniDFSCluster.java:378)
   at 
 org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:359)
   at 
 org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.init(MiniQJMHACluster.java:102)
   at 
 org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster.init(MiniQJMHACluster.java:40)
   at 
 org.apache.hadoop.hdfs.qjournal.MiniQJMHACluster$Builder.build(MiniQJMHACluster.java:67)
   at 
 org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.setUpHaCluster(TestDFSAdminWithHA.java:82)
   at 
 org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA.testSaveNamespace(TestDFSAdminWithHA.java:134)
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6684) HDFS NN and DN JSP pages do not check for script injection.

2014-07-15 Thread Jinghui Wang (JIRA)
Jinghui Wang created HDFS-6684:
--

 Summary: HDFS NN and DN JSP pages do not check for script 
injection.
 Key: HDFS-6684
 URL: https://issues.apache.org/jira/browse/HDFS-6684
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.1, 2.3.0, 2.2.0, 2.1.0-beta
Reporter: Jinghui Wang
Assignee: Jinghui Wang


Datanode's browseDirectory.jsp does not filter for script injection; a script 
can be injected via the dir parameter using 
dir=/hadoop'\/scriptalert(759)/script.

NameNode's dfsnodelist.jsp does not filter for script injection either; the 
sorter/order parameter can be set to DSC%20onMouseOver=alert(959)//.
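
A minimal sketch of the kind of escaping that would neutralize these payloads; 
the helper below is hypothetical and not the attached patch (the JSPs would 
escape request parameters such as dir and sorter/order before echoing them):
{code}
// Hypothetical helper: HTML-escape untrusted request parameters before echoing
// them into JSP output, so injected script/attribute payloads are neutralized.
public final class HtmlEscaper {
  private HtmlEscaper() {}

  public static String escape(String value) {
    if (value == null) {
      return "";
    }
    StringBuilder sb = new StringBuilder(value.length());
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      switch (c) {
        case '<':  sb.append("&lt;");   break;
        case '>':  sb.append("&gt;");   break;
        case '&':  sb.append("&amp;");  break;
        case '"':  sb.append("&quot;"); break;
        case '\'': sb.append("&#39;");  break;
        default:   sb.append(c);
      }
    }
    return sb.toString();
  }
}

// Usage in a JSP scriptlet (illustrative only):
//   String dir = HtmlEscaper.escape(request.getParameter("dir"));
{code}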



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6684) HDFS NN and DN JSP pages do not check for script injection.

2014-07-15 Thread Jinghui Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinghui Wang updated HDFS-6684:
---

Attachment: HDFS-6684.patch

 HDFS NN and DN JSP pages do not check for script injection.
 ---

 Key: HDFS-6684
 URL: https://issues.apache.org/jira/browse/HDFS-6684
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.1.0-beta, 2.2.0, 2.3.0, 2.4.1
Reporter: Jinghui Wang
Assignee: Jinghui Wang
 Attachments: HDFS-6684.patch


 Datanode's browseDirectory.jsp does not filter for script injection; a script 
 can be injected via the dir parameter using 
 dir=/hadoop'\/scriptalert(759)/script.
 NameNode's dfsnodelist.jsp does not filter for script injection either; the 
 sorter/order parameter can be set to DSC%20onMouseOver=alert(959)//.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6684) HDFS NN and DN JSP pages do not check for script injection.

2014-07-15 Thread Jinghui Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062406#comment-14062406
 ] 

Jinghui Wang commented on HDFS-6684:


Patch attached.

 HDFS NN and DN JSP pages do not check for script injection.
 ---

 Key: HDFS-6684
 URL: https://issues.apache.org/jira/browse/HDFS-6684
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.1.0-beta, 2.2.0, 2.3.0, 2.4.1
Reporter: Jinghui Wang
Assignee: Jinghui Wang
 Attachments: HDFS-6684.patch


 Datanode's browseDirectory.jsp does not filter for script injection; a script 
 can be injected via the dir parameter using 
 dir=/hadoop'\/scriptalert(759)/script.
 NameNode's dfsnodelist.jsp does not filter for script injection either; the 
 sorter/order parameter can be set to DSC%20onMouseOver=alert(959)//.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6584) Support archival storage

2014-07-15 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-6584:
--

Attachment: HDFSArchivalStorageDesign20140715.pdf

HDFSArchivalStorageDesign20140715.pdf: revised design doc.

 Support archival storage
 

 Key: HDFS-6584
 URL: https://issues.apache.org/jira/browse/HDFS-6584
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode, namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: HDFSArchivalStorageDesign20140623.pdf, 
 HDFSArchivalStorageDesign20140715.pdf


 In most Hadoop clusters, as more and more data is stored for longer periods, 
 the demand for storage is outstripping the demand for compute. Hadoop needs a 
 cost-effective and easy-to-manage solution to meet this demand for storage. 
 The current solutions are:
 - Delete the old unused data. This comes at the operational cost of 
 identifying unnecessary data and deleting it manually.
 - Add more nodes to the clusters. This adds unnecessary compute capacity to 
 the cluster along with the storage capacity.
 Hadoop needs a solution to decouple growing storage capacity from compute 
 capacity. Nodes with higher density and less expensive storage but low 
 compute power are becoming available and can be used as cold storage in the 
 clusters. Based on policy, data can be moved from hot storage to cold 
 storage. Adding more nodes to the cold storage can grow the storage 
 independently of the compute capacity in the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6584) Support Archival Storage

2014-07-15 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-6584:
--

Component/s: (was: datanode)
 balancer
Summary: Support Archival Storage  (was: Support archival storage)

 Support Archival Storage
 

 Key: HDFS-6584
 URL: https://issues.apache.org/jira/browse/HDFS-6584
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: balancer, namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
 Attachments: HDFSArchivalStorageDesign20140623.pdf, 
 HDFSArchivalStorageDesign20140715.pdf


 In most Hadoop clusters, as more and more data is stored for longer periods, 
 the demand for storage is outstripping the demand for compute. Hadoop needs a 
 cost-effective and easy-to-manage solution to meet this demand for storage. 
 The current solutions are:
 - Delete the old unused data. This comes at the operational cost of 
 identifying unnecessary data and deleting it manually.
 - Add more nodes to the clusters. This adds unnecessary compute capacity to 
 the cluster along with the storage capacity.
 Hadoop needs a solution to decouple growing storage capacity from compute 
 capacity. Nodes with higher density and less expensive storage but low 
 compute power are becoming available and can be used as cold storage in the 
 clusters. Based on policy, data can be moved from hot storage to cold 
 storage. Adding more nodes to the cold storage can grow the storage 
 independently of the compute capacity in the cluster.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HDFS-6679) Archival Storage: Bump NameNodeLayoutVersion and update editsStored test files

2014-07-15 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze reassigned HDFS-6679:
-

Assignee: (was: Tsz Wo Nicholas Sze)

 Archival Storage: Bump NameNodeLayoutVersion and update editsStored test files
 --

 Key: HDFS-6679
 URL: https://issues.apache.org/jira/browse/HDFS-6679
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo Nicholas Sze

 HDFS-6677 changed the fsimage to store storage policy IDs.  We should bump the 
 NameNodeLayoutVersion and fix the tests as well.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6685) Archival Storage: Update Balancer to preserve storage type of replicas

2014-07-15 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-6685:
-

 Summary: Archival Storage: Update Balancer to preserve storage 
type of replicas
 Key: HDFS-6685
 URL: https://issues.apache.org/jira/browse/HDFS-6685
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: balancer
Reporter: Tsz Wo Nicholas Sze


In order to maintain the storage policy requirement, the Balancer always moves 
replicas from a storage of one type to another storage of the same type, i.e. it 
preserves the storage type of replicas.  In this way, the Balancer does not need 
to know storage policy information.
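
A minimal sketch of the matching rule described above; the types and classes 
below are illustrative only, not the Balancer's actual code:
{code}
import java.util.ArrayList;
import java.util.List;

// Sketch only: a candidate target is accepted only when its storage type
// matches the source replica's type, so a move never changes the storage type.
class SameTypeTargets {
  enum StorageType { DISK, SSD, ARCHIVE }

  static class Target {
    final String datanode;
    final StorageType type;
    Target(String datanode, StorageType type) {
      this.datanode = datanode;
      this.type = type;
    }
  }

  static List<Target> filter(StorageType sourceType, List<Target> candidates) {
    List<Target> result = new ArrayList<Target>();
    for (Target t : candidates) {
      if (t.type == sourceType) {   // same storage type only
        result.add(t);
      }
    }
    return result;
  }
}
{code}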



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-5809) BlockPoolSliceScanner and high speed hdfs appending make datanode to drop into infinite loop

2014-07-15 Thread Colin Patrick McCabe (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Patrick McCabe updated HDFS-5809:
---

   Resolution: Fixed
Fix Version/s: 2.6.0
   Status: Resolved  (was: Patch Available)

 BlockPoolSliceScanner and high speed hdfs appending make datanode to drop 
 into infinite loop
 

 Key: HDFS-5809
 URL: https://issues.apache.org/jira/browse/HDFS-5809
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.0.0-alpha
 Environment: jdk1.6, centos6.4, 2.0.0-cdh4.5.0
Reporter: ikweesung
Assignee: Colin Patrick McCabe
Priority: Critical
  Labels: blockpoolslicescanner, datanode, infinite-loop
 Fix For: 2.6.0

 Attachments: HDFS-5809.001.patch


 {{BlockPoolSliceScanner#scan}} contains a while loop that continues to 
 verify (i.e. scan) blocks until the {{blockInfoSet}} is empty (or some other 
 conditions like a timeout have occurred.)  In order to do this, it calls 
 {{BlockPoolSliceScanner#verifyFirstBlock}}.  This is intended to grab the 
 first block in the {{blockInfoSet}}, verify it, and remove it from that set.  
 ({{blockInfoSet}} is sorted by last scan time.) Unfortunately, if we hit a 
 certain bug in {{updateScanStatus}}, the block may never be removed from 
 {{blockInfoSet}}.  When this happens, we keep rescanning the exact same block 
 until the timeout hits.
 The bug is triggered when a block winds up in {{blockInfoSet}} but not in 
 {{blockMap}}.  You can see it clearly in this code:
 {code}
   private synchronized void updateScanStatus(Block block,
                                              ScanType type,
                                              boolean scanOk) {
     BlockScanInfo info = blockMap.get(block);

     if ( info != null ) {
       delBlockInfo(info);
     } else {
       // It might already be removed. Thats ok, it will be caught next time.
       info = new BlockScanInfo(block);
     }
 {code}
 If {{info == null}}, we never call {{delBlockInfo}}, the function which is 
 intended to remove the {{blockInfoSet}} entry.
 Luckily, there is a simple fix here... the variable that {{updateScanStatus}} 
 is being passed is actually a BlockInfo object, so we can simply call 
 {{delBlockInfo}} on it directly, without doing a lookup in the {{blockMap}}.  
 This is both faster and more robust.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6686) Archival Storage: Use fallback storage types

2014-07-15 Thread Tsz Wo Nicholas Sze (JIRA)
Tsz Wo Nicholas Sze created HDFS-6686:
-

 Summary: Archival Storage: Use fallback storage types
 Key: HDFS-6686
 URL: https://issues.apache.org/jira/browse/HDFS-6686
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze


HDFS-6671 changes the replication monitor to use the block storage policy for 
replication.  It should also use the fallback storage types when a particular 
type of storage is full.
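
A minimal sketch of that fallback behaviour; the types, fallback order, and 
space check below are assumptions for illustration, not the actual policy code:
{code}
import java.util.List;

// Sketch only: use the policy's preferred storage type when it has space,
// otherwise walk the policy's fallback list and take the first type with room.
class FallbackChooser {
  enum StorageType { SSD, DISK, ARCHIVE }

  interface SpaceCheck {
    boolean hasSpace(StorageType type);
  }

  static StorageType choose(StorageType preferred,
                            List<StorageType> fallbacks,
                            SpaceCheck check) {
    if (check.hasSpace(preferred)) {
      return preferred;
    }
    for (StorageType t : fallbacks) {
      if (check.hasSpace(t)) {
        return t;                 // first fallback type with free space
      }
    }
    return null;                  // nothing available; caller must handle this
  }
}
{code}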



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-5809) BlockPoolSliceScanner and high speed hdfs appending make datanode to drop into infinite loop

2014-07-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062449#comment-14062449
 ] 

Hudson commented on HDFS-5809:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5883 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5883/])
HDFS-5809. BlockPoolSliceScanner and high speed hdfs appending make datanode to 
drop into infinite loop (cmccabe) (cmccabe: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1610790)
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockPoolSliceScanner.java
* 
/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/DataNodeTestUtils.java


 BlockPoolSliceScanner and high speed hdfs appending make datanode to drop 
 into infinite loop
 

 Key: HDFS-5809
 URL: https://issues.apache.org/jira/browse/HDFS-5809
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.0.0-alpha
 Environment: jdk1.6, centos6.4, 2.0.0-cdh4.5.0
Reporter: ikweesung
Assignee: Colin Patrick McCabe
Priority: Critical
  Labels: blockpoolslicescanner, datanode, infinite-loop
 Fix For: 2.6.0

 Attachments: HDFS-5809.001.patch


 {{BlockPoolSliceScanner#scan}} contains a while loop that continues to 
 verify (i.e. scan) blocks until the {{blockInfoSet}} is empty (or some other 
 conditions like a timeout have occurred.)  In order to do this, it calls 
 {{BlockPoolSliceScanner#verifyFirstBlock}}.  This is intended to grab the 
 first block in the {{blockInfoSet}}, verify it, and remove it from that set.  
 ({{blockInfoSet}} is sorted by last scan time.) Unfortunately, if we hit a 
 certain bug in {{updateScanStatus}}, the block may never be removed from 
 {{blockInfoSet}}.  When this happens, we keep rescanning the exact same block 
 until the timeout hits.
 The bug is triggered when a block winds up in {{blockInfoSet}} but not in 
 {{blockMap}}.  You can see it clearly in this code:
 {code}
   private synchronized void updateScanStatus(Block block,
                                              ScanType type,
                                              boolean scanOk) {
     BlockScanInfo info = blockMap.get(block);

     if ( info != null ) {
       delBlockInfo(info);
     } else {
       // It might already be removed. Thats ok, it will be caught next time.
       info = new BlockScanInfo(block);
     }
 {code}
 If {{info == null}}, we never call {{delBlockInfo}}, the function which is 
 intended to remove the {{blockInfoSet}} entry.
 Luckily, there is a simple fix here... the variable that {{updateScanStatus}} 
 is being passed is actually a BlockInfo object, so we can simply call 
 {{delBlockInfo}} on it directly, without doing a lookup in the {{blockMap}}.  
 This is both faster and more robust.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6340) DN can't finalize upgrade

2014-07-15 Thread Vitaliy Fuks (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062470#comment-14062470
 ] 

Vitaliy Fuks commented on HDFS-6340:


Is anyone aware of any way to work around this issue, with upgrading?

 DN can't finalize upgrade
 -

 Key: HDFS-6340
 URL: https://issues.apache.org/jira/browse/HDFS-6340
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.4.0
Reporter: Rahul Singhal
Assignee: Rahul Singhal
Priority: Blocker
 Fix For: 3.0.0, 2.4.1

 Attachments: HDFS-6340-branch-2.4.0.patch, HDFS-6340.02.patch, 
 HDFS-6340.patch


 I upgraded a (NN) HA cluster from 2.2.0 to 2.4.0. After I issued the 
 '-finalizeUpgrade' command, the NN was able to finalize the upgrade but the DN 
 couldn't (I waited for the next block report).
 I think I have found the problem to be due to HDFS-5153. I will attach a 
 proposed fix.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed

2014-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062471#comment-14062471
 ] 

Hadoop QA commented on HDFS-6597:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12655785/HDFS-6597.04.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-hdfs-project/hadoop-hdfs:

  
org.apache.hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS
  org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7349//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7349//console

This message is automatically generated.

 Add a new option to NN upgrade to terminate the process after upgrade on NN 
 is completed
 

 Key: HDFS-6597
 URL: https://issues.apache.org/jira/browse/HDFS-6597
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Danilo Vunjak
 Attachments: HDFS-6597.04.patch, JIRA-HDFS-30.patch, 
 JIRA-HDFS-6597.02.patch, JIRA-HDFS-6597.03.patch, JIRA-HDFS-6597.patch


 Currently, when the namenode is started for upgrade (the hadoop namenode 
 -upgrade command), after finishing the metadata upgrade the namenode starts 
 working normally and waits for the datanodes to upgrade themselves and connect 
 to the NN. We need an option for upgrading only the NN metadata, so that after 
 the upgrade is finished on the NN, the process terminates.
 I have tested it by changing the file hdfs.server.namenode.NameNode.java, 
 method public static NameNode createNameNode(String argv[], Configuration 
 conf):
  in the switch, added case UPGRADE:
 {code}
 case UPGRADE:
   {
     DefaultMetricsSystem.initialize("NameNode");
     NameNode nameNode = new NameNode(conf);
     if (startOpt.getForceUpgrade()) {
       terminate(0);
       return null;
     }

     return nameNode;
   }
 {code}
 This did the upgrade of the metadata and closed the process after it finished; 
 later, when all services were started, the upgrade of the datanodes finished 
 successfully and the system ran.
 What I'm suggesting right now is to add a new startup parameter -force, so the 
 namenode can be started like this: hadoop namenode -upgrade -force, to 
 indicate that we want to terminate the process after the metadata upgrade on 
 the NN is finished. The old functionality should be preserved, so users can 
 run hadoop namenode -upgrade in the same way and with the same behaviour as 
 before.
  Thanks,
  Danilo



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6340) DN can't finalize upgrade

2014-07-15 Thread Arpit Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062480#comment-14062480
 ] 

Arpit Agarwal commented on HDFS-6340:
-

You can manually delete the 'previous' directory on each DN and also 
'blocksBeingWritten', if it exists. This will effectively finalize the DN 
upgrade.

 DN can't finalize upgrade
 -

 Key: HDFS-6340
 URL: https://issues.apache.org/jira/browse/HDFS-6340
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.4.0
Reporter: Rahul Singhal
Assignee: Rahul Singhal
Priority: Blocker
 Fix For: 3.0.0, 2.4.1

 Attachments: HDFS-6340-branch-2.4.0.patch, HDFS-6340.02.patch, 
 HDFS-6340.patch


 I upgraded a (NN) HA cluster from 2.2.0 to 2.4.0. After I issued the 
 '-finalizeUpgrade' command, the NN was able to finalize the upgrade but the DN 
 couldn't (I waited for the next block report).
 I think I have found the problem to be due to HDFS-5153. I will attach a 
 proposed fix.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6616) bestNode shouldn't always return the first DataNode

2014-07-15 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062482#comment-14062482
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6616:
---

@ zhaoyunjiong, are you going to post a new patch?

 bestNode shouldn't always return the first DataNode
 ---

 Key: HDFS-6616
 URL: https://issues.apache.org/jira/browse/HDFS-6616
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: webhdfs
Reporter: zhaoyunjiong
Assignee: zhaoyunjiong
Priority: Minor
 Attachments: HDFS-6616.patch


 When we are doing distcp between clusters, the job failed:
 2014-06-30 20:56:28,430 INFO org.apache.hadoop.tools.DistCp: FAIL 
 part-r-00101.avro : java.net.NoRouteToHostException: No route to host
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at 
 sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491)
   at java.security.AccessController.doPrivileged(Native Method)
   at 
 sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485)
   at 
 sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
   at 
 java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:379)
   at org.apache.hadoop.hdfs.HftpFileSystem.open(HftpFileSystem.java:322)
   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:427)
   at org.apache.hadoop.tools.DistCp$CopyFilesMapper.copy(DistCp.java:419)
   at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:547)
   at org.apache.hadoop.tools.DistCp$CopyFilesMapper.map(DistCp.java:314)
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:429)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:365)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:396)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 The root cause is that one of the DataNodes can't be accessed from outside, 
 but inside the cluster it is healthy.
 In NamenodeWebHdfsMethods.java:bestNode, it always returns the first DataNode, 
 so even after the distcp retries, it still failed.
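 One possible direction, sketched below as a plain random pick among a block's 
 locations; this is an assumption about the fix, not the attached patch:
 {code}
 import java.util.Random;

 // Sketch only: instead of always returning locations[0], pick a random
 // datanode so a retry has a chance of hitting an externally reachable node.
 class NodeChooser {
   private static final Random RANDOM = new Random();

   static String pickNode(String[] datanodes) {
     if (datanodes == null || datanodes.length == 0) {
       throw new IllegalArgumentException("no locations for this block");
     }
     return datanodes[RANDOM.nextInt(datanodes.length)];
   }
 }
 {code}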



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list

2014-07-15 Thread Amir Langer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062498#comment-14062498
 ] 

Amir Langer commented on HDFS-6658:
---

We explored the idea of off-heap memory for the Namenode.
It makes sense for Inodes, and there was already some work on that done at 
Hortonworks.
For the blocks, however, there is a problem - block data has two very different 
access patterns.
Clients will typically access a few blocks (from the same or similar files), 
mostly recent ones, while block reports can scan the entire block space.
This means there is no locality of reference and caching is not going to work.
If we don't have caching, we need to cope with the added latency of off-heap 
memory - it is, after all, backed by a file.
From our measurements, this cost seems too high, with some block reports 
seemingly never able to finish. (Just think of the cost of the off-heap 
management needing to keep loading pages from the file into its memory, with 
its page caching not having any effect.)



 Namenode memory optimization - Block replicas list 
 ---

 Key: HDFS-6658
 URL: https://issues.apache.org/jira/browse/HDFS-6658
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.4.1
Reporter: Amir Langer
Assignee: Amir Langer
 Attachments: Namenode Memory Optimizations - Block replicas list.docx


 Part of the memory consumed by every BlockInfo object in the Namenode is a 
 linked list of block references for every DatanodeStorageInfo (called 
 triplets). 
 We propose to change the way we store the list in memory. 
 Using primitive integer indexes instead of object references will reduce the 
 memory needed for every block replica (when compressed oops is disabled) and 
 in our new design the list overhead will be per DatanodeStorageInfo and not 
 per block replica.
 see attached design doc. for details and evaluation results.
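 A minimal sketch of what an index-based list per storage could look like; the 
 layout and names are assumptions, not the attached design:
 {code}
 import java.util.Arrays;

 // Sketch only: a singly linked list of block replicas kept as a primitive
 // int array of "next" indices, one list head per storage, instead of object
 // references held inside each BlockInfo.
 class ReplicaIndexList {
   private final int[] next;   // next[i] = index of the following block, -1 = end
   private int head = -1;      // index of the first block in this storage's list

   ReplicaIndexList(int capacity) {
     next = new int[capacity];
     Arrays.fill(next, -1);
   }

   void addHead(int blockIndex) {
     next[blockIndex] = head;
     head = blockIndex;
   }

   // Collect the block indices, e.g. while processing a block report.
   int[] toArray() {
     int count = 0;
     for (int i = head; i != -1; i = next[i]) {
       count++;
     }
     int[] out = new int[count];
     int n = 0;
     for (int i = head; i != -1; i = next[i]) {
       out[n++] = i;
     }
     return out;
   }
 }
 {code}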



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6469) Coordinated replication of the namespace using ConsensusNode

2014-07-15 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062502#comment-14062502
 ] 

Sanjay Radia commented on HDFS-6469:


bq. I meant that if you use QJM then every update on the NameNode results in 
writing into two journals: first into edits log and then into QJM journal. 
Konstantine, HDFS has supported parallel journals (i.e. multiple edit logs that 
are written in parallel) for a long time. A customer can use just QJM (which 
gives at least 3 replicas) and can optionally have a local parallel edit log if 
they want additional redundancy. What you are proposing is dual *serial* 
journals. 

 Coordinated replication of the namespace using ConsensusNode
 

 Key: HDFS-6469
 URL: https://issues.apache.org/jira/browse/HDFS-6469
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: CNodeDesign.pdf


 This is a proposal to introduce ConsensusNode - an evolution of the NameNode, 
 which enables replication of the namespace on multiple nodes of an HDFS 
 cluster by means of a Coordination Engine.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6687) nn.getNamesystem() may return NPE from JspHelper

2014-07-15 Thread Mit Desai (JIRA)
Mit Desai created HDFS-6687:
---

 Summary: nn.getNamesystem() may return NPE from JspHelper
 Key: HDFS-6687
 URL: https://issues.apache.org/jira/browse/HDFS-6687
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai


In hadoop-2, the http server is started at a very early stage to show the 
startup progress. If the user tries to get the namesystem, it may not be 
completely up yet, and the NN logs will have this kind of error.

{noformat}
2014-07-14 15:49:03,521 [***] WARN
resources.ExceptionHandler: INTERNAL_SERVER_ERROR
java.lang.NullPointerException
at
org.apache.hadoop.hdfs.server.common.JspHelper.getTokenUGI(JspHelper.java:661)
at
org.apache.hadoop.hdfs.server.common.JspHelper.getUGI(JspHelper.java:604)
at
org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:53)
at
org.apache.hadoop.hdfs.web.resources.UserProvider.getValue(UserProvider.java:41)
at
com.sun.jersey.server.impl.inject.InjectableValuesProvider.getInjectableValues(InjectableValuesProvider.java:46)
at
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$EntityParamInInvoker.getParams(AbstractResourceMethodDispatchProvider.java:153)
at
com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:203)
at
com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
at
com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
at
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at
com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
at
com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
at
com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
at
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)
at
com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)
at
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)
at
com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)
at
com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416)
at
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
at
com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:699)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at org.apache.hadoop.hdfs.web.AuthFilter.doFilter(AuthFilter.java:84)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:78)
at com.yahoo.hadoop.GzipFilter.doFilter(GzipFilter.java:220)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at
org.apache.hadoop.http.HttpServer2$QuotingInputFilter.doFilter(HttpServer2.java:1223)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at 

[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list

2014-07-15 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062514#comment-14062514
 ] 

Colin Patrick McCabe commented on HDFS-6658:


bq. If we don't have caching, we need to cope with the added latency of 
off-heap memory - It is after all backed up by a file.

Amir, there's no file involved.  See my comment here: 
https://issues.apache.org/jira/browse/HDFS-6658?focusedCommentId=14061374page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14061374

I'm talking about memory.  Memory, not disk.  It is simply RAM that is not 
managed by the JVM.  There's more information here: 
http://stackoverflow.com/questions/6091615/difference-between-on-heap-and-off-heap.

 Namenode memory optimization - Block replicas list 
 ---

 Key: HDFS-6658
 URL: https://issues.apache.org/jira/browse/HDFS-6658
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.4.1
Reporter: Amir Langer
Assignee: Amir Langer
 Attachments: Namenode Memory Optimizations - Block replicas list.docx


 Part of the memory consumed by every BlockInfo object in the Namenode is a 
 linked list of block references for every DatanodeStorageInfo (called 
 triplets). 
 We propose to change the way we store the list in memory. 
 Using primitive integer indexes instead of object references will reduce the 
 memory needed for every block replica (when compressed oops is disabled) and 
 in our new design the list overhead will be per DatanodeStorageInfo and not 
 per block replica.
 see attached design doc. for details and evaluation results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6688) Hadoop JMX stats are not refreshed

2014-07-15 Thread Biju Nair (JIRA)
Biju Nair created HDFS-6688:
---

 Summary: Hadoop JMX stats are not refreshed
 Key: HDFS-6688
 URL: https://issues.apache.org/jira/browse/HDFS-6688
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: Ubuntu
Reporter: Biju Nair


Even when the HDFS datanode process is stopped, the JMX attributes 
Hadoop.NameNode.FSNamesystemState.NumLiveDataNodes/NumDeadDataNodes do not 
change. Also, Hadoop.NameNode.NameNodeInfo.Attributes.LiveNodes still shows the 
stopped datanode's details. If these attributes reflected the actual changes in 
the datanodes, they could be used to monitor the health of the HDFS cluster, 
which currently is not possible.
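
For reference, a minimal sketch of reading those attributes over JMX; the MBean 
name follows the usual Hadoop metrics naming, but the JMX service URL and port 
are assumptions for a particular setup:
{code}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch only: poll the NameNode's FSNamesystemState MBean and read the
// NumLiveDataNodes / NumDeadDataNodes attributes for basic health monitoring.
public class NameNodeJmxProbe {
  public static void main(String[] args) throws Exception {
    // Assumed JMX endpoint; configure the remote JMX port on the NN accordingly.
    JMXServiceURL url =
        new JMXServiceURL("service:jmx:rmi:///jndi/rmi://namenode-host:8004/jmxrmi");
    JMXConnector connector = JMXConnectorFactory.connect(url);
    try {
      MBeanServerConnection conn = connector.getMBeanServerConnection();
      ObjectName fsState =
          new ObjectName("Hadoop:service=NameNode,name=FSNamesystemState");
      System.out.println("NumLiveDataNodes = "
          + conn.getAttribute(fsState, "NumLiveDataNodes"));
      System.out.println("NumDeadDataNodes = "
          + conn.getAttribute(fsState, "NumDeadDataNodes"));
    } finally {
      connector.close();
    }
  }
}
{code}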



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6469) Coordinated replication of the namespace using ConsensusNode

2014-07-15 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062559#comment-14062559
 ] 

Sanjay Radia commented on HDFS-6469:


Todd said:
bq. a fully usable solution would be available to the community at large, 
whereas the design you're proposing seems like it will only be usably 
implemented by a proprietary extension (I don't consider the ZK reference 
implementation likely to actually work in a usable fashion).

Konstantine, I had mentioned exactly the above point to you at Hadoop Summit 
Europe.  ZK is a coordination service, and for this to be practical it needs to 
be an inline Paxos protocol. We had also discussed 2 potential Paxos libraries 
that could come into open source: I believe Facebook has one that they may 
contribute and CMU has one called E-Paxos; if either of these becomes available 
then it addresses this particular issue. I have no objections to a customer 
going to Wandisco for the enterprise-supported version, but if the community 
is going to maintain such an extension then there needs to be a practical, 
in-production-usable, free solution; sending offline messages to a coordinator 
service for each transaction is not usable. Let's discuss the performance part 
in a separate comment. Let me comment on your comparisons to the topology and 
Windows examples that the community supported in the past:
* Topology - these changes allowed Hadoop to be used on containers such as VMs. 
** Both KVM and VirtualBox offer free VM solutions - the customer does not need 
to buy ESX.  
** The topology solution would also help with a Docker container deployment, 
which is freely available and offers even better performance than VMs. 
** Hadoop is commonly used in cloud environments (e.g. AWS, Azure, or 
Altiscale), which all use VMs or containers.
** Further, it was recognized that while, in the past, we had considered racks 
to be a failure zone, there could be other failure zones: nodes (for the 
case of VMs or containers on a host) and also groups of machines.
* Windows - this was done for platform support, which is very different from 
what we are talking about here; many open source solutions support multiple 
platforms to enable the widest adoption. BTW Hadoop supported Windows via 
Cygwin, but we made it first class since the initial support via Cygwin was 
messy. 

 Coordinated replication of the namespace using ConsensusNode
 

 Key: HDFS-6469
 URL: https://issues.apache.org/jira/browse/HDFS-6469
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: namenode
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko
Assignee: Konstantin Shvachko
 Attachments: CNodeDesign.pdf


 This is a proposal to introduce ConsensusNode - an evolution of the NameNode, 
 which enables replication of the namespace on multiple nodes of an HDFS 
 cluster by means of a Coordination Engine.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6560) Byte array native checksumming on DN side

2014-07-15 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-6560:
--

Issue Type: Improvement  (was: Sub-task)
Parent: (was: HDFS-3528)

 Byte array native checksumming on DN side
 -

 Key: HDFS-6560
 URL: https://issues.apache.org/jira/browse/HDFS-6560
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: performance
Reporter: James Thomas
Assignee: James Thomas
 Attachments: HDFS-3528.patch, HDFS-6560.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6560) Byte array native checksumming on DN side

2014-07-15 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062561#comment-14062561
 ] 

Todd Lipcon commented on HDFS-6560:
---

A few notes on the patch:

{code}
+  sums_addr = (*env)->GetPrimitiveArrayCritical(env, j_sums, NULL);
+  data_addr = (*env)->GetPrimitiveArrayCritical(env, j_data, NULL);
+
+  if (unlikely(!sums_addr || !data_addr)) {
+    (*env)->ReleasePrimitiveArrayCritical(env, j_data, data_addr, 0);
+    (*env)->ReleasePrimitiveArrayCritical(env, j_sums, sums_addr, 0);
{code}

Here it seems like you might call Release() on a NULL address. I can't tell 
from reading the spec whether that's safe or not, but maybe best to guard the 
Release calls and only release non-NULL addresses.

{code}
+  ret = bulk_verify_crc(data, MIN(numChecksumsInMB * bytes_per_checksum,
+data_len - checksumNum * bytes_per_checksum), sums,
+crc_type, bytes_per_checksum, error_data);
{code}
style nit: given that the second line here is an argument to MIN, it should 
probably wrap more like:
{code}
  ret = bulk_verify_crc(data, MIN(numChecksumsInMB * bytes_per_checksum,
  data_len - checksumNum * 
bytes_per_checksum),
sums, crc_type, bytes_per_checksum, error_data);
{code}

or assign the MIN result to a temporary value like 'len'

{code}
+long pos = base_pos + (error_data.bad_data - data) + checksumNum *
+bytes_per_checksum;
{code}
style: indentation is off a bit here (continuation line should indent)

Also, I'll move this to the HADOOP project since it only affects code in common/

 Byte array native checksumming on DN side
 -

 Key: HDFS-6560
 URL: https://issues.apache.org/jira/browse/HDFS-6560
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: performance
Reporter: James Thomas
Assignee: James Thomas
 Attachments: HDFS-3528.patch, HDFS-6560.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6455) NFS: Exception should be added in NFS log for invalid separator in allowed.hosts

2014-07-15 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062574#comment-14062574
 ] 

Brandon Li commented on HDFS-6455:
--

Sorry for the late reply. The time out is possibly due to this code in the 
patch:
{noformat}
+  if (hostsMatcher != null) {
+hostsMatchers.add(hostsMatcher);
+out = MountResponse.writeExportList(out, xid, exports, hostsMatchers);
+  }
{noformat}

If hostsMatcher is null, it doesn't send a response back.
I would suggest fixing HDFS-6456 first since it will fix part of the problem. 
After the patch for HDFS-6456 is committed, this problem will be easier to fix.


 NFS: Exception should be added in NFS log for invalid separator in 
 allowed.hosts
 

 Key: HDFS-6455
 URL: https://issues.apache.org/jira/browse/HDFS-6455
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.2.0
Reporter: Yesha Vora
 Attachments: HDFS-6455.patch


 The error for an invalid separator in the dfs.nfs.exports.allowed.hosts 
 property should be logged in the nfs log file instead of the nfs.out file.
 Steps to reproduce:
 1. Pass an invalid separator in dfs.nfs.exports.allowed.hosts
 {noformat}
 <property><name>dfs.nfs.exports.allowed.hosts</name><value>host1 ro:host2 
 rw</value></property>
 {noformat}
 2. Restart the NFS server. The NFS server fails to start and prints the exception to the console.
 {noformat}
 [hrt_qa@host1 hwqe]$ ssh -o StrictHostKeyChecking=no -o 
 UserKnownHostsFile=/dev/null host1 sudo su - -c 
 \/usr/lib/hadoop/sbin/hadoop-daemon.sh start nfs3\ hdfs
 starting nfs3, logging to /tmp/log/hadoop/hdfs/hadoop-hdfs-nfs3-horst1.out
 DEPRECATED: Use of this script to execute hdfs command is deprecated.
 Instead use the hdfs command for it.
 Exception in thread main java.lang.IllegalArgumentException: Incorrectly 
 formatted line 'host1 ro:host2 rw'
   at org.apache.hadoop.nfs.NfsExports.getMatch(NfsExports.java:356)
   at org.apache.hadoop.nfs.NfsExports.init(NfsExports.java:151)
   at org.apache.hadoop.nfs.NfsExports.getInstance(NfsExports.java:54)
   at 
 org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.init(RpcProgramNfs3.java:176)
   at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.init(Nfs3.java:43)
   at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.main(Nfs3.java:59)
 {noformat}
 NFS log does not print any error message. It directly shuts down. 
 {noformat}
 STARTUP_MSG:   java = 1.6.0_31
 /
 2014-05-27 18:47:13,972 INFO  nfs3.Nfs3Base (SignalLogger.java:register(91)) 
 - registered UNIX signal handlers for [TERM, HUP, INT]
 2014-05-27 18:47:14,169 INFO  nfs3.IdUserGroup 
 (IdUserGroup.java:updateMapInternal(159)) - Updated user map size:259
 2014-05-27 18:47:14,179 INFO  nfs3.IdUserGroup 
 (IdUserGroup.java:updateMapInternal(159)) - Updated group map size:73
 2014-05-27 18:47:14,192 INFO  nfs3.Nfs3Base (StringUtils.java:run(640)) - 
 SHUTDOWN_MSG:
 /
 SHUTDOWN_MSG: Shutting down Nfs3 at 
 {noformat}
 NFS.out file has exception.
 {noformat}
 EPRECATED: Use of this script to execute hdfs command is deprecated.
 Instead use the hdfs command for it.
 Exception in thread main java.lang.IllegalArgumentException: Incorrectly 
 formatted line 'host1 ro:host2 rw'
 at org.apache.hadoop.nfs.NfsExports.getMatch(NfsExports.java:356)
 at org.apache.hadoop.nfs.NfsExports.init(NfsExports.java:151)
 at org.apache.hadoop.nfs.NfsExports.getInstance(NfsExports.java:54)
 at 
 org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.init(RpcProgramNfs3.java:176)
 at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.init(Nfs3.java:43)
 at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.main(Nfs3.java:59)
 ulimit -a for user hdfs
 core file size  (blocks, -c) 409600
 data seg size   (kbytes, -d) unlimited
 scheduling priority (-e) 0
 file size   (blocks, -f) unlimited
 pending signals (-i) 188893
 max locked memory   (kbytes, -l) unlimited
 max memory size (kbytes, -m) unlimited
 open files  (-n) 32768
 pipe size(512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority  (-r) 0
 stack size  (kbytes, -s) 10240
 cpu time   (seconds, -t) unlimited
 max user processes  (-u) 65536
 virtual memory  (kbytes, -v) unlimited
 file locks  (-x) unlimited
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list

2014-07-15 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062593#comment-14062593
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6658:
---

Nice idea!

For processing the block report using a BitSet, do the bits correspond to the 
block indices in DatanodeStorageInfo?  I think the BitSet can be eliminated by 
overwriting the length, say setting it to (-length - 1), and setting it back 
when computing the toRemove list.
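
If I read the suggestion right, it is just an in-place mark; a tiny sketch of 
the encoding follows, where the array and names are mine and purely 
illustrative:
{code}
// Sketch only: mark an entry as "seen in the block report" by overwriting its
// length with (-length - 1); applying the same transform again restores it.
class MarkByNegation {
  static void mark(long[] lengths, int i) {
    lengths[i] = -lengths[i] - 1;   // 0 becomes -1, so even length 0 is marked
  }

  static boolean isMarked(long[] lengths, int i) {
    return lengths[i] < 0;
  }

  static void restore(long[] lengths, int i) {
    lengths[i] = -lengths[i] - 1;   // undo the mark while building toRemove
  }
}
{code}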

 Namenode memory optimization - Block replicas list 
 ---

 Key: HDFS-6658
 URL: https://issues.apache.org/jira/browse/HDFS-6658
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.4.1
Reporter: Amir Langer
Assignee: Amir Langer
 Attachments: Namenode Memory Optimizations - Block replicas list.docx


 Part of the memory consumed by every BlockInfo object in the Namenode is a 
 linked list of block references for every DatanodeStorageInfo (called 
 triplets). 
 We propose to change the way we store the list in memory. 
 Using primitive integer indexes instead of object references will reduce the 
 memory needed for every block replica (when compressed oops is disabled) and 
 in our new design the list overhead will be per DatanodeStorageInfo and not 
 per block replica.
 see attached design doc. for details and evaluation results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6456) NFS should throw error for invalid entry in dfs.nfs.exports.allowed.hosts

2014-07-15 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-6456:
-

Summary: NFS should throw error for invalid entry in 
dfs.nfs.exports.allowed.hosts  (was: NFS: NFS server should throw error for 
invalid entry in dfs.nfs.exports.allowed.hosts)

 NFS should throw error for invalid entry in dfs.nfs.exports.allowed.hosts
 -

 Key: HDFS-6456
 URL: https://issues.apache.org/jira/browse/HDFS-6456
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.2.0
Reporter: Yesha Vora
Assignee: Abhiraj Butala
 Attachments: HDFS-6456.patch


 Pass invalid entry in dfs.nfs.exports.allowed.hosts. Use - as separator 
 between hostname and access permission 
 {noformat}
 <property><name>dfs.nfs.exports.allowed.hosts</name><value>host1-rw</value></property>
 {noformat}
 This misconfiguration is not detected by NFS server. It does not print any 
 error message. The host passed in this configuration is also not able to 
 mount nfs. In conclusion, no node can mount the nfs with this value. A format 
 check is required for this property. If the value of this property does not 
 follow the format, an error should be thrown.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6456) NFS: NFS server should throw error for invalid entry in dfs.nfs.exports.allowed.hosts

2014-07-15 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062600#comment-14062600
 ] 

Brandon Li commented on HDFS-6456:
--

+1. Patch looks good to me. 

 NFS: NFS server should throw error for invalid entry in 
 dfs.nfs.exports.allowed.hosts
 -

 Key: HDFS-6456
 URL: https://issues.apache.org/jira/browse/HDFS-6456
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.2.0
Reporter: Yesha Vora
Assignee: Abhiraj Butala
 Attachments: HDFS-6456.patch


 Pass invalid entry in dfs.nfs.exports.allowed.hosts. Use - as separator 
 between hostname and access permission 
 {noformat}
 <property><name>dfs.nfs.exports.allowed.hosts</name><value>host1-rw</value></property>
 {noformat}
 This misconfiguration is not detected by NFS server. It does not print any 
 error message. The host passed in this configuration is also not able to 
 mount nfs. In conclusion, no node can mount the nfs with this value. A format 
 check is required for this property. If the value of this property does not 
 follow the format, an error should be thrown.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6456) NFS should throw error for invalid entry in dfs.nfs.exports.allowed.hosts

2014-07-15 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-6456:
-

Fix Version/s: 2.6.0

 NFS should throw error for invalid entry in dfs.nfs.exports.allowed.hosts
 -

 Key: HDFS-6456
 URL: https://issues.apache.org/jira/browse/HDFS-6456
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.2.0
Reporter: Yesha Vora
Assignee: Abhiraj Butala
 Fix For: 2.6.0

 Attachments: HDFS-6456.patch


 Pass invalid entry in dfs.nfs.exports.allowed.hosts. Use - as separator 
 between hostname and access permission 
 {noformat}
 <property><name>dfs.nfs.exports.allowed.hosts</name><value>host1-rw</value></property>
 {noformat}
 This misconfiguration is not detected by NFS server. It does not print any 
 error message. The host passed in this configuration is also not able to 
 mount nfs. In conclusion, no node can mount the nfs with this value. A format 
 check is required for this property. If the value of this property does not 
 follow the format, an error should be thrown.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6456) NFS should throw error for invalid entry in dfs.nfs.exports.allowed.hosts

2014-07-15 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-6456:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

 NFS should throw error for invalid entry in dfs.nfs.exports.allowed.hosts
 -

 Key: HDFS-6456
 URL: https://issues.apache.org/jira/browse/HDFS-6456
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.2.0
Reporter: Yesha Vora
Assignee: Abhiraj Butala
 Fix For: 2.6.0

 Attachments: HDFS-6456.patch


 Pass invalid entry in dfs.nfs.exports.allowed.hosts. Use - as separator 
 between hostname and access permission 
 {noformat}
 <property><name>dfs.nfs.exports.allowed.hosts</name><value>host1-rw</value></property>
 {noformat}
 This misconfiguration is not detected by NFS server. It does not print any 
 error message. The host passed in this configuration is also not able to 
 mount nfs. In conclusion, no node can mount the nfs with this value. A format 
 check is required for this property. If the value of this property does not 
 follow the format, an error should be thrown.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6456) NFS should throw error for invalid entry in dfs.nfs.exports.allowed.hosts

2014-07-15 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062622#comment-14062622
 ] 

Hudson commented on HDFS-6456:
--

FAILURE: Integrated in Hadoop-trunk-Commit #5886 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5886/])
HDFS-6456. NFS should throw error for invalid entry in 
dfs.nfs.exports.allowed.hosts. Contributed by Abhiraj Butala (brandonli: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1610840)
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/main/java/org/apache/hadoop/nfs/NfsExports.java
* 
/hadoop/common/trunk/hadoop-common-project/hadoop-nfs/src/test/java/org/apache/hadoop/nfs/TestNfsExports.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt


 NFS should throw error for invalid entry in dfs.nfs.exports.allowed.hosts
 -

 Key: HDFS-6456
 URL: https://issues.apache.org/jira/browse/HDFS-6456
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.2.0
Reporter: Yesha Vora
Assignee: Abhiraj Butala
 Fix For: 2.6.0

 Attachments: HDFS-6456.patch


 Pass invalid entry in dfs.nfs.exports.allowed.hosts. Use - as separator 
 between hostname and access permission 
 {noformat}
 <property><name>dfs.nfs.exports.allowed.hosts</name><value>host1-rw</value></property>
 {noformat}
 This misconfiguration is not detected by NFS server. It does not print any 
 error message. The host passed in this configuration is also not able to 
 mount nfs. In conclusion, no node can mount the nfs with this value. A format 
 check is required for this property. If the value of this property does not 
 follow the format, an error should be thrown.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6455) NFS: Exception should be added in NFS log for invalid separator in allowed.hosts

2014-07-15 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062621#comment-14062621
 ] 

Brandon Li commented on HDFS-6455:
--

[~abutala], HDFS-6456 has been fixed. Please rebase the current patch. I think 
it should be a smaller change now. :-)

 NFS: Exception should be added in NFS log for invalid separator in 
 allowed.hosts
 

 Key: HDFS-6455
 URL: https://issues.apache.org/jira/browse/HDFS-6455
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.2.0
Reporter: Yesha Vora
 Attachments: HDFS-6455.patch


 The error for invalid separator in dfs.nfs.exports.allowed.hosts property 
 should be added in nfs log file instead nfs.out file.
 Steps to reproduce:
 1. Pass invalid separator in dfs.nfs.exports.allowed.hosts
 {noformat}
 <property><name>dfs.nfs.exports.allowed.hosts</name><value>host1  ro:host2 rw</value></property>
 {noformat}
 2. restart NFS server. NFS server fails to start and print exception console.
 {noformat}
 [hrt_qa@host1 hwqe]$ ssh -o StrictHostKeyChecking=no -o 
 UserKnownHostsFile=/dev/null host1 sudo su - -c 
 \"/usr/lib/hadoop/sbin/hadoop-daemon.sh start nfs3\" hdfs
 starting nfs3, logging to /tmp/log/hadoop/hdfs/hadoop-hdfs-nfs3-horst1.out
 DEPRECATED: Use of this script to execute hdfs command is deprecated.
 Instead use the hdfs command for it.
 Exception in thread "main" java.lang.IllegalArgumentException: Incorrectly 
 formatted line 'host1 ro:host2 rw'
   at org.apache.hadoop.nfs.NfsExports.getMatch(NfsExports.java:356)
   at org.apache.hadoop.nfs.NfsExports.<init>(NfsExports.java:151)
   at org.apache.hadoop.nfs.NfsExports.getInstance(NfsExports.java:54)
   at 
 org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.<init>(RpcProgramNfs3.java:176)
   at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.<init>(Nfs3.java:43)
   at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.main(Nfs3.java:59)
 {noformat}
 NFS log does not print any error message. It directly shuts down. 
 {noformat}
 STARTUP_MSG:   java = 1.6.0_31
 /
 2014-05-27 18:47:13,972 INFO  nfs3.Nfs3Base (SignalLogger.java:register(91)) 
 - registered UNIX signal handlers for [TERM, HUP, INT]
 2014-05-27 18:47:14,169 INFO  nfs3.IdUserGroup 
 (IdUserGroup.java:updateMapInternal(159)) - Updated user map size:259
 2014-05-27 18:47:14,179 INFO  nfs3.IdUserGroup 
 (IdUserGroup.java:updateMapInternal(159)) - Updated group map size:73
 2014-05-27 18:47:14,192 INFO  nfs3.Nfs3Base (StringUtils.java:run(640)) - 
 SHUTDOWN_MSG:
 /
 SHUTDOWN_MSG: Shutting down Nfs3 at 
 {noformat}
 NFS.out file has exception.
 {noformat}
 DEPRECATED: Use of this script to execute hdfs command is deprecated.
 Instead use the hdfs command for it.
 Exception in thread "main" java.lang.IllegalArgumentException: Incorrectly 
 formatted line 'host1 ro:host2 rw'
 at org.apache.hadoop.nfs.NfsExports.getMatch(NfsExports.java:356)
 at org.apache.hadoop.nfs.NfsExports.<init>(NfsExports.java:151)
 at org.apache.hadoop.nfs.NfsExports.getInstance(NfsExports.java:54)
 at 
 org.apache.hadoop.hdfs.nfs.nfs3.RpcProgramNfs3.<init>(RpcProgramNfs3.java:176)
 at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.<init>(Nfs3.java:43)
 at org.apache.hadoop.hdfs.nfs.nfs3.Nfs3.main(Nfs3.java:59)
 ulimit -a for user hdfs
 core file size  (blocks, -c) 409600
 data seg size   (kbytes, -d) unlimited
 scheduling priority (-e) 0
 file size   (blocks, -f) unlimited
 pending signals (-i) 188893
 max locked memory   (kbytes, -l) unlimited
 max memory size (kbytes, -m) unlimited
 open files  (-n) 32768
 pipe size(512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority  (-r) 0
 stack size  (kbytes, -s) 10240
 cpu time   (seconds, -t) unlimited
 max user processes  (-u) 65536
 virtual memory  (kbytes, -v) unlimited
 file locks  (-x) unlimited
 {noformat}
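
For illustration, a hedged sketch of the requested behaviour (not the committed fix): catch the configuration error around the gateway's startup path and write it to the NFS log before exiting, so it does not end up only in the .out file. The startGateway() helper below is a hypothetical stand-in for the real NFS3 startup sequence:

{noformat}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

public class Nfs3StartupSketch {
  private static final Log LOG = LogFactory.getLog(Nfs3StartupSketch.class);

  public static void main(String[] args) {
    try {
      startGateway(args);                                            // hypothetical startup call
    } catch (IllegalArgumentException e) {
      LOG.error("Invalid dfs.nfs.exports.allowed.hosts value", e);   // lands in the NFS log
      System.exit(1);
    }
  }

  private static void startGateway(String[] args) {
    // Placeholder for the real startup sequence, which parses the exports table.
    throw new IllegalArgumentException("Incorrectly formatted line 'host1 ro:host2 rw'");
  }
}
{noformat}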



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (HDFS-5464) Simplify block report diff calculation

2014-07-15 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze reopened HDFS-5464:
---


 Simplify block report diff calculation
 --

 Key: HDFS-5464
 URL: https://issues.apache.org/jira/browse/HDFS-5464
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h5464_20131105.patch, h5464_20131105b.patch, 
 h5464_20131105c.patch


 The current calculation in BlockManager.reportDiff(..) is unnecessarily 
 complicated.  We could simplify the calculation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6509) distcp vs Data At Rest Encryption

2014-07-15 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-6509:
---

Attachment: HDFS-6509distcpandDataatRestEncryption-2.pdf

I've made some revisions to the doc:

* Fixed some typos.
* Added an alternative proposal made by [~andrew.wang] to have a raw.* extended 
attribute namespace.
* Clarified the wording about the raw namespace being accessible only by the HDFS 
superuser.


 distcp vs Data At Rest Encryption
 -

 Key: HDFS-6509
 URL: https://issues.apache.org/jira/browse/HDFS-6509
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-6509distcpandDataatRestEncryption-2.pdf, 
 HDFS-6509distcpandDataatRestEncryption.pdf


 distcp needs to work with Data At Rest Encryption



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HDFS-6619) Clean up encryption-related tests

2014-07-15 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang resolved HDFS-6619.
---

   Resolution: Fixed
Fix Version/s: fs-encryption (HADOOP-10150 and HDFS-6134)

Thanks for the reviews guys, committed to fs-encryption branch.

 Clean up encryption-related tests
 -

 Key: HDFS-6619
 URL: https://issues.apache.org/jira/browse/HDFS-6619
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: test
Reporter: Andrew Wang
Assignee: Andrew Wang
Priority: Minor
 Fix For: fs-encryption (HADOOP-10150 and HDFS-6134)

 Attachments: hdfs-6619.001.patch


 Would be good to clean up TestHDFSEncryption and TestEncryptionZonesAPI. 
 These tests could be renamed, test timeouts added or adjusted, the number of 
 minicluster starts/stops reduced, whitespace cleaned up, etc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6688) Hadoop JMX stats are not refreshed

2014-07-15 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062640#comment-14062640
 ] 

Andrew Wang commented on HDFS-6688:
---

Hi Biju, did you wait 10.5 minutes for the default dead-node timeout before 
checking these stats? Did you also compare the JMX stats with the display on 
the web UI? I'd expect these to be the same, but we have had some issues here 
in the past.
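
For reference, a small sketch of where the 10.5 minutes likely comes from, using the default values (the formula mirrors the one the namenode uses to declare a datanode dead; treat the exact property defaults as assumptions):

{noformat}
public class DeadNodeInterval {
  public static void main(String[] args) {
    long recheckMs   = 5 * 60 * 1000L;  // dfs.namenode.heartbeat.recheck-interval default
    long heartbeatMs = 3 * 1000L;       // dfs.heartbeat.interval default (3 s)
    long deadMs = 2 * recheckMs + 10 * heartbeatMs;    // 630,000 ms
    System.out.println(deadMs / 60000.0 + " minutes"); // 10.5 minutes
  }
}
{noformat}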

 Hadoop JMX stats are not refreshed
 --

 Key: HDFS-6688
 URL: https://issues.apache.org/jira/browse/HDFS-6688
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: Ubuntu
Reporter: Biju Nair

 Even when the HDFS datanode process is stopped, the 
 Hadoop.NameNode.FSNamesystemState.NumLiveDataNodes/NumDeadDataNodes attribute 
 values don't change. Also, Hadoop.NameNode.NameNodeInfo.Attributes.LiveNodes 
 still shows the stopped datanode's details. If these attributes reflected the 
 actual state of the datanodes, they could be used to monitor the health of the 
 HDFS cluster; as it stands, they can't be used for that.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6658) Namenode memory optimization - Block replicas list

2014-07-15 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062648#comment-14062648
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6658:
---

Let me try it in HDFS-5464.

 Namenode memory optimization - Block replicas list 
 ---

 Key: HDFS-6658
 URL: https://issues.apache.org/jira/browse/HDFS-6658
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.4.1
Reporter: Amir Langer
Assignee: Amir Langer
 Attachments: Namenode Memory Optimizations - Block replicas list.docx


 Part of the memory consumed by every BlockInfo object in the Namenode is a 
 linked list of block references for every DatanodeStorageInfo (called 
 triplets). 
 We propose to change the way we store the list in memory. 
 Using primitive integer indexes instead of object references will reduce the 
 memory needed for every block replica (when compressed oops is disabled) and 
 in our new design the list overhead will be per DatanodeStorageInfo and not 
 per block replica.
 see attached design doc. for details and evaluation results.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-5464) Simplify block report diff calculation

2014-07-15 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-5464:
--

Attachment: h5464_20140715.patch

A new idea inspired by HDFS-6658.

h5464_20140715.patch: marks visited blocks by setting length n to (-n-1).

 Simplify block report diff calculation
 --

 Key: HDFS-5464
 URL: https://issues.apache.org/jira/browse/HDFS-5464
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h5464_20131105.patch, h5464_20131105b.patch, 
 h5464_20131105c.patch, h5464_20140715.patch


 The current calculation in BlockManager.reportDiff(..) is unnecessarily 
 complicated.  We could simplify the calculation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-5464) Simplify block report diff calculation

2014-07-15 Thread Tsz Wo Nicholas Sze (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo Nicholas Sze updated HDFS-5464:
--

Status: Patch Available  (was: Reopened)

 Simplify block report diff calculation
 --

 Key: HDFS-5464
 URL: https://issues.apache.org/jira/browse/HDFS-5464
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Tsz Wo Nicholas Sze
Priority: Minor
 Attachments: h5464_20131105.patch, h5464_20131105b.patch, 
 h5464_20131105c.patch, h5464_20140715.patch


 The current calculation in BlockManager.reportDiff(..) is unnecessarily 
 complicated.  We could simplify the calculation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6689) NFS: can't access file under directory with 711 access right as other user

2014-07-15 Thread Yesha Vora (JIRA)
Yesha Vora created HDFS-6689:


 Summary: NFS: can't access file under directory with 711 access 
right as other user
 Key: HDFS-6689
 URL: https://issues.apache.org/jira/browse/HDFS-6689
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Yesha Vora


NFS does not allow another user to access a file with 644 permissions when its parent 
directory has 711 access rights.

Steps to reproduce:

1. Create a directory /user/userX with 711 permissions.
2. Upload a file at /user/userX/TestFile with 644 permissions as userX.
3. Try to access the file as userY.
 HDFS allows userY to read TestFile.
{noformat}
bash-4.1$ id
uid=661(userY) gid=100(users) groups=100(users),13016(groupY)

bash-4.1$ hdfs dfs -cat /user/userX/TestFile
create a file with some content
{noformat}
 NFS does not allow userY to read TestFile.
{noformat}
bash-4.1$ cat /tmp/tmp_mnt/user/userX/TestFile
cat: /tmp/tmp_mnt/user/userX/TestFile: Permission denied
{noformat}




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6689) NFS: can't access file under directory with 711 access right as other user

2014-07-15 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-6689:
-

Affects Version/s: 2.2.0

 NFS: can't access file under directory with 711 access right as other user
 --

 Key: HDFS-6689
 URL: https://issues.apache.org/jira/browse/HDFS-6689
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.2.0
Reporter: Yesha Vora

 NFS does not allow other user to access a file with 644 permission and a 
 parent with 711 access right.
 Steps to reproduce:
 1. Create a directory /user/userX with 711 permissions
 2. Upload a file at /user/userX/TestFile with 644 as userX 
 3. Try to access WriteTest as userY.
  HDFS will allow to read TestFile. 
 {noformat}
 bash-4.1$ id
 uid=661(userY) gid=100(users) groups=100(users),13016(groupY)
 bash-4.1$ hdfs dfs -cat /user/userX/TestFile
 create a file with some content
 {noformat}
  NFS will not allow to read TestFile.
 {noformat}
 bash-4.1$ cat /tmp/tmp_mnt/user/userX/TestFile
 cat: /tmp/tmp_mnt/user/userX/TestFile: Permission denied
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6689) NFS: can't access file under directory with 711 access right as other user

2014-07-15 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-6689:
-

Labels:   (was: nfs)

 NFS: can't access file under directory with 711 access right as other user
 --

 Key: HDFS-6689
 URL: https://issues.apache.org/jira/browse/HDFS-6689
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.2.0
Reporter: Yesha Vora

 NFS does not allow other user to access a file with 644 permission and a 
 parent with 711 access right.
 Steps to reproduce:
 1. Create a directory /user/userX with 711 permissions
 2. Upload a file at /user/userX/TestFile with 644 as userX 
 3. Try to access WriteTest as userY.
  HDFS will allow to read TestFile. 
 {noformat}
 bash-4.1$ id
 uid=661(userY) gid=100(users) groups=100(users),13016(groupY)
 bash-4.1$ hdfs dfs -cat /user/userX/TestFile
 create a file with some content
 {noformat}
  NFS will not allow to read TestFile.
 {noformat}
 bash-4.1$ cat /tmp/tmp_mnt/user/userX/TestFile
 cat: /tmp/tmp_mnt/user/userX/TestFile: Permission denied
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6689) NFS: can't access file under directory with 711 access right as other user

2014-07-15 Thread Brandon Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Li updated HDFS-6689:
-

Component/s: nfs

 NFS: can't access file under directory with 711 access right as other user
 --

 Key: HDFS-6689
 URL: https://issues.apache.org/jira/browse/HDFS-6689
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.2.0
Reporter: Yesha Vora

 NFS does not allow other user to access a file with 644 permission and a 
 parent with 711 access right.
 Steps to reproduce:
 1. Create a directory /user/userX with 711 permissions
 2. Upload a file at /user/userX/TestFile with 644 as userX 
 3. Try to access WriteTest as userY.
  HDFS will allow to read TestFile. 
 {noformat}
 bash-4.1$ id
 uid=661(userY) gid=100(users) groups=100(users),13016(groupY)
 bash-4.1$ hdfs dfs -cat /user/userX/TestFile
 create a file with some content
 {noformat}
  NFS will not allow to read TestFile.
 {noformat}
 bash-4.1$ cat /tmp/tmp_mnt/user/userX/TestFile
 cat: /tmp/tmp_mnt/user/userX/TestFile: Permission denied
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6422) getfattr in CLI doesn't throw exception or return non-0 return code when xattr doesn't exist

2014-07-15 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb updated HDFS-6422:
---

Attachment: HDFS-6474.4.patch

This patch is not completely done yet. I am submitting it to see what the 
Jenkins run looks like, so please don't review it yet.

 getfattr in CLI doesn't throw exception or return non-0 return code when 
 xattr doesn't exist
 

 Key: HDFS-6422
 URL: https://issues.apache.org/jira/browse/HDFS-6422
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 3.0.0, 2.5.0
Reporter: Charles Lamb
Assignee: Charles Lamb
 Attachments: HDFS-6422.1.patch, HDFS-6422.2.patch, HDFS-6422.3.patch, 
 HDFS-6474.4.patch


 If you do
 hdfs dfs -getfattr -n user.blah /foo
 and user.blah doesn't exist, the command prints
 # file: /foo
 and a 0 return code.
 It should print an exception and return a non-0 return code instead.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6690) Deduplicate xattr names in memory

2014-07-15 Thread Andrew Wang (JIRA)
Andrew Wang created HDFS-6690:
-

 Summary: Deduplicate xattr names in memory
 Key: HDFS-6690
 URL: https://issues.apache.org/jira/browse/HDFS-6690
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.5.0
Reporter: Andrew Wang
Assignee: Andrew Wang


When the same string is used repeatedly for an xattr name, we could potentially 
save some NN memory by deduplicating the strings.
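
For illustration, a minimal sketch of the deduplication idea (the eventual implementation may well differ, e.g. by reusing an existing interner utility; the class name here is made up):

{noformat}
import java.util.concurrent.ConcurrentHashMap;

class XAttrNameInterner {
  private final ConcurrentHashMap<String, String> pool =
      new ConcurrentHashMap<String, String>();

  // Return a canonical instance of the given xattr name so repeated names
  // share a single String object in NN memory.
  String intern(String name) {
    String existing = pool.putIfAbsent(name, name);
    return existing != null ? existing : name;
  }
}
{noformat}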



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6588) Investigating removing getTrueCause method in Server.java

2014-07-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062668#comment-14062668
 ] 

Hadoop QA commented on HDFS-6588:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12655819/HDFS-6588.001.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.fs.TestSymlinkLocalFSFileSystem
  org.apache.hadoop.fs.TestSymlinkLocalFSFileContext
  org.apache.hadoop.ipc.TestIPC
  org.apache.hadoop.hdfs.tools.TestDFSAdminWithHA
  
org.apache.hadoop.hdfs.server.namenode.ha.TestPipelinesFailover

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7350//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7350//console

This message is automatically generated.

 Investigating removing getTrueCause method in Server.java
 -

 Key: HDFS-6588
 URL: https://issues.apache.org/jira/browse/HDFS-6588
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: security, webhdfs
Affects Versions: 2.5.0
Reporter: Yongjun Zhang
Assignee: Yongjun Zhang
 Attachments: HDFS-6588.001.patch, HDFS-6588.001.patch


 When addressing Daryn Sharp's comment for HDFS-6475 quoted below:
 {quote}
 What I'm saying is I think the patch adds too much unnecessary code. Filing 
 an improvement to delete all but a few lines of the code changed in this 
 patch seems a bit odd. I think you just need to:
 - Delete getTrueCause entirely instead of moving it elsewhere
 - In saslProcess, just throw the exception instead of running it through 
 getTrueCause since it's not a InvalidToken wrapping another exception 
 anymore.
 - Keep your 3-line change to unwrap SecurityException in toResponse
 {quote}
 There are multiple test failures after making the suggested changes. Filing 
 this jira to investigate removing the getTrueCause method.
 More detail will be put in the first comment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6689) NFS: can't access file under directory with 711 access right as other user

2014-07-15 Thread Brandon Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062670#comment-14062670
 ] 

Brandon Li commented on HDFS-6689:
--

This is due to a bug in Nfs3Utils#getAccessRights(), which doesn't give 
execution permission to directories.
{noformat}
if (isSet(mode, Nfs3Constant.ACCESS_MODE_EXECUTE)) {
  if (type == NfsFileType.NFSREG.toValue()) {
rtn |= Nfs3Constant.ACCESS3_EXECUTE;
  }
}
{noformat}
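
One possible direction, sketched against the snippet above (this is not necessarily the committed fix, and the NFSDIR/ACCESS3_LOOKUP names are assumed to exist with these spellings): when the execute bit is set on a directory, grant the NFS LOOKUP permission so traversal works, mirroring POSIX "x" semantics on directories.

{noformat}
if (isSet(mode, Nfs3Constant.ACCESS_MODE_EXECUTE)) {
  if (type == NfsFileType.NFSREG.toValue()) {
    rtn |= Nfs3Constant.ACCESS3_EXECUTE;
  } else if (type == NfsFileType.NFSDIR.toValue()) {
    rtn |= Nfs3Constant.ACCESS3_LOOKUP;   // allow traversing the directory
  }
}
{noformat}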

 NFS: can't access file under directory with 711 access right as other user
 --

 Key: HDFS-6689
 URL: https://issues.apache.org/jira/browse/HDFS-6689
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: nfs
Affects Versions: 2.2.0
Reporter: Yesha Vora

 NFS does not allow other user to access a file with 644 permission and a 
 parent with 711 access right.
 Steps to reproduce:
 1. Create a directory /user/userX with 711 permissions
 2. Upload a file at /user/userX/TestFile with 644 as userX 
 3. Try to access WriteTest as userY.
  HDFS will allow to read TestFile. 
 {noformat}
 bash-4.1$ id
 uid=661(userY) gid=100(users) groups=100(users),13016(groupY)
 bash-4.1$ hdfs dfs -cat /user/userX/TestFile
 create a file with some content
 {noformat}
  NFS will not allow to read TestFile.
 {noformat}
 bash-4.1$ cat /tmp/tmp_mnt/user/userX/TestFile
 cat: /tmp/tmp_mnt/user/userX/TestFile: Permission denied
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed

2014-07-15 Thread Chris Nauroth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HDFS-6597:


Assignee: Danilo Vunjak

 Add a new option to NN upgrade to terminate the process after upgrade on NN 
 is completed
 

 Key: HDFS-6597
 URL: https://issues.apache.org/jira/browse/HDFS-6597
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Danilo Vunjak
Assignee: Danilo Vunjak
 Attachments: HDFS-6597.04.patch, JIRA-HDFS-30.patch, 
 JIRA-HDFS-6597.02.patch, JIRA-HDFS-6597.03.patch, JIRA-HDFS-6597.patch


 Currently, when the namenode is started for upgrade (the "hadoop namenode -upgrade" 
 command), after finishing the metadata upgrade the namenode starts working 
 normally and waits for the datanodes to upgrade themselves and connect to the NN. 
 We need an option for upgrading only the NN metadata, so that after the upgrade 
 is finished on the NN, the process terminates.
 I have tested this by changing the file hdfs.server.namenode.NameNode.java, 
 method public static NameNode createNameNode(String argv[], Configuration 
 conf), adding the following in the UPGRADE case of the switch:
 case UPGRADE:
   {
     DefaultMetricsSystem.initialize("NameNode");
     NameNode nameNode = new NameNode(conf);
     if (startOpt.getForceUpgrade()) {
       terminate(0);
       return null;
     }
     return nameNode;
   }
 This upgraded the metadata and closed the process when finished; later, when 
 all services were started, the upgrade of the datanodes finished successfully 
 and the system ran.
 What I'm suggesting is to add a new startup parameter, -force, so the namenode 
 can be started as "hadoop namenode -upgrade -force" to indicate that we want to 
 terminate the process after the NN metadata upgrade is finished. The old 
 functionality should be preserved, so users can run "hadoop namenode -upgrade" 
 in the same way and with the same behaviour as before.
  Thanks,
  Danilo



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6597) Add a new option to NN upgrade to terminate the process after upgrade on NN is completed

2014-07-15 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062687#comment-14062687
 ] 

Chris Nauroth commented on HDFS-6597:
-

Hi, [~dvunjak].  This mostly looks good.  I have 2 comments:
# I built the distro after adding a bogus new feature to bump the version to 
-58 in {{NameNodeLayoutVersion}}.  I tried running -upgradeOnly, but it didn't 
actually upgrade the metadata files.  It looks like you'll need another change 
in {{FSImage#recoverTransitionRead}}.  There is a switch statement that looks 
for the {{UPGRADE}} option, but not the new {{UPGRADEONLY}} option.
# The new test suite is a copy of the existing {{TestStartupOptionUpgrade}} 
with the option changed to {{UPGRADEONLY}}.  Instead of cloning the code, this 
looks like a good opportunity for a JUnit {{Parameterized}} test.  See 
{{TestNameNodeHttpServer}} for an existing example of a {{Parameterized}} test. 
 I think you can make a fairly small change in the existing 
{{TestStartupOptionUpgrade}} so that it's parameterized to run on both options: 
{{UPGRADE}} and {{UPGRADEONLY}}.
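
For illustration, a hedged sketch of the Parameterized shape being suggested (the UPGRADEONLY constant comes from the patch under review, and the test body is only a placeholder for the existing assertions):

{noformat}
import java.util.Arrays;
import java.util.Collection;

import org.apache.hadoop.hdfs.server.common.HdfsServerConstants.StartupOption;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class TestStartupOptionUpgrade {

  @Parameters
  public static Collection<Object[]> startOptions() {
    // Run every test case once per start option.
    return Arrays.asList(new Object[][] {
        { StartupOption.UPGRADE },
        { StartupOption.UPGRADEONLY }
    });
  }

  private final StartupOption startOpt;

  public TestStartupOptionUpgrade(StartupOption startOpt) {
    this.startOpt = startOpt;
  }

  @Test
  public void testStartupOption() throws Exception {
    // existing assertions from TestStartupOptionUpgrade, exercised with startOpt
  }
}
{noformat}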

 Add a new option to NN upgrade to terminate the process after upgrade on NN 
 is completed
 

 Key: HDFS-6597
 URL: https://issues.apache.org/jira/browse/HDFS-6597
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Reporter: Danilo Vunjak
Assignee: Danilo Vunjak
 Attachments: HDFS-6597.04.patch, JIRA-HDFS-30.patch, 
 JIRA-HDFS-6597.02.patch, JIRA-HDFS-6597.03.patch, JIRA-HDFS-6597.patch


 Currently, when the namenode is started for upgrade (the "hadoop namenode -upgrade" 
 command), after finishing the metadata upgrade the namenode starts working 
 normally and waits for the datanodes to upgrade themselves and connect to the NN. 
 We need an option for upgrading only the NN metadata, so that after the upgrade 
 is finished on the NN, the process terminates.
 I have tested this by changing the file hdfs.server.namenode.NameNode.java, 
 method public static NameNode createNameNode(String argv[], Configuration 
 conf), adding the following in the UPGRADE case of the switch:
 case UPGRADE:
   {
     DefaultMetricsSystem.initialize("NameNode");
     NameNode nameNode = new NameNode(conf);
     if (startOpt.getForceUpgrade()) {
       terminate(0);
       return null;
     }
     return nameNode;
   }
 This upgraded the metadata and closed the process when finished; later, when 
 all services were started, the upgrade of the datanodes finished successfully 
 and the system ran.
 What I'm suggesting is to add a new startup parameter, -force, so the namenode 
 can be started as "hadoop namenode -upgrade -force" to indicate that we want to 
 terminate the process after the NN metadata upgrade is finished. The old 
 functionality should be preserved, so users can run "hadoop namenode -upgrade" 
 in the same way and with the same behaviour as before.
  Thanks,
  Danilo



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HDFS-6691) The message on NN UI can be confusing during a rolling upgrade

2014-07-15 Thread Mit Desai (JIRA)
Mit Desai created HDFS-6691:
---

 Summary: The message on NN UI can be confusing during a rolling 
upgrade 
 Key: HDFS-6691
 URL: https://issues.apache.org/jira/browse/HDFS-6691
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.5.0
Reporter: Mit Desai
Assignee: Mit Desai
 Attachments: ha1.png

On ANN, it says rollback image was created. On SBN, it says otherwise.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

