[jira] [Commented] (HDFS-2065) Fix NPE in DFSClient.getFileChecksum

2011-08-22 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088848#comment-13088848
 ] 

Bharath Mundlapudi commented on HDFS-2065:
--

Ok, I will recheck this. 

 Fix NPE in DFSClient.getFileChecksum
 

 Key: HDFS-2065
 URL: https://issues.apache.org/jira/browse/HDFS-2065
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0

 Attachments: HDFS-2065-1.patch


 The following code can throw an NPE if callGetBlockLocations returns null,
 i.e. if the server returns null:
 {code}
 List<LocatedBlock> locatedblocks
 = callGetBlockLocations(namenode, src, 0, 
 Long.MAX_VALUE).getLocatedBlocks();
 {code}
 The right fix is for the server to throw the right exception.
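 For illustration, a minimal client-side guard against the null return, assuming the surrounding DFSClient context (namenode, src); FileNotFoundException is an assumed stand-in for whatever exception the server-side fix would surface:
 {code}
 // Sketch only, not the committed patch: reject a null result instead of
 // dereferencing it.
 LocatedBlocks blockLocations =
     callGetBlockLocations(namenode, src, 0, Long.MAX_VALUE);
 if (blockLocations == null) {
   throw new FileNotFoundException("File does not exist: " + src);
 }
 List<LocatedBlock> locatedblocks = blockLocations.getLocatedBlocks();
 {code}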

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1976) Logging in DataXceiver will sometimes repeat stack traces

2011-07-26 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1976:
-

Assignee: Bharath Mundlapudi

 Logging in DataXceiver will sometimes repeat stack traces
 -

 Key: HDFS-1976
 URL: https://issues.apache.org/jira/browse/HDFS-1976
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Joey Echeverria
Assignee: Bharath Mundlapudi
Priority: Minor

 The run() method in DataXceiver logs the stack trace of all throwables thrown 
 while performing an operation. In some cases, the operations also log stack 
 traces despite throwing the exception up the stack. The logging code should 
 try to avoid double-logging stack traces where possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1872) BPOfferService.cleanUp(..) throws NullPointerException

2011-07-26 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070972#comment-13070972
 ] 

Bharath Mundlapudi commented on HDFS-1872:
--

https://issues.apache.org/jira/browse/HDFS-1592

 BPOfferService.cleanUp(..) throws NullPointerException
 --

 Key: HDFS-1872
 URL: https://issues.apache.org/jira/browse/HDFS-1872
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Reporter: Tsz Wo (Nicholas), SZE

 {noformat}
 NullPointerException
 at 
 org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.cleanUp(DataNode.java:1005)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.run(DataNode.java:1220)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1796) Switch NameNode to use non-fair locks

2011-07-26 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070976#comment-13070976
 ] 

Bharath Mundlapudi commented on HDFS-1796:
--

A fair lock guarantees the order of execution at the cost of performance. If a 
delete and a read arrive at the same time and the read operation requests the lock 
first, then without a fair lock the system might schedule the delete operation before 
the read. Shouldn't we care about the correctness and order of execution in file 
systems?

 Switch NameNode to use non-fair locks
 -

 Key: HDFS-1796
 URL: https://issues.apache.org/jira/browse/HDFS-1796
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Hairong Kuang
 Attachments: non-fair-lock.patch


 According to the JavaDoc, a non-fair lock will normally have higher throughput 
 than a fair lock. Our experiment also shows improved performance when 
 using a non-fair lock. We should switch the namenode to use non-fair locks.
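 For illustration, a minimal sketch of what the switch looks like with java.util.concurrent.locks.ReentrantReadWriteLock; the field name is an assumption, not the actual FSNamesystem code:
 {code}
 // true  = fair: waiting threads acquire the lock roughly in arrival order.
 // false = non-fair (also the no-arg default): higher throughput, but no
 //         ordering guarantee between waiting readers and writers.
 private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(false);
 {code}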

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1776) Bug in Concat code

2011-07-19 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1776:
-

Assignee: Bharath Mundlapudi

 Bug in Concat code
 --

 Key: HDFS-1776
 URL: https://issues.apache.org/jira/browse/HDFS-1776
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Dmytro Molkov
Assignee: Bharath Mundlapudi

 There is a bug in the concat code. Specifically: in INodeFile.appendBlocks() 
 we need to first reassign the blocks list and then go through it and update 
 the INode pointer. Otherwise we are not updating the inode pointer on all of 
 the new blocks in the file.
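 For illustration, a hedged sketch of the ordering the description asks for; the names approximate the INodeFile internals and are not copied from the committed patch:
 {code}
 void appendBlocks(INodeFile[] inodes, int totalAddedBlocks) {
   int size = this.blocks.length;
   BlockInfo[] newlist = new BlockInfo[size + totalAddedBlocks];
   System.arraycopy(this.blocks, 0, newlist, 0, size);
   for (INodeFile in : inodes) {
     System.arraycopy(in.blocks, 0, newlist, size, in.blocks.length);
     size += in.blocks.length;
   }
   // 1) reassign the blocks list first...
   this.blocks = newlist;
   // 2) ...then walk the *new* list, so every block (including the ones just
   //    appended from the other files) points back at this INode.
   for (BlockInfo b : this.blocks) {
     b.setINode(this);
   }
 }
 {code}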

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1776) Bug in Concat code

2011-07-19 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1776:
-

Affects Version/s: 0.23.0
Fix Version/s: 0.23.0

 Bug in Concat code
 --

 Key: HDFS-1776
 URL: https://issues.apache.org/jira/browse/HDFS-1776
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Dmytro Molkov
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0


 There is a bug in the concat code. Specifically: in INodeFile.appendBlocks() 
 we need to first reassign the blocks list and then go through it and update 
 the INode pointer. Otherwise we are not updating the inode pointer on all of 
 the new blocks in the file.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1776) Bug in Concat code

2011-07-19 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1776:
-

Attachment: HDFS-1776-1.patch

Attaching a patch for this.

 Bug in Concat code
 --

 Key: HDFS-1776
 URL: https://issues.apache.org/jira/browse/HDFS-1776
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Dmytro Molkov
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0

 Attachments: HDFS-1776-1.patch


 There is a bug in the concat code. Specifically: in INodeFile.appendBlocks() 
 we need to first reassign the blocks list and then go through it and update 
 the INode pointer. Otherwise we are not updating the inode pointer on all of 
 the new blocks in the file.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1977) Stop using StringUtils.stringifyException()

2011-07-12 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13064067#comment-13064067
 ] 

Bharath Mundlapudi commented on HDFS-1977:
--

Thank you all, I am attaching a patch which addresses Jitendra's comment.



 Stop using StringUtils.stringifyException()
 ---

 Key: HDFS-1977
 URL: https://issues.apache.org/jira/browse/HDFS-1977
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Joey Echeverria
Assignee: Bharath Mundlapudi
Priority: Minor
 Attachments: HDFS-1977-1.patch, HDFS-1977-2.patch, HDFS-1977-3.patch


 The old version of the logging APIs didn't support logging stack traces by 
 passing exceptions to the logging methods (e.g. Log.error()). A number of log 
 statements make use of StringUtils.stringifyException() to get around the old 
 behavior. It would be nice if this could get cleaned up to make use of the 
 logger's stack trace printing. This also gives users more control since 
 you can configure how the stack traces are written to the logs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1977) Stop using StringUtils.stringifyException()

2011-07-12 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1977:
-

Attachment: HDFS-1977-4.patch

 Stop using StringUtils.stringifyException()
 ---

 Key: HDFS-1977
 URL: https://issues.apache.org/jira/browse/HDFS-1977
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Joey Echeverria
Assignee: Bharath Mundlapudi
Priority: Minor
 Attachments: HDFS-1977-1.patch, HDFS-1977-2.patch, HDFS-1977-3.patch, 
 HDFS-1977-4.patch


 The old version of the logging APIs didn't support logging stack traces by 
 passing exceptions to the logging methods (e.g. Log.error()). A number of log 
 statements make use of StringUtils.stringifyException() to get around the old 
 behavior. It would be nice if this could get cleaned up to make use of the 
 logger's stack trace printing. This also gives users more control since 
 you can configure how the stack traces are written to the logs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1872) BPOfferService.cleanUp(..) throws NullPointerException

2011-07-12 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13064161#comment-13064161
 ] 

Bharath Mundlapudi commented on HDFS-1872:
--

Yes, I was seeing NPE in cleanup code earlier. I made some changes in this area 
related to datanode exit. It should be fine now.  

 BPOfferService.cleanUp(..) throws NullPointerException
 --

 Key: HDFS-1872
 URL: https://issues.apache.org/jira/browse/HDFS-1872
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Reporter: Tsz Wo (Nicholas), SZE

 {noformat}
 NullPointerException
 at 
 org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.cleanUp(DataNode.java:1005)
 at 
 org.apache.hadoop.hdfs.server.datanode.DataNode$BPOfferService.run(DataNode.java:1220)
 at java.lang.Thread.run(Thread.java:662)
 {noformat}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1977) Stop using StringUtils.stringifyException()

2011-07-08 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062068#comment-13062068
 ] 

Bharath Mundlapudi commented on HDFS-1977:
--

Todd, if you don't have any further comments on this patch, can you please 
commit it?



 Stop using StringUtils.stringifyException()
 ---

 Key: HDFS-1977
 URL: https://issues.apache.org/jira/browse/HDFS-1977
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Joey Echeverria
Assignee: Bharath Mundlapudi
Priority: Minor
 Attachments: HDFS-1977-1.patch, HDFS-1977-2.patch, HDFS-1977-3.patch


 The old version of the logging APIs didn't support logging stack traces by 
 passing exceptions to the logging methods (e.g. Log.error()). A number of log 
 statements make use of StringUtils.stringifyException() to get around the old 
 behavior. It would be nice if this could get cleaned up to make use of the 
 logger's stack trace printing. This also gives users more control since 
 you can configure how the stack traces are written to the logs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1977) Stop using StringUtils.stringifyException()

2011-07-07 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061719#comment-13061719
 ] 

Bharath Mundlapudi commented on HDFS-1977:
--

Sure. I agree with you. I am posting an updated patch with your suggestions.


 Stop using StringUtils.stringifyException()
 ---

 Key: HDFS-1977
 URL: https://issues.apache.org/jira/browse/HDFS-1977
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Joey Echeverria
Assignee: Bharath Mundlapudi
Priority: Minor
 Attachments: HDFS-1977-1.patch, HDFS-1977-2.patch, HDFS-1977-3.patch


 The old version of the logging APIs didn't support logging stack traces by 
 passing exceptions to the logging methods (e.g. Log.error()). A number of log 
 statements make use of StringUtils.stringifyException() to get around the old 
 behavior. It would be nice if this could get cleaned up to make use of the 
 logger's stack trace printing. This also gives users more control since 
 you can configure how the stack traces are written to the logs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1977) Stop using StringUtils.stringifyException()

2011-07-07 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1977:
-

Attachment: HDFS-1977-3.patch

 Stop using StringUtils.stringifyException()
 ---

 Key: HDFS-1977
 URL: https://issues.apache.org/jira/browse/HDFS-1977
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Joey Echeverria
Assignee: Bharath Mundlapudi
Priority: Minor
 Attachments: HDFS-1977-1.patch, HDFS-1977-2.patch, HDFS-1977-3.patch


 The old version of the logging APIs didn't support logging stack traces by 
 passing exceptions to the logging methods (e.g. Log.error()). A number of log 
 statements make use of StringUtils.stringifyException() to get around the old 
 behavior. It would be nice if this could get cleaned up to make use of the 
 logger's stack trace printing. This also gives users more control since 
 you can configure how the stack traces are written to the logs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1977) Stop using StringUtils.stringifyException()

2011-07-06 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1977:
-

Attachment: HDFS-1977-2.patch

Things have changed since the last post; reattaching with the new changes.

 Stop using StringUtils.stringifyException()
 ---

 Key: HDFS-1977
 URL: https://issues.apache.org/jira/browse/HDFS-1977
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Joey Echeverria
Assignee: Bharath Mundlapudi
Priority: Minor
 Attachments: HDFS-1977-1.patch, HDFS-1977-2.patch


 The old version of the logging APIs didn't support logging stack traces by 
 passing exceptions to the logging methods (e.g. Log.error()). A number of log 
 statements make use of StringUtils.stringifyException() to get around the old 
 behavior. It would be nice if this could get cleaned up to make use of the 
 logger's stack trace printing. This also gives users more control since 
 you can configure how the stack traces are written to the logs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1977) Stop using StringUtils.stringifyException()

2011-07-06 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060910#comment-13060910
 ] 

Bharath Mundlapudi commented on HDFS-1977:
--

This patch doesn't include unit tests, since it's just adopting the new logging 
API. No new tests are required.

 Stop using StringUtils.stringifyException()
 ---

 Key: HDFS-1977
 URL: https://issues.apache.org/jira/browse/HDFS-1977
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Joey Echeverria
Assignee: Bharath Mundlapudi
Priority: Minor
 Attachments: HDFS-1977-1.patch, HDFS-1977-2.patch


 The old version of the logging APIs didn't support logging stack traces by 
 passing exceptions to the logging methods (e.g. Log.error()). A number of log 
 statements make use of StringUtils.stringifyException() to get around the old 
 behavior. It would be nice if this could get cleaned up to make use of the 
 logger's stack trace printing. This also gives users more control since 
 you can configure how the stack traces are written to the logs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2123) 1073: Checkpoint interval should be based on txn count, not size

2011-07-05 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060046#comment-13060046
 ] 

Bharath Mundlapudi commented on HDFS-2123:
--

I have reviewed the patch. This change is more meaningful than checkpointing 
based on size. +1 to the approach. 

There are some minor comments on the logging messages: since we are moving to 
transactions, we should reflect this in the log messages too.

1. Replace in Checkpointer

+LOG.info("Log Size Trigger  : " + checkpointTxnCount + " txns ");
With
+LOG.info("Transaction Count Trigger  : " + checkpointTxnCount + " txns ");

2. Replace in SecondaryNameNode
+  + "\nCheckpoint Size  : " + StringUtils.byteDesc(checkpointTxnCount)
++ " (= " + checkpointTxnCount + " bytes) "
With
+  + "\nTransaction Count  : " + StringUtils.byteDesc(checkpointTxnCount)
++ " (= " + checkpointTxnCount + " txns) "

3. Replace in SecondaryNameNode
+LOG.info("Log Size Trigger: " + checkpointTxnCount + " txns");
with
+LOG.info("Transaction Count Trigger  : " + checkpointTxnCount + " txns ");


4. Replace in SecondaryNameNode
+  System.err.println("EditLog size " + count + " transactions is " +
  "smaller than configured checkpoint " +
+ "interval " + checkpointTxnCount + " transactions.");
with
+  System.err.println("EditLog transactions " + count + " is " +
  "smaller than configured checkpoint " +
+ "transactions " + checkpointTxnCount);




 1073: Checkpoint interval should be based on txn count, not size
 

 Key: HDFS-2123
 URL: https://issues.apache.org/jira/browse/HDFS-2123
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Affects Versions: Edit log branch (HDFS-1073)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Fix For: Edit log branch (HDFS-1073)

 Attachments: hdfs-2123.txt, hdfs-2123.txt


 Currently, the administrator can configure the secondary namenode to 
 checkpoint either every N seconds, or every N bytes worth of edit log. It 
 would make more sense to get rid of the size-based interval and instead allow 
 the administrator to specify checkpoints every N transactions. This also 
 simplifies the code a little bit.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2109) Store uMask as member variable to DFSClient.Conf

2011-06-30 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-2109:
-

Attachment: HDFS-2109-2.patch

Fixed the javac warning

 Store uMask as member variable to DFSClient.Conf
 

 Key: HDFS-2109
 URL: https://issues.apache.org/jira/browse/HDFS-2109
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0

 Attachments: HDFS-2109-1.patch, HDFS-2109-2.patch


 As a part of removing reference to conf in DFSClient, I am proposing 
 replacing FsPermission.getUMask(conf) everywhere in DFSClient class with
 dfsClientConf.uMask by storing uMask as a member variable to DFSClient.Conf. 
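 For illustration, a hedged sketch of the proposed change; class and field names follow the description rather than the committed patch:
 {code}
 // Inside DFSClient: the inner Conf reads the umask once at construction,
 // so later call sites can use dfsClientConf.uMask instead of re-reading
 // FsPermission.getUMask(conf) from the full Configuration object.
 static class Conf {
   final FsPermission uMask;

   Conf(Configuration conf) {
     uMask = FsPermission.getUMask(conf);
   }
 }

 // e.g. at a create() call site (illustrative):
 // FsPermission masked = permission.applyUMask(dfsClientConf.uMask);
 {code}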

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2109) Store uMask as member variable to DFSClient.Conf

2011-06-30 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058115#comment-13058115
 ] 

Bharath Mundlapudi commented on HDFS-2109:
--

Failed tests are not related to this patch.

 Store uMask as member variable to DFSClient.Conf
 

 Key: HDFS-2109
 URL: https://issues.apache.org/jira/browse/HDFS-2109
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0

 Attachments: HDFS-2109-1.patch, HDFS-2109-2.patch


 As a part of removing reference to conf in DFSClient, I am proposing 
 replacing FsPermission.getUMask(conf) everywhere in DFSClient class with
 dfsClientConf.uMask by storing uMask as a member variable to DFSClient.Conf. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2109) Store uMask as member variable to DFSClient.Conf

2011-06-26 Thread Bharath Mundlapudi (JIRA)
Store uMask as member variable to DFSClient.Conf


 Key: HDFS-2109
 URL: https://issues.apache.org/jira/browse/HDFS-2109
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0


As a part of removing reference to conf in DFSClient, I am proposing replacing 
FsPermission.getUMask(conf) everywhere in DFSClient class with
dfsClientConf.uMask. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2109) Store uMask as member variable to DFSClient.Conf

2011-06-26 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-2109:
-

Description: 
As a part of removing reference to conf in DFSClient, I am proposing replacing 
FsPermission.getUMask(conf) everywhere in DFSClient class with
dfsClientConf.uMask by storing uMask as a member variable to DFSClient.Conf. 

  was:
As a part of removing reference to conf in DFSClient, I am proposing replacing 
FsPermission.getUMask(conf) everywhere in DFSClient class with
dfsClientConf.uMask. 


 Store uMask as member variable to DFSClient.Conf
 

 Key: HDFS-2109
 URL: https://issues.apache.org/jira/browse/HDFS-2109
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0


 As a part of removing reference to conf in DFSClient, I am proposing 
 replacing FsPermission.getUMask(conf) everywhere in DFSClient class with
 dfsClientConf.uMask by storing uMask as a member variable to DFSClient.Conf. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2109) Store uMask as member variable to DFSClient.Conf

2011-06-26 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-2109:
-

Attachment: HDFS-2109-1.patch

Attaching the patch.

 Store uMask as member variable to DFSClient.Conf
 

 Key: HDFS-2109
 URL: https://issues.apache.org/jira/browse/HDFS-2109
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0

 Attachments: HDFS-2109-1.patch


 As a part of removing reference to conf in DFSClient, I am proposing 
 replacing FsPermission.getUMask(conf) everywhere in DFSClient class with
 dfsClientConf.uMask by storing uMask as a member variable to DFSClient.Conf. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2092) Create a light inner conf class in DFSClient

2011-06-24 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054264#comment-13054264
 ] 

Bharath Mundlapudi commented on HDFS-2092:
--

We are not concerned about the task attempt. The problem here is the Task 
Tracker's availability. The way conf was designed has its own benefits; at the 
same time it comes with some disadvantages. What if a task attempt runs for 
a day or more? This is not uncommon in our clusters.

Again, I am listing a couple of issues:
1. With UGI, a conf will be created per user in the TT. (Security folks?)
2. PIG or any other job can store arbitrary data. The Hadoop framework should be 
able to deal with it as far as it can. 
3. Last but not least, an API should not hold on to a client's data. 

As every job is different, workloads can be different too, so one can't see 
or hear all the problems.







 Create a light inner conf class in DFSClient
 

 Key: HDFS-2092
 URL: https://issues.apache.org/jira/browse/HDFS-2092
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0

 Attachments: HDFS-2092-1.patch, HDFS-2092-2.patch


 At present, DFSClient stores a reference to the configuration object. Since these 
 configuration objects are pretty big at times, they can bloat processes which 
 have multiple DFSClient objects, like the TaskTracker. This is an attempt to 
 remove the reference to the conf object in DFSClient. 
 This patch creates a light inner conf class and copies the required keys from 
 the Configuration object.
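 For illustration, a hedged sketch of the idea: copy only the handful of values DFSClient actually needs so the full Configuration can be garbage collected. The key names and defaults below are illustrative assumptions, not the exact set in the patch:
 {code}
 static class Conf {
   final int ioBufferSize;
   final long defaultBlockSize;
   final short defaultReplication;

   Conf(Configuration conf) {
     // Copy the required keys once; no reference to conf is retained.
     ioBufferSize = conf.getInt("io.file.buffer.size", 4096);
     defaultBlockSize = conf.getLong("dfs.blocksize", 64 * 1024 * 1024);
     defaultReplication = (short) conf.getInt("dfs.replication", 3);
   }
 }
 {code}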

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2092) Create a light inner conf class in DFSClient

2011-06-24 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054632#comment-13054632
 ] 

Bharath Mundlapudi commented on HDFS-2092:
--

Todd, thanks for the reasons. 

When we say a client, it can be anything, like the TT/JT which has TIP/JIP. You are 
right, a client TIP/JIP can have references to the JobConf, but then the reference 
scope is decided by the client. And yes, eventually we need to fix the FS cache you 
are referring to as well, if there are any leaks. 



 Create a light inner conf class in DFSClient
 

 Key: HDFS-2092
 URL: https://issues.apache.org/jira/browse/HDFS-2092
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0

 Attachments: HDFS-2092-1.patch, HDFS-2092-2.patch


 At present, DFSClient stores a reference to the configuration object. Since these 
 configuration objects are pretty big at times, they can bloat processes which 
 have multiple DFSClient objects, like the TaskTracker. This is an attempt to 
 remove the reference to the conf object in DFSClient. 
 This patch creates a light inner conf class and copies the required keys from 
 the Configuration object.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HDFS-2103) Read lock must be released before acquiring a write lock

2011-06-23 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi resolved HDFS-2103.
--

Resolution: Not A Problem

I didn't notice the finally block, where the read lock is released. I am closing this 
Jira.

 Read lock must be released before acquiring a write lock
 

 Key: HDFS-2103
 URL: https://issues.apache.org/jira/browse/HDFS-2103
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0


 In the FSNamesystem.getBlockLocationsUpdateTimes function, we have the following 
 code:
 {code}
 for (int attempt = 0; attempt < 2; attempt++) {
   if (attempt == 0) { // first attempt is with readlock
 readLock();
   }  else { // second attempt is with  write lock
 writeLock(); // writelock is needed to set accesstime
   }
   ...
   if (attempt == 0) {
  continue;
   }
 {code}
 In the above code, the readLock is acquired in attempt 0, and if the execution 
 enters the continue block, then it tries to acquire the writeLock before 
 releasing the readLock.
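 For context, a hedged sketch of the pattern the later resolution points at: the read lock taken on the first attempt is released in a finally block before the second attempt takes the write lock. Variable names such as needsAccessTimeUpdate and blocks are placeholders, not the real code:
 {code}
 for (int attempt = 0; attempt < 2; attempt++) {
   if (attempt == 0) {
     readLock();             // first attempt only needs to read
   } else {
     writeLock();            // second attempt must update the access time
   }
   try {
     // ... look up the blocks; a "continue" here still runs the finally ...
     if (attempt == 0 && needsAccessTimeUpdate) {
       continue;             // retry with the write lock on the next pass
     }
     return blocks;
   } finally {
     if (attempt == 0) {
       readUnlock();         // released before the write lock is requested
     } else {
       writeUnlock();
     }
   }
 }
 {code}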
  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2092) Remove configuration object reference in DFSClient

2011-06-23 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054134#comment-13054134
 ] 

Bharath Mundlapudi commented on HDFS-2092:
--

Also, existing unit tests should cover this path, so I haven't added new unit 
tests. 

 Remove configuration object reference in DFSClient
 --

 Key: HDFS-2092
 URL: https://issues.apache.org/jira/browse/HDFS-2092
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0

 Attachments: HDFS-2092-1.patch, HDFS-2092-2.patch


 At present, DFSClient stores a reference to the configuration object. Since these 
 configuration objects are pretty big at times, they can bloat processes which 
 have multiple DFSClient objects, like the TaskTracker. This is an attempt to 
 remove the reference to the conf object in DFSClient. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2092) Remove configuration object reference in DFSClient

2011-06-23 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054176#comment-13054176
 ] 

Bharath Mundlapudi commented on HDFS-2092:
--

Hi Eli,

 Does this change mean that a Configuration object can now be freed because 
 there's one fewer ref to it?
Yes, that is the direction of this patch. Eventually, we will be passing around 
only the DFSClient#conf or only the required parameters to the downstream. This 
will be a big change and needs broader discussion. But you are right, the idea 
is to stop having references to the conf object coming from the users. We want 
to let the client code decide the scope of the conf object.  

Regarding memory, only a few [key, value] pairs will be copied into DFSClient, but 
then the bloated conf object is freed for the GC. That will be a big win 
on memory.



 Remove configuration object reference in DFSClient
 --

 Key: HDFS-2092
 URL: https://issues.apache.org/jira/browse/HDFS-2092
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0

 Attachments: HDFS-2092-1.patch, HDFS-2092-2.patch


 At present, DFSClient stores a reference to the configuration object. Since these 
 configuration objects are pretty big at times, they can bloat processes which 
 have multiple DFSClient objects, like the TaskTracker. This is an attempt to 
 remove the reference to the conf object in DFSClient. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2105) Remove the references to configuration object from the DFSClient library.

2011-06-23 Thread Bharath Mundlapudi (JIRA)
Remove the references to configuration object from the DFSClient library.
-

 Key: HDFS-2105
 URL: https://issues.apache.org/jira/browse/HDFS-2105
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0


This is an umbrella jira to track removing the references to the conf object in 
the DFSClient library.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2092) Create a light inner conf class in DFSClient

2011-06-23 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054245#comment-13054245
 ] 

Bharath Mundlapudi commented on HDFS-2092:
--

Hi Aaron,

That was just a sample measurement for one day. We should care about the MAX here in 
this case. Also, going forward, PIG 0.9 will store lots of metadata in the 
conf, and one can also embed the PIG script itself in the conf. This can 
potentially blow up the TT. We can measure an approximate size of the conf from the 
job.xml file in the job history location. Since one can store anything in the job 
conf, we should be careful with references to this object - we should not hold them 
for a long duration. 



  

 Create a light inner conf class in DFSClient
 

 Key: HDFS-2092
 URL: https://issues.apache.org/jira/browse/HDFS-2092
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0

 Attachments: HDFS-2092-1.patch, HDFS-2092-2.patch


 At present, DFSClient stores a reference to the configuration object. Since these 
 configuration objects are pretty big at times, they can bloat processes which 
 have multiple DFSClient objects, like the TaskTracker. This is an attempt to 
 remove the reference to the conf object in DFSClient. 
 This patch creates a light inner conf class and copies the required keys from 
 the Configuration object.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2092) Remove configuration object reference in DFSClient

2011-06-22 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-2092:
-

Attachment: HDFS-2092-1.patch

Attaching a patch for this

 Remove configuration object reference in DFSClient
 --

 Key: HDFS-2092
 URL: https://issues.apache.org/jira/browse/HDFS-2092
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0

 Attachments: HDFS-2092-1.patch


 At present, DFSClient stores a reference to the configuration object. Since these 
 configuration objects are pretty big at times, they can bloat processes which 
 have multiple DFSClient objects, like the TaskTracker. This is an attempt to 
 remove the reference to the conf object in DFSClient. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2092) Remove configuration object reference in DFSClient

2011-06-20 Thread Bharath Mundlapudi (JIRA)
Remove configuration object reference in DFSClient
--

 Key: HDFS-2092
 URL: https://issues.apache.org/jira/browse/HDFS-2092
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0


At present, DFSClient stores a reference to the configuration object. Since these 
configuration objects are pretty big at times, they can bloat processes which have 
multiple DFSClient objects, like the TaskTracker. This is an attempt to remove 
the reference to the conf object in DFSClient. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2094) Add metrics for write pipeline failures

2011-06-20 Thread Bharath Mundlapudi (JIRA)
Add metrics for write pipeline failures
---

 Key: HDFS-2094
 URL: https://issues.apache.org/jira/browse/HDFS-2094
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0


The write pipeline can fail for various reasons, like RPC connection issues, disk 
problems, etc. I am proposing to add metrics to detect write pipeline issues. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1692) In secure mode, Datanode process doesn't exit when disks fail.

2011-06-16 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050792#comment-13050792
 ] 

Bharath Mundlapudi commented on HDFS-1692:
--

Existing tests like TestDataNodeExit should check for this condition, so I have 
not added a new test for this. 

 In secure mode, Datanode process doesn't exit when disks fail.
 --

 Key: HDFS-1692
 URL: https://issues.apache.org/jira/browse/HDFS-1692
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.204.0, 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.204.0, 0.23.0

 Attachments: HDFS-1692-1.patch, HDFS-1692-v0.23-1.patch, 
 HDFS-1692-v0.23-2.patch


 In secure mode, when more disks fail than the number of volumes tolerated, the 
 datanode process doesn't exit properly; it just hangs even though the shutdown 
 method is called. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1692) In secure mode, Datanode process doesn't exit when disks fail.

2011-06-15 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1692:
-

Attachment: HDFS-1692-v0.23-2.patch

I have cleaned up a little bit, like the logging-related stuff and a few 
comments. Uploading the patch again.

 In secure mode, Datanode process doesn't exit when disks fail.
 --

 Key: HDFS-1692
 URL: https://issues.apache.org/jira/browse/HDFS-1692
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.204.0, 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.204.0, 0.23.0

 Attachments: HDFS-1692-1.patch, HDFS-1692-v0.23-1.patch, 
 HDFS-1692-v0.23-2.patch


 In secure mode, when more disks fail than the number of volumes tolerated, the 
 datanode process doesn't exit properly; it just hangs even though the shutdown 
 method is called. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1942) If all Block Pool service threads exit then datanode should exit.

2011-06-14 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1942:
-

Attachment: HDFS-1942-2.patch

Attaching a patch with some test cleanup. Reduced the test time.

 If all Block Pool service threads exit then datanode should exit.
 -

 Key: HDFS-1942
 URL: https://issues.apache.org/jira/browse/HDFS-1942
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Attachments: HDFS-1942-1.patch, HDFS-1942-2.patch


 Currently, if all block pool service threads exit, the Datanode continues to run. 
 This should be fixed.
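 For illustration, a hedged sketch of the check the summary describes, not the committed patch; the helper names (blockPoolManager, getAllNamenodeThreads, isAlive) are stand-ins for whatever the DataNode actually exposes:
 {code}
 // Keep the DataNode main loop running only while at least one block pool
 // service thread is still alive; otherwise trigger shutdown.
 boolean shouldRun() {
   for (BPOfferService bpos : blockPoolManager.getAllNamenodeThreads()) {
     if (bpos.isAlive()) {
       return true;
     }
   }
   LOG.warn("All block pool service threads have exited; shutting down DataNode");
   return false;
 }
 {code}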

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1692) In secure mode, Datanode process doesn't exit when disks fail.

2011-06-14 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1692:
-

Affects Version/s: 0.23.0
Fix Version/s: 0.23.0

 In secure mode, Datanode process doesn't exit when disks fail.
 --

 Key: HDFS-1692
 URL: https://issues.apache.org/jira/browse/HDFS-1692
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.204.0, 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.204.0, 0.23.0

 Attachments: HDFS-1692-1.patch


 In secure mode, when more disks fail than the number of volumes tolerated, the 
 datanode process doesn't exit properly; it just hangs even though the shutdown 
 method is called. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1692) In secure mode, Datanode process doesn't exit when disks fail.

2011-06-14 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1692:
-

Attachment: HDFS-1692-v0.23-1.patch

Attaching a patch for version 0.23.

 In secure mode, Datanode process doesn't exit when disks fail.
 --

 Key: HDFS-1692
 URL: https://issues.apache.org/jira/browse/HDFS-1692
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.204.0, 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.204.0, 0.23.0

 Attachments: HDFS-1692-1.patch, HDFS-1692-v0.23-1.patch


 In secure mode, when more disks fail than the number of volumes tolerated, the 
 datanode process doesn't exit properly; it just hangs even though the shutdown 
 method is called. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (HDFS-1977) Stop using StringUtils.stringifyException()

2011-06-14 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi reassigned HDFS-1977:


Assignee: Bharath Mundlapudi

 Stop using StringUtils.stringifyException()
 ---

 Key: HDFS-1977
 URL: https://issues.apache.org/jira/browse/HDFS-1977
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Joey Echeverria
Assignee: Bharath Mundlapudi
Priority: Minor

 The old version of the logging APIs didn't support logging stack traces by 
 passing exceptions to the logging methods (e.g. Log.error()). A number of log 
 statements make use of StringUtils.stringifyException() to get around the old 
 behavior. It would be nice if this could get cleaned up to make use of the 
 logger's stack trace printing. This also gives users more control since 
 you can configure how the stack traces are written to the logs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HDFS-2072) Remove StringUtils.stringifyException(ie) in logger functions

2011-06-14 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi resolved HDFS-2072.
--

Resolution: Duplicate

 Remove StringUtils.stringifyException(ie) in logger functions
 -

 Key: HDFS-2072
 URL: https://issues.apache.org/jira/browse/HDFS-2072
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0


 The Apache logger API has an overloaded function which can take both the message 
 and the exception. I am proposing to clean up the logging code with this API.
 i.e.:
 Change the code from LOG.warn(msg, 
 StringUtils.stringifyException(exception)); to LOG.warn(msg, exception);

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1977) Stop using StringUtils.stringifyException()

2011-06-14 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049438#comment-13049438
 ] 

Bharath Mundlapudi commented on HDFS-1977:
--

The newer logging API supports exceptions, i.e.:
Change the code from LOG.warn(msg, StringUtils.stringifyException(exception)); 
to LOG.warn(msg, exception);
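For illustration, a before/after sketch of the pattern being cleaned up; the message text and variable names are made up for the example:
{code}
// Before: the stack trace is flattened into the message string by hand.
LOG.warn("Failed to read block " + block + ": "
    + StringUtils.stringifyException(e));

// After: pass the Throwable as a second argument and let the logging
// framework render the stack trace (users can then configure how).
LOG.warn("Failed to read block " + block, e);
{code}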

  

 Stop using StringUtils.stringifyException()
 ---

 Key: HDFS-1977
 URL: https://issues.apache.org/jira/browse/HDFS-1977
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Joey Echeverria
Assignee: Bharath Mundlapudi
Priority: Minor

 The old version of the logging APIs didn't support logging stack traces by 
 passing exceptions to the logging methods (e.g. Log.error()). A number of log 
 statements make use of StringUtils.stringifyException() to get around the old 
 behavior. It would be nice if this could get cleaned up to make use of the 
 logger's stack trace printing. This also gives users more control since 
 you can configure how the stack traces are written to the logs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1977) Stop using StringUtils.stringifyException()

2011-06-14 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1977:
-

Attachment: HDFS-1977-1.patch

Attaching a patch.

 Stop using StringUtils.stringifyException()
 ---

 Key: HDFS-1977
 URL: https://issues.apache.org/jira/browse/HDFS-1977
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Joey Echeverria
Assignee: Bharath Mundlapudi
Priority: Minor
 Attachments: HDFS-1977-1.patch


 The old version of the logging APIs didn't support logging stack traces by 
 passing exceptions to the logging methods (e.g. Log.error()). A number of log 
 statements make use of StringUtils.stringifyException() to get around the old 
 behavior. It would be nice if this could get cleaned up to make use of the 
 logger's stack trace printing. This also gives users more control since 
 you can configure how the stack traces are written to the logs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1942) If all Block Pool service threads exit then datanode should exit.

2011-06-14 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1942:
-

Attachment: HDFS-1942-3.patch

Attaching a patch. Cleaned up some more code.

 If all Block Pool service threads exit then datanode should exit.
 -

 Key: HDFS-1942
 URL: https://issues.apache.org/jira/browse/HDFS-1942
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Attachments: HDFS-1942-1.patch, HDFS-1942-2.patch, HDFS-1942-3.patch


 Currently, if all block pool service threads exit, the Datanode continues to run. 
 This should be fixed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2072) Remove StringUtils.stringifyException(ie) in logger functions

2011-06-13 Thread Bharath Mundlapudi (JIRA)
Remove StringUtils.stringifyException(ie) in logger functions
-

 Key: HDFS-2072
 URL: https://issues.apache.org/jira/browse/HDFS-2072
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0


The Apache logger API has an overloaded function which can take both the message 
and the exception. I am proposing to clean up the logging code with this API.

i.e.:
Change the code from LOG.warn(msg, StringUtils.stringifyException(exception)); 
to LOG.warn(msg, exception);

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-1942) If all Block Pool service threads exit then datanode should exit.

2011-06-13 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1942:
-

Attachment: HDFS-1942-1.patch

Attaching the patch.

 If all Block Pool service threads exit then datanode should exit.
 -

 Key: HDFS-1942
 URL: https://issues.apache.org/jira/browse/HDFS-1942
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Attachments: HDFS-1942-1.patch


 Currently, if all block pool service threads exit, the Datanode continues to run. 
 This should be fixed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2065) Fix NPE in DFSClient.getFileChecksum

2011-06-10 Thread Bharath Mundlapudi (JIRA)
Fix NPE in DFSClient.getFileChecksum


 Key: HDFS-2065
 URL: https://issues.apache.org/jira/browse/HDFS-2065
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0


The following code can throw an NPE if callGetBlockLocations returns null,
i.e. if the server returns null:

{code}
List<LocatedBlock> locatedblocks
= callGetBlockLocations(namenode, src, 0, 
Long.MAX_VALUE).getLocatedBlocks();
{code}

The right fix is for the server to throw the right exception.



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2030) Fix the usability of namenode upgrade command

2011-06-09 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046800#comment-13046800
 ] 

Bharath Mundlapudi commented on HDFS-2030:
--

Thanks for the review, Suresh.
My comments are inline.
 
1.1 Missing banner - done.
1.2 This method is package protected; this unit test just tests this function 
instead of using the time-consuming MiniDFSCluster.
1.3 Removed the null and empty checks.
1.4 The BlockPoolID is autogenerated. I have now modified the tests to not mock 
this. 
1.5 Added assertEquals where necessary.
1.6 Made multiple tests.

2.1 Since setBlockPoolID() and setClusterID() are in NNStorage, moving 
this function to that class now solves this problem.
2.2 Renamed the function.
2.3 Comments moved outside the function, and moved the if condition inside the 
method.

Attaching the patch with these changes.

 Fix the usability of namenode upgrade command
 -

 Key: HDFS-2030
 URL: https://issues.apache.org/jira/browse/HDFS-2030
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Priority: Minor
 Fix For: 0.23.0

 Attachments: HDFS-2030-1.patch


 Fixing the Namenode upgrade option along the same lines as the Namenode format 
 option. 
 If a clusterid is not given, then a clusterid will be automatically generated for 
 the upgrade, but if a clusterid is given then it will be honored.
  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2030) Fix the usability of namenode upgrade command

2011-06-09 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-2030:
-

Attachment: HDFS-2030-2.patch

Attached the patch.

 Fix the usability of namenode upgrade command
 -

 Key: HDFS-2030
 URL: https://issues.apache.org/jira/browse/HDFS-2030
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Priority: Minor
 Fix For: 0.23.0

 Attachments: HDFS-2030-1.patch, HDFS-2030-2.patch


 Fixing the Namenode upgrade option along the same lines as the Namenode format 
 option. 
 If a clusterid is not given, then a clusterid will be automatically generated for 
 the upgrade, but if a clusterid is given then it will be honored.
  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2030) Fix the usability of namenode upgrade command

2011-06-09 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-2030:
-

Attachment: HDFS-2030-3.patch

Did some more minor cleanup related to comments and added more description to 
the test class.

Please find the attached patch.

 Fix the usability of namenode upgrade command
 -

 Key: HDFS-2030
 URL: https://issues.apache.org/jira/browse/HDFS-2030
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Priority: Minor
 Fix For: 0.23.0

 Attachments: HDFS-2030-1.patch, HDFS-2030-2.patch, HDFS-2030-3.patch


 Fixing the Namenode upgrade option along the same lines as the Namenode format 
 option. 
 If a clusterid is not given, then a clusterid will be automatically generated for 
 the upgrade, but if a clusterid is given then it will be honored.
  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-2057) Wait time to terminate the threads causing unit tests to take longer time

2011-06-09 Thread Bharath Mundlapudi (JIRA)
Wait time to terminate the threads causing unit tests to take longer time
-

 Key: HDFS-2057
 URL: https://issues.apache.org/jira/browse/HDFS-2057
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.204.0, 0.20.205.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.205.0


As a part of the fix for the datanode process hang, this piece of code was introduced 
in 0.20.204 to clean up all the waiting threads.

-  try {
-  readPool.awaitTermination(10, TimeUnit.SECONDS);
-  } catch (InterruptedException e) {
-   LOG.info("Exception occured in doStop: " + e.getMessage());
-  }
-  readPool.shutdownNow();

This was clearly meant for production, but all the unit tests use 
MiniDFSCluster and MiniMRCluster for shutdown, which waits on this part of the 
code. Due to this, we saw an increase in unit test run times. So we are removing 
this code. 


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2057) Wait time to terminate the threads causing unit tests to take longer time

2011-06-09 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-2057:
-

Attachment: HDFS-2057-1.patch

Attaching the patch.

 Wait time to terminate the threads causing unit tests to take longer time
 -

 Key: HDFS-2057
 URL: https://issues.apache.org/jira/browse/HDFS-2057
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.204.0, 0.20.205.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.205.0

 Attachments: HDFS-2057-1.patch


 As a part of the fix for the datanode process hang, this piece of code was introduced 
 in 0.20.204 to clean up all the waiting threads.
 -  try {
 -  readPool.awaitTermination(10, TimeUnit.SECONDS);
 -  } catch (InterruptedException e) {
 -   LOG.info("Exception occured in doStop: " + e.getMessage());
 -  }
 -  readPool.shutdownNow();
 This was clearly meant for production, but all the unit tests use 
 MiniDFSCluster and MiniMRCluster for shutdown, which waits on this part of the 
 code. Due to this, we saw an increase in unit test run times. So we are removing 
 this code. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2030) Fix the usability of namenode upgrade command

2011-06-07 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-2030:
-

Attachment: HDFS-2030-1.patch

Attaching the patch

 Fix the usability of namenode upgrade command
 -

 Key: HDFS-2030
 URL: https://issues.apache.org/jira/browse/HDFS-2030
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Priority: Minor
 Fix For: 0.23.0

 Attachments: HDFS-2030-1.patch


 Fixing the Namenode upgrade option along the same lines as the Namenode format 
 option. 
 If a clusterid is not given, one will be automatically generated for the 
 upgrade; if a clusterid is given, it will be honored.
  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2023) Backport of NPE for File.list and File.listFiles

2011-06-04 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044243#comment-13044243
 ] 

Bharath Mundlapudi commented on HDFS-2023:
--

I have run a local test-patch on this patch and here are the results for this 
branch. The -1s are not related to this patch. 

 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 9 new or 
modified tests.
 [exec]
 [exec] -1 javadoc.  The javadoc tool appears to have generated 1 
warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] -1 Eclipse classpath. The patch causes the Eclipse classpath to 
differ from the contents of the lib directories.


The 6 javadoc warnings and the Eclipse classpath difference are not related to this patch.


  [javadoc] 
/export/space/branch-0.20-security.qa/hadoop-common/src/core/org/apache/hadoop/security/SecurityUtil.java:36:
 warning: sun.security.jgss.krb5.Krb5Util
is Sun proprietary API and may be removed in a future release
  [javadoc] import sun.security.jgss.krb5.Krb5Util;
  [javadoc]  ^
  [javadoc] 
/export/space/branch-0.20-security.qa/hadoop-common/src/core/org/apache/hadoop/security/SecurityUtil.java:37:
 warning: sun.security.krb5.Credentials is
 Sun proprietary API and may be removed in a future release
  [javadoc] import sun.security.krb5.Credentials;
  [javadoc] ^
  [javadoc] 
/export/space/branch-0.20-security.qa/hadoop-common/src/core/org/apache/hadoop/security/SecurityUtil.java:38:
 warning: sun.security.krb5.PrincipalName
is Sun proprietary API and may be removed in a future release
  [javadoc] import sun.security.krb5.PrincipalName;
  [javadoc] ^
  [javadoc] 
/export/space/branch-0.20-security.qa/hadoop-common/src/core/org/apache/hadoop/security/KerberosName.java:29:
 warning: sun.security.krb5.Config is Sun
proprietary API and may be removed in a future release
  [javadoc] import sun.security.krb5.Config;
  [javadoc] ^
  [javadoc] 
/export/space/branch-0.20-security.qa/hadoop-common/src/core/org/apache/hadoop/security/KerberosName.java:30:
 warning: sun.security.krb5.KrbException i
s Sun proprietary API and may be removed in a future release
  [javadoc] import sun.security.krb5.KrbException;
  [javadoc] ^
  [javadoc] 
/export/space/branch-0.20-security.qa/hadoop-common/src/core/org/apache/hadoop/security/KerberosName.java:76:
 warning: sun.security.krb5.Config is Sun proprietary API and may be removed in 
a future release
  [javadoc]   private static Config kerbConf;
  [javadoc]  ^
  [javadoc] Standard Doclet version 1.6.0_17
  [javadoc] Building tree for all the packages and classes...
  [javadoc] Building index for all the packages and classes...
  [javadoc] Building index for all classes...
  [javadoc] Generating 
/export/space/branch-0.20-security.qa/hadoop-common/build/docs/api/stylesheet.css...
  [javadoc] 6 warnings

---

 Backport of NPE for File.list and File.listFiles
 

 Key: HDFS-2023
 URL: https://issues.apache.org/jira/browse/HDFS-2023
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.205.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.205.0

 Attachments: HDFS-2023-1.patch


  Since we have multiple Jiras in trunk for common and hdfs, I am creating 
  another jira for this issue. 
  This patch addresses the following:
  1. Provides FileUtil APIs for list and listFiles which throw an IOException in 
  the null cases. 
  2. Replaces most of the places where the JDK File API is used with the FileUtil 
  API. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-2030) Fix the usability of namenode upgrade command

2011-06-03 Thread Bharath Mundlapudi (JIRA)
Fix the usability of namenode upgrade command
-

 Key: HDFS-2030
 URL: https://issues.apache.org/jira/browse/HDFS-2030
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Priority: Minor
 Fix For: 0.23.0


Fixing the Namenode upgrade option along the same lines as the Namenode format 
option. 

If a clusterid is not given, one will be automatically generated for the 
upgrade; if a clusterid is given, it will be honored.

 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2014) RPM packages broke bin/hdfs script

2011-06-02 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13043079#comment-13043079
 ] 

Bharath Mundlapudi commented on HDFS-2014:
--

I have tested this patch on a few cases like hdfs format and upgrade. This 
patch works. Without this patch, users will run into issues on trunk. Can 
someone commit this patch if there are no further comments?

 RPM packages broke bin/hdfs script
 --

 Key: HDFS-2014
 URL: https://issues.apache.org/jira/browse/HDFS-2014
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.23.0
Reporter: Todd Lipcon
Assignee: Eric Yang
Priority: Critical
 Fix For: 0.23.0

 Attachments: HDFS-2014-1.patch, HDFS-2014.patch


 bin/hdfs now appears to depend on ../libexec, which doesn't exist inside of a 
 source checkout:
 todd@todd-w510:~/git/hadoop-hdfs$ ./bin/hdfs namenode
 ./bin/hdfs: line 22: 
 /home/todd/git/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or 
 directory
 ./bin/hdfs: line 138: cygpath: command not found
 ./bin/hdfs: line 161: exec: : not found

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions

2011-06-01 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042432#comment-13042432
 ] 

Bharath Mundlapudi commented on HDFS-988:
-

I am just wondering if we are calling OS sync at all on this code path. All I 
see is a flush call, which flushes from the EditLogOutputStream (Java buffers) 
to the kernel buffers.  

Shouldn't we be doing the following?

eStream.flush();
eStream.getFileOutputStream().getFD().sync();

This will make sure the edits are actually written to disk. Is there any reason 
for not doing this? 
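As a plain java.io illustration of the flush vs. sync distinction (a sketch 
only; it does not touch the EditLogOutputStream accessor mentioned above):

  FileOutputStream fos = new FileOutputStream("edits", true);
  BufferedOutputStream out = new BufferedOutputStream(fos);
  try {
    out.write(record);        // record: the serialized edit (byte[]), still in the Java buffer
    out.flush();              // Java buffer -> kernel buffers
    fos.getFD().sync();       // kernel buffers -> physical disk
  } finally {
    out.close();              // also closes fos
  }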


 saveNamespace can corrupt edits log, apparently due to race conditions
 --

 Key: HDFS-988
 URL: https://issues.apache.org/jira/browse/HDFS-988
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.20-append, 0.21.0, 0.22.0
Reporter: dhruba borthakur
Assignee: Eli Collins
Priority: Blocker
 Fix For: 0.20-append, 0.22.0

 Attachments: HDFS-988_fix_synchs.patch, hdfs-988-2.patch, 
 hdfs-988-3.patch, hdfs-988-4.patch, hdfs-988.txt, saveNamespace.txt, 
 saveNamespace_20-append.patch


 The administrator puts the namenode in safemode and then issues the 
 savenamespace command. This can corrupt the edits log. The problem is that 
 when the NN enters safemode, there could still be pending logSyncs occurring 
 from other threads. Now, the saveNamespace command, when executed, would save 
 an edits log with partial writes. I have seen this happen on 0.20.
 https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-2023) Backport of NPE for File.list and File.listFiles

2011-06-01 Thread Bharath Mundlapudi (JIRA)
Backport of NPE for File.list and File.listFiles


 Key: HDFS-2023
 URL: https://issues.apache.org/jira/browse/HDFS-2023
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.205.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.205.0


Since we have multiple Jiras in trunk for common and hdfs, I am creating 
another jira for this issue. 

This patch addresses the following:

1. Provides FileUtil APIs for list and listFiles which throw an IOException in 
the null cases. 
2. Replaces most of the places where the JDK File API is used with the FileUtil 
API. 
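A minimal sketch of what such wrappers could look like (the names follow the 
description above; the exact signatures in the attached patch may differ):

  public static String[] list(File dir) throws IOException {
    String[] entries = dir.list();
    if (entries == null) {
      throw new IOException("Invalid directory or I/O error on dir " + dir);
    }
    return entries;
  }

  public static File[] listFiles(File dir) throws IOException {
    File[] files = dir.listFiles();
    if (files == null) {
      throw new IOException("Invalid directory or I/O error on dir " + dir);
    }
    return files;
  }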

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2023) Backport of NPE for File.list and File.listFiles

2011-06-01 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-2023:
-

Attachment: HDFS-2023-1.patch

Attaching a patch for this issue.

 Backport of NPE for File.list and File.listFiles
 

 Key: HDFS-2023
 URL: https://issues.apache.org/jira/browse/HDFS-2023
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.205.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.205.0

 Attachments: HDFS-2023-1.patch


 Since we have multiple Jiras in trunk for common and hdfs, I am creating 
 another jira for this issue. 
 This patch addresses the following:
 1. Provides FileUtil APIs for list and listFiles which throw an IOException in 
 the null cases. 
 2. Replaces most of the places where the JDK File API is used with the FileUtil 
 API. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-2023) Backport of NPE for File.list and File.listFiles

2011-06-01 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042519#comment-13042519
 ] 

Bharath Mundlapudi commented on HDFS-2023:
--

Hi Eli,

I wanted to have this change in the same Jira as 0.23, but those were reviewed 
and committed, so I created this one. Also, I could have done multiple patches 
in those same Jiras, but that would not be good for reviewers. On the positive 
side, we can have this single Jira for all 0.20.*.

But I agree with you on using the same Jira for backporting.


  

 Backport of NPE for File.list and File.listFiles
 

 Key: HDFS-2023
 URL: https://issues.apache.org/jira/browse/HDFS-2023
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.205.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.205.0

 Attachments: HDFS-2023-1.patch


 Since we have multiple Jiras in trunk for common and hdfs, I am creating 
 another jira for this issue. 
 This patch addresses the following:
 1. Provides FileUtil APIs for list and listFiles which throw an IOException in 
 the null cases. 
 2. Replaces most of the places where the JDK File API is used with the FileUtil 
 API. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-2019) Fix all the places where Java method File.list is used with FileUtil.list API

2011-05-31 Thread Bharath Mundlapudi (JIRA)
Fix all the places where Java method File.list is used with FileUtil.list API
-

 Key: HDFS-2019
 URL: https://issues.apache.org/jira/browse/HDFS-2019
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Priority: Minor
 Fix For: 0.23.0


This new method FileUtil.list will throw an exception when the disk is bad, 
rather than returning null. 
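The call-site change is then mechanical; roughly (a sketch, with a hypothetical 
process() helper standing in for the per-entry work):

  // Before: NPE when the disk is bad, because dir.list() returns null
  for (String name : dir.list()) {
    process(name);
  }

  // After: a bad disk surfaces as an IOException from FileUtil.list(dir)
  for (String name : FileUtil.list(dir)) {
    process(name);
  }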

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-2019) Fix all the places where Java method File.list is used with FileUtil.list API

2011-05-31 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-2019:
-

Attachment: HDFS-2019-1.patch

Attaching a patch.

 Fix all the places where Java method File.list is used with FileUtil.list API
 -

 Key: HDFS-2019
 URL: https://issues.apache.org/jira/browse/HDFS-2019
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Priority: Minor
 Fix For: 0.23.0

 Attachments: HDFS-2019-1.patch


 This new method FileUtil.list will throw an exception when the disk is bad, 
 rather than returning null. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1934) Fix NullPointerException when certain File APIs return null

2011-05-31 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1934:
-

Attachment: HDFS-1934-5.patch

Reattaching a patch with minor correction for logging.

 Fix NullPointerException when certain File APIs return null
 ---

 Key: HDFS-1934
 URL: https://issues.apache.org/jira/browse/HDFS-1934
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0

 Attachments: HDFS-1934-1.patch, HDFS-1934-2.patch, HDFS-1934-3.patch, 
 HDFS-1934-4.patch, HDFS-1934-5.patch


 While testing Disk Fail Inplace, we encountered an NPE from this part of the 
 code: 
 File[] files = dir.listFiles();
 for (File f : files) {
 ...
 }
 This is kind of an API issue. When a disk is bad (or the name is not a 
 directory), these APIs (listFiles, list) return null rather than throwing an 
 exception, so this 'for loop' throws an NPE. The same applies to the 
 dir.list() API.
 Fix all the places where the null condition was not checked.
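 Where the FileUtil wrappers are not used, the defensive form of that loop is 
 simply (a sketch):

   File[] files = dir.listFiles();
   if (files == null) {
     // bad disk, or dir is not a directory
     throw new IOException("Failed to list contents of " + dir);
   }
   for (File f : files) {
     // ... process f as before
   }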
  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1934) Fix NullPointerException when certain File APIs return null

2011-05-27 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1934:
-

Status: Patch Available  (was: Open)

 Fix NullPointerException when certain File APIs return null
 ---

 Key: HDFS-1934
 URL: https://issues.apache.org/jira/browse/HDFS-1934
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0

 Attachments: HDFS-1934-1.patch, HDFS-1934-2.patch, HDFS-1934-3.patch


 While testing Disk Fail Inplace, we encountered an NPE from this part of the 
 code: 
 File[] files = dir.listFiles();
 for (File f : files) {
 ...
 }
 This is kind of an API issue. When a disk is bad (or the name is not a 
 directory), these APIs (listFiles, list) return null rather than throwing an 
 exception, so this 'for loop' throws an NPE. The same applies to the 
 dir.list() API.
 Fix all the places where the null condition was not checked.
  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1934) Fix NullPointerException when certain File APIs return null

2011-05-27 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13040121#comment-13040121
 ] 

Bharath Mundlapudi commented on HDFS-1934:
--

Right, this patch is trying to address exactly what you have mentioned.

 Fix NullPointerException when certain File APIs return null
 ---

 Key: HDFS-1934
 URL: https://issues.apache.org/jira/browse/HDFS-1934
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0

 Attachments: HDFS-1934-1.patch, HDFS-1934-2.patch, HDFS-1934-3.patch


 While testing Disk Fail Inplace, we encountered an NPE from this part of the 
 code: 
 File[] files = dir.listFiles();
 for (File f : files) {
 ...
 }
 This is kind of an API issue. When a disk is bad (or the name is not a 
 directory), these APIs (listFiles, list) return null rather than throwing an 
 exception, so this 'for loop' throws an NPE. The same applies to the 
 dir.list() API.
 Fix all the places where the null condition was not checked.
  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1963) HDFS rpm integration project

2011-05-27 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13040371#comment-13040371
 ] 

Bharath Mundlapudi commented on HDFS-1963:
--

This change seems to break Mac builds. If I do 'ant binary' with this patch, I 
run into this issue.

BUILD FAILED
hdfs/build.xml:1114: 
/Users/bharathm/work/projects/hadoop-trunk/hdfs.patch/hdfs/build/c++/Mac_OS_X-x86_64-64/lib
 does not exist.
 



 HDFS rpm integration project
 

 Key: HDFS-1963
 URL: https://issues.apache.org/jira/browse/HDFS-1963
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: build
 Environment: Java 6, RHEL 5.5
Reporter: Eric Yang
Assignee: Eric Yang
 Fix For: 0.23.0

 Attachments: HDFS-1963-1.patch, HDFS-1963-2.patch, HDFS-1963-3.patch, 
 HDFS-1963-4.patch, HDFS-1963-5.patch, HDFS-1963-6.patch, HDFS-1963.patch


 This jira corresponds to HADOOP-6255 and the associated directory layout 
 change. The patch for creating the HDFS rpm packaging should be posted here so 
 the patch test build can verify it against the hdfs svn trunk.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1934) Fix NullPointerException when certain File APIs return null

2011-05-27 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1934:
-

Attachment: HDFS-1934-4.patch

Thanks for reviewing, Matt. Attaching the patch with this change.

 Fix NullPointerException when certain File APIs return null
 ---

 Key: HDFS-1934
 URL: https://issues.apache.org/jira/browse/HDFS-1934
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0

 Attachments: HDFS-1934-1.patch, HDFS-1934-2.patch, HDFS-1934-3.patch, 
 HDFS-1934-4.patch


 While testing Disk Fail Inplace, we encountered an NPE from this part of the 
 code: 
 File[] files = dir.listFiles();
 for (File f : files) {
 ...
 }
 This is kind of an API issue. When a disk is bad (or the name is not a 
 directory), these APIs (listFiles, list) return null rather than throwing an 
 exception, so this 'for loop' throws an NPE. The same applies to the 
 dir.list() API.
 Fix all the places where the null condition was not checked.
  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1836) Thousand of CLOSE_WAIT socket

2011-05-27 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1836:
-

Fix Version/s: 0.20.205.0

 Thousand of CLOSE_WAIT socket 
 --

 Key: HDFS-1836
 URL: https://issues.apache.org/jira/browse/HDFS-1836
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.20.2
 Environment: Linux 2.6.18-194.32.1.el5 #1 SMP Wed Jan 5 17:52:25 EST 
 2011 x86_64 x86_64 x86_64 GNU/Linux
 java version 1.6.0_23
 Java(TM) SE Runtime Environment (build 1.6.0_23-b05)
 Java HotSpot(TM) 64-Bit Server VM (build 19.0-b09, mixed mode)
Reporter: Dennis Cheung
Assignee: Todd Lipcon
 Fix For: 0.20.3, 0.20.205.0

 Attachments: hdfs-1836-0.20.txt, hdfs-1836-0.20.txt, 
 patch-draft-1836.patch


 $ /usr/sbin/lsof -i TCP:50010 | grep -c CLOSE_WAIT
 4471
  Everything is fine as long as the cluster runs normally. 
  However, from time to time some "DataStreamer Exception: 
  java.net.SocketTimeoutException" and "DFSClient.processDatanodeError(2507) | 
  Error Recovery for" entries can be found in the log file, and the number of 
  CLOSE_WAIT sockets just keeps increasing.
  The CLOSE_WAIT handles may remain for hours or days; then "Too many open 
  files" some day.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1836) Thousand of CLOSE_WAIT socket

2011-05-27 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1836:
-

Attachment: hdfs-1836-0.20.205.txt

Attaching a patch for 0.20.205 version. I just eliminated some hunks.

 Thousand of CLOSE_WAIT socket 
 --

 Key: HDFS-1836
 URL: https://issues.apache.org/jira/browse/HDFS-1836
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.20.2
 Environment: Linux 2.6.18-194.32.1.el5 #1 SMP Wed Jan 5 17:52:25 EST 
 2011 x86_64 x86_64 x86_64 GNU/Linux
 java version 1.6.0_23
 Java(TM) SE Runtime Environment (build 1.6.0_23-b05)
 Java HotSpot(TM) 64-Bit Server VM (build 19.0-b09, mixed mode)
Reporter: Dennis Cheung
Assignee: Todd Lipcon
 Fix For: 0.20.3, 0.20.205.0

 Attachments: hdfs-1836-0.20.205.txt, hdfs-1836-0.20.txt, 
 hdfs-1836-0.20.txt, patch-draft-1836.patch


 $ /usr/sbin/lsof -i TCP:50010 | grep -c CLOSE_WAIT
 4471
  Everything is fine as long as the cluster runs normally. 
  However, from time to time some "DataStreamer Exception: 
  java.net.SocketTimeoutException" and "DFSClient.processDatanodeError(2507) | 
  Error Recovery for" entries can be found in the log file, and the number of 
  CLOSE_WAIT sockets just keeps increasing.
  The CLOSE_WAIT handles may remain for hours or days; then "Too many open 
  files" some day.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1934) Fix NullPointerException when certain File APIs return null

2011-05-26 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1934:
-

Attachment: HDFS-1934-1.patch

Attaching a patch which addresses this problem.

 Fix NullPointerException when certain File APIs return null
 ---

 Key: HDFS-1934
 URL: https://issues.apache.org/jira/browse/HDFS-1934
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0

 Attachments: HDFS-1934-1.patch


 While testing Disk Fail Inplace, we encountered an NPE from this part of the 
 code: 
 File[] files = dir.listFiles();
 for (File f : files) {
 ...
 }
 This is kind of an API issue. When a disk is bad (or the name is not a 
 directory), these APIs (listFiles, list) return null rather than throwing an 
 exception, so this 'for loop' throws an NPE. The same applies to the 
 dir.list() API.
 Fix all the places where the null condition was not checked.
  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1934) Fix NullPointerException when certain File APIs return null

2011-05-26 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1934:
-

Attachment: HDFS-1934-2.patch

Adding this check at another location. So the updated patch.

 Fix NullPointerException when certain File APIs return null
 ---

 Key: HDFS-1934
 URL: https://issues.apache.org/jira/browse/HDFS-1934
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0

 Attachments: HDFS-1934-1.patch, HDFS-1934-2.patch


 While testing Disk Fail Inplace, we encountered an NPE from this part of the 
 code: 
 File[] files = dir.listFiles();
 for (File f : files) {
 ...
 }
 This is kind of an API issue. When a disk is bad (or the name is not a 
 directory), these APIs (listFiles, list) return null rather than throwing an 
 exception, so this 'for loop' throws an NPE. The same applies to the 
 dir.list() API.
 Fix all the places where the null condition was not checked.
  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1943) fail to start datanode while start-dfs.sh is executed by root user

2011-05-25 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13039320#comment-13039320
 ] 

Bharath Mundlapudi commented on HDFS-1943:
--

+1 to this patch. 

 fail to start datanode while start-dfs.sh is executed by root user
 --

 Key: HDFS-1943
 URL: https://issues.apache.org/jira/browse/HDFS-1943
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.23.0
Reporter: Wei Yongjun
Priority: Blocker
 Fix For: 0.23.0

 Attachments: HDFS-1943.patch


 When start-dfs.sh is run by root user, we got the following error message:
 # start-dfs.sh
 Starting namenodes on [localhost ]
 localhost: namenode running as process 2556. Stop it first.
 localhost: starting datanode, logging to 
 /usr/hadoop/hadoop-common-0.23.0-SNAPSHOT/bin/../logs/hadoop-root-datanode-cspf01.out
 localhost: Unrecognized option: -jvm
 localhost: Could not create the Java virtual machine.
 The -jvm option should be passed to jsvc when we are starting a secure
 datanode, but it is still passed to java when start-dfs.sh is run by root
 while the secure datanode is disabled. This is a bug in bin/hdfs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1934) Fix NullPointerException when certain File APIs return null

2011-05-25 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1934:
-

Affects Version/s: (was: 0.20.205.0)
Fix Version/s: (was: 0.20.205.0)

 Fix NullPointerException when certain File APIs return null
 ---

 Key: HDFS-1934
 URL: https://issues.apache.org/jira/browse/HDFS-1934
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0


 While testing Disk Fail Inplace, we encountered an NPE from this part of the 
 code: 
 File[] files = dir.listFiles();
 for (File f : files) {
 ...
 }
 This is kind of an API issue. When a disk is bad (or the name is not a 
 directory), these APIs (listFiles, list) return null rather than throwing an 
 exception, so this 'for loop' throws an NPE. The same applies to the 
 dir.list() API.
 Fix all the places where the null condition was not checked.
  

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated

2011-05-24 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1592:
-

Attachment: HDFS-1592-5.patch

Thanks for the review, Eli and Jitendra. I am attaching a patch which 
incorporates your comments. 

 Datanode startup doesn't honor volumes.tolerated 
 -

 Key: HDFS-1592
 URL: https://issues.apache.org/jira/browse/HDFS-1592
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.204.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.204.0, 0.23.0

 Attachments: HDFS-1592-1.patch, HDFS-1592-2.patch, HDFS-1592-3.patch, 
 HDFS-1592-4.patch, HDFS-1592-5.patch, HDFS-1592-rel20.patch


 Datanode startup doesn't honor volumes.tolerated for hadoop 20 version.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated

2011-05-20 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1592:
-

Attachment: HDFS-1592-4.patch

Attaching a patch with more unit tests.

 Datanode startup doesn't honor volumes.tolerated 
 -

 Key: HDFS-1592
 URL: https://issues.apache.org/jira/browse/HDFS-1592
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.204.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.204.0, 0.23.0

 Attachments: HDFS-1592-1.patch, HDFS-1592-2.patch, HDFS-1592-3.patch, 
 HDFS-1592-4.patch, HDFS-1592-rel20.patch


 Datanode startup doesn't honor volumes.tolerated for hadoop 20 version.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated

2011-05-20 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036716#comment-13036716
 ] 

Bharath Mundlapudi commented on HDFS-1592:
--

Eli,

I have added more unit tests as mentioned above. Also, note that the case you 
pointed out is a rare condition. In our tests, making the file system read-only 
through mount, unmounting disks, or even setting permissions one level above 
did not hit this issue; we hit it only when we set the permissions on this 
particular directory. Anyway, I have fixed the case you pointed out as well. 

Thanks for spotting this though.


 

 Datanode startup doesn't honor volumes.tolerated 
 -

 Key: HDFS-1592
 URL: https://issues.apache.org/jira/browse/HDFS-1592
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.204.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.204.0, 0.23.0

 Attachments: HDFS-1592-1.patch, HDFS-1592-2.patch, HDFS-1592-3.patch, 
 HDFS-1592-4.patch, HDFS-1592-rel20.patch


 Datanode startup doesn't honor volumes.tolerated for hadoop 20 version.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated

2011-05-20 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13036929#comment-13036929
 ] 

Bharath Mundlapudi commented on HDFS-1592:
--

These failing tests are not related to this patch. 

Eli, If you don't have any comments, we will commit this patch today.  



 Datanode startup doesn't honor volumes.tolerated 
 -

 Key: HDFS-1592
 URL: https://issues.apache.org/jira/browse/HDFS-1592
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.204.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.204.0, 0.23.0

 Attachments: HDFS-1592-1.patch, HDFS-1592-2.patch, HDFS-1592-3.patch, 
 HDFS-1592-4.patch, HDFS-1592-rel20.patch


 Datanode startup doesn't honor volumes.tolerated for hadoop 20 version.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated

2011-05-18 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035533#comment-13035533
 ] 

Bharath Mundlapudi commented on HDFS-1592:
--

Eli, thanks for your review and comments. 

Yes, I have tested against trunk. How did you test this? Did you configure 
volumes tolerated correctly?
The expected behavior is: if more volumes have failed than are tolerated, the 
BPOfferService daemon will fail to start, roughly as sketched below.
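A sketch of that startup check (variable names are illustrative, not the exact 
patch):

  // DiskErrorException is org.apache.hadoop.util.DiskChecker.DiskErrorException
  int failedVolumes = volsConfigured - validVolumes;
  if (failedVolumes > volFailuresTolerated) {
    throw new DiskErrorException("Too many failed volumes: " + failedVolumes
        + " failed, only " + volFailuresTolerated + " tolerated");
  }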

Also, note that I have filed another Jira for this: if all BPOfferService 
threads exit for some reason, the Datanode should exit. That is a bug in the 
current code.

Please see the following four tests I have performed and their outcome on trunk.

Case 1: One disk failure (/grid/2) and Vol Tolerated = 0. Outcome: BP Service 
should exit.

11/05/18 07:48:56 WARN common.Util: Path /grid/0/testing/hadoop-logs/dfs/data 
should be specified as a URI in configuration files. Please update hdfs 
configuration.
11/05/18 07:48:56 WARN common.Util: Path /grid/1/testing/hadoop-logs/dfs/data 
should be specified as a URI in configuration files. Please update hdfs 
configuration.
11/05/18 07:48:56 WARN common.Util: Path /grid/2/testing/hadoop-logs/dfs/data 
should be specified as a URI in configuration files. Please update hdfs 
configuration.
11/05/18 07:48:56 WARN common.Util: Path /grid/3/testing/hadoop-logs/dfs/data 
should be specified as a URI in configuration files. Please update hdfs 
configuration.
11/05/18 07:48:56 WARN datanode.DataNode: Invalid directory in: 
dfs.datanode.data.dir: 
java.io.FileNotFoundException: File file:/grid/2/testing/hadoop-logs/dfs/data 
does not exist.
at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:424)
at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:315)
at 
org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:131)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:148)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.getDataDirsFromURIs(DataNode.java:2154)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2133)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2074)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2097)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2240)
at 
org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2250)
11/05/18 07:48:56 INFO impl.MetricsConfig: loaded properties from 
hadoop-metrics2.properties
11/05/18 07:48:56 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 
second(s).
11/05/18 07:48:56 INFO impl.MetricsSystemImpl: DataNode metrics system started
11/05/18 07:48:56 INFO impl.MetricsSystemImpl: Registered source UgiMetrics
11/05/18 07:48:56 INFO datanode.DataNode: Opened info server at 50010
11/05/18 07:48:56 INFO datanode.DataNode: Balancing bandwith is 1048576 bytes/s
11/05/18 07:48:56 INFO mortbay.log: Logging to 
org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
11/05/18 07:48:56 INFO http.HttpServer: Added global filtersafety 
(class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
11/05/18 07:48:56 INFO http.HttpServer: Port returned by 
webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the 
listener on 50075
11/05/18 07:48:56 INFO http.HttpServer: listener.getLocalPort() returned 50075 
webServer.getConnectors()[0].getLocalPort() returned 50075
11/05/18 07:48:56 INFO http.HttpServer: Jetty bound to port 50075
11/05/18 07:48:56 INFO mortbay.log: jetty-6.1.14
11/05/18 07:48:56 WARN mortbay.log: Can't reuse 
/tmp/Jetty_0_0_0_0_50075_datanodehwtdwq, using 
/tmp/Jetty_0_0_0_0_50075_datanodehwtdwq_6441176730816569391
11/05/18 07:49:01 INFO mortbay.log: Started SelectChannelConnector@0.0.0.0:50075
11/05/18 07:49:01 INFO ipc.Server: Starting Socket Reader #1 for port 50020
11/05/18 07:49:01 INFO ipc.Server: Starting Socket Reader #2 for port 50020
11/05/18 07:49:01 INFO ipc.Server: Starting Socket Reader #3 for port 50020
11/05/18 07:49:01 INFO ipc.Server: Starting Socket Reader #4 for port 50020
11/05/18 07:49:01 INFO ipc.Server: Starting Socket Reader #5 for port 50020
11/05/18 07:49:01 INFO impl.MetricsSystemImpl: Registered source 
RpcActivityForPort50020
11/05/18 07:49:01 INFO impl.MetricsSystemImpl: Registered source 
RpcDetailedActivityForPort50020
11/05/18 07:49:01 INFO impl.MetricsSystemImpl: Registered source JvmMetrics
11/05/18 07:49:01 INFO impl.MetricsSystemImpl: Registered source 
DataNodeActivity-hadooplab40.yst.corp.yahoo.com-50010
11/05/18 07:49:01 INFO datanode.DataNode: 
DatanodeRegistration(hadooplab40.yst.corp.yahoo.com:50010, storageID=, 
infoPort=50075, ipcPort=50020, storageInfo=lv=0;cid=;nsid=0;c=0)In 

[jira] [Updated] (HDFS-1905) Improve the usability of namenode -format

2011-05-18 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1905:
-

Attachment: HDFS-1905-2.patch

Attaching a patch based on comments. Preserved the previous semantics.

 Improve the usability of namenode -format 
 --

 Key: HDFS-1905
 URL: https://issues.apache.org/jira/browse/HDFS-1905
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Priority: Minor
 Fix For: 0.23.0

 Attachments: HDFS-1905-1.patch, HDFS-1905-2.patch


 While setting up a 0.23 based cluster, I ran into this issue. When I issue a 
 format namenode command, which changed in 23, it should let the user know how 
 to use this command when the complete options are not specified.
 ./hdfs namenode -format
 I get the following error msg, but it is still not clear what the user should 
 do or how to use this command.
 11/05/09 15:36:25 ERROR namenode.NameNode: 
 java.lang.IllegalArgumentException: Format must be provided with clusterid
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689)
  
 The usability of this command can be improved.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1941) Remove -genclusterid from NameNode startup options

2011-05-18 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035980#comment-13035980
 ] 

Bharath Mundlapudi commented on HDFS-1941:
--

Failed tests are not related to this patch.

 Remove -genclusterid from NameNode startup options
 --

 Key: HDFS-1941
 URL: https://issues.apache.org/jira/browse/HDFS-1941
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Priority: Minor
 Attachments: HDFS-1941-1.patch


 Currently, namenode -genclusterid is a helper utility to generate a unique 
 clusterid. This option becomes useless once namenode -format automatically 
 generates the clusterid.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated

2011-05-18 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1592:
-

Attachment: HDFS-1592-3.patch

Attaching a patch which addresses Eli's comments.

 Datanode startup doesn't honor volumes.tolerated 
 -

 Key: HDFS-1592
 URL: https://issues.apache.org/jira/browse/HDFS-1592
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.204.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.204.0, 0.23.0

 Attachments: HDFS-1592-1.patch, HDFS-1592-2.patch, HDFS-1592-3.patch, 
 HDFS-1592-rel20.patch


 Datanode startup doesn't honor volumes.tolerated for hadoop 20 version.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated

2011-05-18 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13035990#comment-13035990
 ] 

Bharath Mundlapudi commented on HDFS-1592:
--

First, thank you for identifying this issue, Eli. Great job!

A couple of comments:
1. We did test a couple of things like masking permissions at the dfs level. 
That didn't catch this issue. Your suggestion of setting the permissions on the 
specific directory helped us reproduce this case. Thanks again.
2. We also tested by unmounting disks.
3. Then we tested by injecting failures at the kernel level. 

Regarding test cases,
I agree with you that we need more tests, but I think we should do that in 
another jira, since we have already spent a lot of effort on manual testing. 
Can we file another Jira to track this? 

With this new patch, I have tested the following new cases. Can you please 
review and provide your feedback?

case 1: All four good volumes, Vol Tolerated=1, expected outcome = BPservice 
should start

11/05/19 04:57:51 INFO datanode.DataNode: FSDataset added volume - 
/grid/0/testing/hadoop-logs/dfs/data/current
11/05/19 04:57:51 INFO datanode.DataNode: FSDataset added volume - 
/grid/1/testing/hadoop-logs/dfs/data/current
11/05/19 04:57:51 INFO datanode.DataNode: FSDataset added volume - 
/grid/2/testing/hadoop-logs/dfs/data/current
11/05/19 04:57:51 INFO datanode.DataNode: FSDataset added volume - 
/grid/3/testing/hadoop-logs/dfs/data/current
11/05/19 04:57:51 INFO datanode.DataNode: Registered FSDatasetState MBean
11/05/19 04:57:51 INFO datanode.DataNode: Adding block pool 
BP-1694914230-10.72.86.55-1305704227822
11/05/19 04:57:51 INFO datanode.DirectoryScanner: Periodic Directory Tree 
Verification scan starting at 1305782678947 with interval 2160
11/05/19 04:57:51 INFO datanode.DataNode: in register: 
sid=DS-340618566-10.72.86.55-50010-1305704313207;SI=lv=-35;cid=test;nsid=413952175;c=0
11/05/19 04:57:51 INFO datanode.DataNode: bpReg after 
=lv=-35;cid=test;nsid=413952175;c=0;sid=DS-340618566-10.72.86.55-50010-1305704313207;name=127.0.0.1:50010
11/05/19 04:57:51 INFO datanode.DataNode: in 
register:;bpDNR=lv=-35;cid=test;nsid=413952175;c=0
11/05/19 04:57:51 INFO datanode.DataNode: For namenode localhost/127.0.0.1:8020 
using BLOCKREPORT_INTERVAL of 2160msec Initial delay: 0msec; 
heartBeatInterval=3000
11/05/19 04:57:51 INFO datanode.DataNode: BlockReport of 0 blocks got processed 
in 3 msecs
11/05/19 04:57:51 INFO datanode.DataNode: sent block report, processed 
command:org.apache.hadoop.hdfs.server.protocol.DatanodeCommand$Finalize@3e5a91
11/05/19 04:57:51 INFO datanode.BlockPoolSliceScanner: Periodic Block 
Verification scan initialized with interval 181440.
11/05/19 04:57:51 INFO datanode.DataBlockScanner: Added 
bpid=BP-1694914230-10.72.86.55-1305704227822 to blockPoolScannerMap, new size=1
11/05/19 04:57:56 INFO datanode.BlockPoolSliceScanner: Starting a new period : 
work left in prev period : 0.00%

case 2: One failed volume(/grid/2), three good volumes, Vol Tolerated=1, 
expected outcome = BPService should start

11/05/19 05:01:27 INFO common.Storage: Storage directory 
/grid/2/testing/hadoop-logs/dfs/data is not formatted.
11/05/19 05:01:27 INFO common.Storage: Formatting ...
11/05/19 05:01:27 WARN common.Storage: Invalid directory in: 
/grid/2/testing/hadoop-logs/dfs/data/current/BP-1694914230-10.72.86.55-1305704227822:
 File 
file:/grid/2/testing/hadoop-logs/dfs/data/current/BP-1694914230-10.72.86.55-1305704227822
 does not exist.
11/05/19 05:01:27 INFO common.Storage: Locking is disabled
11/05/19 05:01:27 INFO common.Storage: Locking is disabled
11/05/19 05:01:27 INFO common.Storage: Storage directory 
/grid/2/testing/hadoop-logs/dfs/data/current/BP-1694914230-10.72.86.55-1305704227822
 does not exist.
11/05/19 05:01:27 INFO common.Storage: Storage directory 
/grid/2/testing/hadoop-logs/dfs/data/current/BP-1694914230-10.72.86.55-1305704227822
 does not exist.
11/05/19 05:01:27 INFO common.Storage: Locking is disabled
11/05/19 05:01:27 INFO datanode.DataNode: setting up storage: 
nsid=0;bpid=BP-1694914230-10.72.86.55-1305704227822;lv=-35;nsInfo=lv=-35;cid=test;nsid=413952175;c=0;bpid=BP-1694914230-10.72.86.55-1305704227822
11/05/19 05:01:27 INFO datanode.DataNode: FSDataset added volume - 
/grid/0/testing/hadoop-logs/dfs/data/current
11/05/19 05:01:27 INFO datanode.DataNode: FSDataset added volume - 
/grid/1/testing/hadoop-logs/dfs/data/current
11/05/19 05:01:27 INFO datanode.DataNode: FSDataset added volume - 
/grid/3/testing/hadoop-logs/dfs/data/current
11/05/19 05:01:27 INFO datanode.DataNode: Registered FSDatasetState MBean
11/05/19 05:01:27 INFO datanode.DataNode: Adding block pool 
BP-1694914230-10.72.86.55-1305704227822
11/05/19 05:01:27 INFO datanode.DirectoryScanner: Periodic Directory Tree 
Verification scan starting at 1305789604425 with interval 2160
11/05/19 05:01:27 INFO datanode.DataNode: in register: 

[jira] [Commented] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated

2011-05-16 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033885#comment-13033885
 ] 

Bharath Mundlapudi commented on HDFS-1592:
--

Yes, what you mentioned w.r.t. the use cases is right.

* A DN will successfully start with a failed volume as long as it's 
configured to tolerate a failed volume
* A DN will fail to start if more than the number of tolerated volumes are 
failed

This is the expected behavior with this patch. 

I had some difficulty failing the disks through the unit tests. If we set the 
directory permissions to be non-writable, then once we run the datanode, it 
resets the directory permissions and the test always succeeds. 

These tests were done outside of the unit tests, through umount -l etc. All the 
above-mentioned cases were manually tested. 




 Datanode startup doesn't honor volumes.tolerated 
 -

 Key: HDFS-1592
 URL: https://issues.apache.org/jira/browse/HDFS-1592
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.204.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.204.0, 0.23.0

 Attachments: HDFS-1592-1.patch, HDFS-1592-2.patch, 
 HDFS-1592-rel20.patch


 Datanode startup doesn't honor volumes.tolerated for hadoop 20 version.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-1942) If all Block Pool service threads exit then datanode should exit.

2011-05-16 Thread Bharath Mundlapudi (JIRA)
If all Block Pool service threads exit then datanode should exit.
-

 Key: HDFS-1942
 URL: https://issues.apache.org/jira/browse/HDFS-1942
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi


Currently, if all block pool service threads exit, the Datanode continues to 
run. This should be fixed.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1941) Remove -genclusterid from NameNode startup options

2011-05-16 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1941:
-

Attachment: HDFS-1941-1.patch

Attaching the patch which addresses this jira.

 Remove -genclusterid from NameNode startup options
 --

 Key: HDFS-1941
 URL: https://issues.apache.org/jira/browse/HDFS-1941
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Priority: Minor
 Attachments: HDFS-1941-1.patch


 Currently, namenode -genclusterid is a helper utility to generate a unique 
 clusterid. This option becomes useless once namenode -format automatically 
 generates the clusterid.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1905) Improve the usability of namenode -format

2011-05-16 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034301#comment-13034301
 ] 

Bharath Mundlapudi commented on HDFS-1905:
--

Thanks for the review, Suresh.

My comments:
1 & 3. From a high level, format returns a boolean. Semantically, if the 
operation was successful we should return true, else we should return false.

The previous code was a bit off. If format was successful, it was returning 
false. Even if the user opts not to format, the format operation as such was 
not successful, so we should return false. So I changed this part as well.

Let me know if these assumptions are not correct; I will go back to the 
previous semantics.
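A tiny sketch of the return-value semantics being proposed (illustrative only; 
confirmFormat and doFormat are placeholders, not the real methods):

  static boolean format(Configuration conf) throws IOException {
    if (!confirmFormat(conf)) {   // user answered "no" at the re-format prompt
      return false;               // nothing was formatted
    }
    doFormat(conf);               // placeholder for the actual formatting work
    return true;                  // the format actually happened
  }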

I will fix the doc and tests. Sorry, I missed this part.

Regarding upgrade, do you want me to do it in another Jira? This one was filed 
just for format usability.


 

 Improve the usability of namenode -format 
 --

 Key: HDFS-1905
 URL: https://issues.apache.org/jira/browse/HDFS-1905
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Priority: Minor
 Fix For: 0.23.0

 Attachments: HDFS-1905-1.patch


 While setting up a 0.23 based cluster, I ran into this issue. When I issue a 
 format namenode command, which changed in 23, it should let the user know how 
 to use this command when the complete options are not specified.
 ./hdfs namenode -format
 I get the following error msg, but it is still not clear what the user should 
 do or how to use this command.
 11/05/09 15:36:25 ERROR namenode.NameNode: 
 java.lang.IllegalArgumentException: Format must be provided with clusterid
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689)
  
 The usability of this command can be improved.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1905) Improve the usability of namenode -format

2011-05-16 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13034320#comment-13034320
 ] 

Bharath Mundlapudi commented on HDFS-1905:
--

For comment 2: let's say that I want to format a namenode which is part of a 
particular cluster. Reusing the cluster id is useful here: I just want to 
format this namenode and also want it to remain part of the same cluster.



 Improve the usability of namenode -format 
 --

 Key: HDFS-1905
 URL: https://issues.apache.org/jira/browse/HDFS-1905
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Priority: Minor
 Fix For: 0.23.0

 Attachments: HDFS-1905-1.patch


 While setting up a 0.23 based cluster, I ran into this issue. When I issue a 
 format namenode command, which changed in 23, it should let the user know how 
 to use this command when the complete options are not specified.
 ./hdfs namenode -format
 I get the following error msg, but it is still not clear what the user should 
 do or how to use this command.
 11/05/09 15:36:25 ERROR namenode.NameNode: 
 java.lang.IllegalArgumentException: Format must be provided with clusterid
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689)
  
 The usability of this command can be improved.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1905) Improve the usability of namenode -format

2011-05-15 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1905:
-

Attachment: HDFS-1905-1.patch

 Improve the usability of namenode -format 
 --

 Key: HDFS-1905
 URL: https://issues.apache.org/jira/browse/HDFS-1905
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Priority: Minor
 Fix For: 0.23.0

 Attachments: HDFS-1905-1.patch


 While setting up a 0.23 based cluster, I ran into this issue. When I issue a 
 format namenode command, which changed in 23, it should let the user know how 
 to use this command when the complete options are not specified.
 ./hdfs namenode -format
 I get the following error msg, but it is still not clear what the user should 
 do or how to use this command.
 11/05/09 15:36:25 ERROR namenode.NameNode: 
 java.lang.IllegalArgumentException: Format must be provided with clusterid
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689)
  
 The usability of this command can be improved.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1905) Improve the usability of namenode -format

2011-05-15 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033655#comment-13033655
 ] 

Bharath Mundlapudi commented on HDFS-1905:
--

Attached a patch that addresses the following (a rough sketch of the flow is 
given below):

1. The clusterid is generated automatically if the user doesn't provide one.
2. Admins can specify a clusterid explicitly with the -clusterid option.
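
Purely as an illustration (not the attached patch), the handling described in 
the two points above could look roughly like the sketch below; parseClusterId 
and newClusterId are hypothetical names used only for this example, while 
-clusterid is the option introduced here.

{code}
// Hypothetical sketch only -- not the attached patch. It illustrates the two
// behaviours listed above: honour an admin-supplied -clusterid, otherwise
// auto-generate a cluster id.
class FormatOptionSketch {
  static String parseClusterId(String[] args) {
    String clusterId = null;
    for (int i = 0; i < args.length; i++) {
      if ("-clusterid".equalsIgnoreCase(args[i]) && i + 1 < args.length) {
        clusterId = args[++i];              // admin-supplied cluster id
      }
    }
    if (clusterId == null || clusterId.isEmpty()) {
      clusterId = newClusterId();           // auto-generate when not provided
    }
    return clusterId;
  }

  static String newClusterId() {
    return "CID-" + java.util.UUID.randomUUID();
  }

  public static void main(String[] args) {
    // prints an auto-generated id, then the admin-supplied one
    System.out.println(parseClusterId(new String[] {"-format"}));
    System.out.println(parseClusterId(new String[] {"-format", "-clusterid", "myCluster"}));
  }
}
{code}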


 Improve the usability of namenode -format 
 --

 Key: HDFS-1905
 URL: https://issues.apache.org/jira/browse/HDFS-1905
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Priority: Minor
 Fix For: 0.23.0

 Attachments: HDFS-1905-1.patch


 While setting up 0.23 version based cluster, i ran into this issue. When i 
 issue a format namenode command, which got changed in 23, it should let the 
 user know to how to use this command in case where complete options were not 
 specified.
 ./hdfs namenode -format
 I get the following error msg, still its not clear what and how user should 
 use this command.
 11/05/09 15:36:25 ERROR namenode.NameNode: 
 java.lang.IllegalArgumentException: Format must be provided with clusterid
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689)
  
 The usability of this command can be improved.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1692) In secure mode, Datanode process doesn't exit when disks fail.

2011-05-15 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033868#comment-13033868
 ] 

Bharath Mundlapudi commented on HDFS-1692:
--

Yes, I will be porting this one to trunk. We run our clusters in secure mode. 

When the tolerated-volumes threshold is reached, shutdown is called but the 
datanode continues to run and doesn't exit. This change only addresses secure 
mode; non-secure mode shouldn't have this problem.
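
For illustration only (this is not the attached patch), the kind of guard being 
described might look like the sketch below. The names, the Runnable shutdown 
hook and the explicit System.exit are all assumptions made for the example; the 
point is simply that lingering non-daemon threads in secure mode must not keep 
the process alive after shutdown.

{code}
// Hypothetical sketch: once more volumes have failed than the datanode
// tolerates, run the normal shutdown and then force the JVM to exit so that
// lingering non-daemon threads (observed in secure mode) cannot keep the
// process running.
class VolumeFailureExitSketch {
  static void handleVolumeFailures(int volsFailed, int volFailuresTolerated,
                                   Runnable shutdown) {
    if (volsFailed > volFailuresTolerated) {
      System.err.println("Too many failed volumes (" + volsFailed + " > "
          + volFailuresTolerated + "), shutting down datanode");
      shutdown.run();   // normal shutdown path
      System.exit(1);   // ensure the secure-mode process really exits
    }
  }
}
{code}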




 In secure mode, Datanode process doesn't exit when disks fail.
 --

 Key: HDFS-1692
 URL: https://issues.apache.org/jira/browse/HDFS-1692
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.204.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.204.0

 Attachments: HDFS-1692-1.patch


 In secure mode, when disks fail more than volumes tolerated, datanode process 
 doesn't exit properly and it just hangs even though shutdown method is 
 called. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1692) In secure mode, Datanode process doesn't exit when disks fail.

2011-05-15 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033880#comment-13033880
 ] 

Bharath Mundlapudi commented on HDFS-1692:
--

While tracking this hang issue, I cleaned up some threads that were not 
exiting; hence the change to ipc/Server.java. But yes, we can move that 
particular code to another JIRA. For 0.23, we can do it separately. 

 In secure mode, Datanode process doesn't exit when disks fail.
 --

 Key: HDFS-1692
 URL: https://issues.apache.org/jira/browse/HDFS-1692
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.204.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.204.0

 Attachments: HDFS-1692-1.patch


 In secure mode, when disks fail more than volumes tolerated, datanode process 
 doesn't exit properly and it just hangs even though shutdown method is 
 called. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-1941) Remove -genclusterid from NameNode startup options

2011-05-14 Thread Bharath Mundlapudi (JIRA)
Remove -genclusterid from NameNode startup options
--

 Key: HDFS-1941
 URL: https://issues.apache.org/jira/browse/HDFS-1941
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Priority: Minor


Currently, namenode -genclusterid is a helper utility that generates a unique 
clusterid. This option becomes unnecessary once namenode -format automatically 
generates the clusterid.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated

2011-05-13 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033343#comment-13033343
 ] 

Bharath Mundlapudi commented on HDFS-1592:
--

These failing tests are not related to this patch.


 Datanode startup doesn't honor volumes.tolerated 
 -

 Key: HDFS-1592
 URL: https://issues.apache.org/jira/browse/HDFS-1592
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.204.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.204.0, 0.23.0

 Attachments: HDFS-1592-1.patch, HDFS-1592-2.patch, 
 HDFS-1592-rel20.patch


 Datanode startup doesn't honor volumes.tolerated for hadoop 20 version.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HDFS-1940) Datanode can have more than one copy of same block when a failed disk is coming back in datanode

2011-05-13 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi reassigned HDFS-1940:


Assignee: Bharath Mundlapudi

 Datanode can have more than one copy of same block when a failed disk is 
 coming back in datanode
 

 Key: HDFS-1940
 URL: https://issues.apache.org/jira/browse/HDFS-1940
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.204.0
Reporter: Rajit
Assignee: Bharath Mundlapudi

 There is a situation where one datanode can have more than one copy of same 
 block due to a disk fails and comes back after sometime in a datanode. And 
 these duplicate blocks are not getting deleted even after datanode and 
 namenode restart.
 This situation can only happen in a corner case , when due to disk failure, 
 the data block is replicated to other disk of the same datanode.
 To simulate this scenario I copied a datablock and the associated .meta file 
 from one disk to another disk of same datanode, so the datanode is having 2 
 copy of same replica. Now I restarted datanode and namenode. Still the extra 
 data block and meta file is not deleted from the datanode
 [hdfs@gsbl90192 rajsaha]$ ls -l `find 
 /grid/{0,1,2,3}/hadoop/var/hdfs/data/current -name blk_*`
 -rw-r--r-- 1 hdfs users 7814 May 13 21:05 
 /grid/1/hadoop/var/hdfs/data/current/blk_1727421609840461376
 -rw-r--r-- 1 hdfs users   71 May 13 21:05 
 /grid/1/hadoop/var/hdfs/data/current/blk_1727421609840461376_579992.meta
 -rw-r--r-- 1 hdfs users 7814 May 13 21:14 
 /grid/3/hadoop/var/hdfs/data/current/blk_1727421609840461376
 -rw-r--r-- 1 hdfs users   71 May 13 21:14 
 /grid/3/hadoop/var/hdfs/data/current/blk_1727421609840461376_579992.meta
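
 Hedged illustration only, not a proposed fix: a scan that notices the same 
 block id present on more than one volume (the situation shown in the ls output 
 above) might look roughly like the following. The /grid/... directory layout 
 and the blk_ file naming are taken from the report; everything else is a 
 hypothetical example.

{code}
// Hypothetical sketch: walk the data directories and report any block file
// that appears on more than one volume. A real fix would also have to remove
// (or let the namenode invalidate) the extra copy.
import java.io.File;
import java.util.HashMap;
import java.util.Map;

class DuplicateReplicaScanSketch {
  public static void main(String[] args) {
    String[] volumes = {
      "/grid/0/hadoop/var/hdfs/data/current",
      "/grid/1/hadoop/var/hdfs/data/current",
      "/grid/2/hadoop/var/hdfs/data/current",
      "/grid/3/hadoop/var/hdfs/data/current"
    };
    Map<String, File> seen = new HashMap<String, File>();
    for (String vol : volumes) {
      File[] blocks = new File(vol).listFiles();
      if (blocks == null) {
        continue;                          // volume missing or failed
      }
      for (File f : blocks) {
        String name = f.getName();
        if (!name.startsWith("blk_") || name.endsWith(".meta")) {
          continue;                        // only look at block data files
        }
        File first = seen.put(name, f);
        if (first != null) {
          System.out.println("Duplicate replica " + name + " on "
              + first.getParent() + " and " + f.getParent());
        }
      }
    }
  }
}
{code}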

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1940) Datanode can have more than one copy of same block when a failed disk is coming back in datanode

2011-05-13 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1940:
-

Description: 
There is a situation where one datanode can have more than one copy of same 
block due to a disk fails and comes back after sometime in a datanode. And 
these duplicate blocks are not getting deleted even after datanode and namenode 
restart.

This situation can only happen in a corner case , when due to disk failure, the 
data block is replicated to other disk of the same datanode.


To simulate this scenario I copied a datablock and the associated .meta file 
from one disk to another disk of same datanode, so the datanode is having 2 
copy of same replica. Now I restarted datanode and namenode. Still the extra 
data block and meta file is not deleted from the datanode

ls -l `find /grid/{0,1,2,3}/hadoop/var/hdfs/data/current -name blk_*`
-rw-r--r-- 1 hdfs users 7814 May 13 21:05 
/grid/1/hadoop/var/hdfs/data/current/blk_1727421609840461376
-rw-r--r-- 1 hdfs users   71 May 13 21:05 
/grid/1/hadoop/var/hdfs/data/current/blk_1727421609840461376_579992.meta
-rw-r--r-- 1 hdfs users 7814 May 13 21:14 
/grid/3/hadoop/var/hdfs/data/current/blk_1727421609840461376
-rw-r--r-- 1 hdfs users   71 May 13 21:14 
/grid/3/hadoop/var/hdfs/data/current/blk_1727421609840461376_579992.meta

  was:
There is a situation where one datanode can have more than one copy of same 
block due to a disk fails and comes back after sometime in a datanode. And 
these duplicate blocks are not getting deleted even after datanode and namenode 
restart.

This situation can only happen in a corner case , when due to disk failure, the 
data block is replicated to other disk of the same datanode.


To simulate this scenario I copied a datablock and the associated .meta file 
from one disk to another disk of same datanode, so the datanode is having 2 
copy of same replica. Now I restarted datanode and namenode. Still the extra 
data block and meta file is not deleted from the datanode

[hdfs@gsbl90192 rajsaha]$ ls -l `find 
/grid/{0,1,2,3}/hadoop/var/hdfs/data/current -name blk_*`
-rw-r--r-- 1 hdfs users 7814 May 13 21:05 
/grid/1/hadoop/var/hdfs/data/current/blk_1727421609840461376
-rw-r--r-- 1 hdfs users   71 May 13 21:05 
/grid/1/hadoop/var/hdfs/data/current/blk_1727421609840461376_579992.meta
-rw-r--r-- 1 hdfs users 7814 May 13 21:14 
/grid/3/hadoop/var/hdfs/data/current/blk_1727421609840461376
-rw-r--r-- 1 hdfs users   71 May 13 21:14 
/grid/3/hadoop/var/hdfs/data/current/blk_1727421609840461376_579992.meta


 Datanode can have more than one copy of same block when a failed disk is 
 coming back in datanode
 

 Key: HDFS-1940
 URL: https://issues.apache.org/jira/browse/HDFS-1940
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.20.204.0
Reporter: Rajit
Assignee: Bharath Mundlapudi

 There is a situation where one datanode can have more than one copy of same 
 block due to a disk fails and comes back after sometime in a datanode. And 
 these duplicate blocks are not getting deleted even after datanode and 
 namenode restart.
 This situation can only happen in a corner case , when due to disk failure, 
 the data block is replicated to other disk of the same datanode.
 To simulate this scenario I copied a datablock and the associated .meta file 
 from one disk to another disk of same datanode, so the datanode is having 2 
 copy of same replica. Now I restarted datanode and namenode. Still the extra 
 data block and meta file is not deleted from the datanode
 ls -l `find /grid/{0,1,2,3}/hadoop/var/hdfs/data/current -name blk_*`
 -rw-r--r-- 1 hdfs users 7814 May 13 21:05 
 /grid/1/hadoop/var/hdfs/data/current/blk_1727421609840461376
 -rw-r--r-- 1 hdfs users   71 May 13 21:05 
 /grid/1/hadoop/var/hdfs/data/current/blk_1727421609840461376_579992.meta
 -rw-r--r-- 1 hdfs users 7814 May 13 21:14 
 /grid/3/hadoop/var/hdfs/data/current/blk_1727421609840461376
 -rw-r--r-- 1 hdfs users   71 May 13 21:14 
 /grid/3/hadoop/var/hdfs/data/current/blk_1727421609840461376_579992.meta

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1836) Thousand of CLOSE_WAIT socket

2011-05-13 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033466#comment-13033466
 ] 

Bharath Mundlapudi commented on HDFS-1836:
--

That's correct. This code is already part of trunk.

Todd, one minor comment.

1. Can we also pass the LOG object to this method? Users who want to debug can 
then enable the debug option:
IOUtils.cleanup(LOG, blockStream, blockReplyStream);

Otherwise, the patch looks good. Thank you. 
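
As a small illustration of the suggestion above (a sketch, assuming the 
org.apache.hadoop.io.IOUtils.cleanup(Log, Closeable...) helper named in the 
comment and a class-level commons-logging LOG):

{code}
// Sketch: close the streams through IOUtils.cleanup with a LOG object so that
// close() failures are visible when debug logging is enabled, instead of being
// swallowed silently when null is passed for the log.
import java.io.Closeable;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.IOUtils;

class CleanupWithLogSketch {
  private static final Log LOG = LogFactory.getLog(CleanupWithLogSketch.class);

  static void closeStreams(Closeable blockStream, Closeable blockReplyStream) {
    // Any IOException from close() is logged (at debug level) via LOG.
    IOUtils.cleanup(LOG, blockStream, blockReplyStream);
  }
}
{code}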

 Thousand of CLOSE_WAIT socket 
 --

 Key: HDFS-1836
 URL: https://issues.apache.org/jira/browse/HDFS-1836
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.20.2
 Environment: Linux 2.6.18-194.32.1.el5 #1 SMP Wed Jan 5 17:52:25 EST 
 2011 x86_64 x86_64 x86_64 GNU/Linux
 java version 1.6.0_23
 Java(TM) SE Runtime Environment (build 1.6.0_23-b05)
 Java HotSpot(TM) 64-Bit Server VM (build 19.0-b09, mixed mode)
Reporter: Dennis Cheung
 Attachments: hdfs-1836-0.20.txt, patch-draft-1836.patch


 $ /usr/sbin/lsof -i TCP:50010 | grep -c CLOSE_WAIT
 4471
 It is better if everything runs normal. 
 However, from time to time there are some DataStreamer Exception: 
 java.net.SocketTimeoutException and DFSClient.processDatanodeError(2507) | 
 Error Recovery for can be found from log file and the number of CLOSE_WAIT 
 socket just keep increasing
 The CLOSE_WAIT handles may remain for hours and days; then Too many open 
 file some day.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1905) Improve the usability of namenode -format

2011-05-12 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032268#comment-13032268
 ] 

Bharath Mundlapudi commented on HDFS-1905:
--

The cluster ID is displayed on the dfshealth web page. If we have multiple 
clusters, then having a proper cluster name defined by the admins is useful. 

If the user executes the following command, the correct usage is indeed displayed: 
./hdfs namenode -format -help 

This should be corrected for all the code paths.







 Improve the usability of namenode -format 
 --

 Key: HDFS-1905
 URL: https://issues.apache.org/jira/browse/HDFS-1905
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Priority: Minor
 Fix For: 0.23.0


 While setting up 0.23 version based cluster, i ran into this issue. When i 
 issue a format namenode command, which got changed in 23, it should let the 
 user know to how to use this command in case where complete options were not 
 specified.
 ./hdfs namenode -format
 I get the following error msg, still its not clear what and how user should 
 use this command.
 11/05/09 15:36:25 ERROR namenode.NameNode: 
 java.lang.IllegalArgumentException: Format must be provided with clusterid
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689)
  
 The usability of this command can be improved.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated

2011-05-12 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032701#comment-13032701
 ] 

Bharath Mundlapudi commented on HDFS-1592:
--

Thanks for the review, Jitendra.

1. The separate conditions are there for better readability. Yes, we can 
combine them into one condition (see the sketch below).

2. The error is logged where the exception is caught.
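
Purely illustrative, not the actual patch: a combined, single-condition startup 
check along the lines discussed above might look like this, where every name is 
a hypothetical placeholder for whatever the patch actually uses.

{code}
// Hypothetical single-condition check that the volume failures found at
// startup still honour the configured failed-volumes tolerance.
class VolumesToleratedSketch {
  static void checkVolumes(int volsConfigured, int volsFailed,
                           int volFailuresTolerated) throws java.io.IOException {
    if (volFailuresTolerated < 0 || volsFailed > volFailuresTolerated) {
      throw new java.io.IOException("Volume failure tolerance not honoured:"
          + " configured=" + volsConfigured
          + ", failed=" + volsFailed
          + ", tolerated=" + volFailuresTolerated);
    }
  }
}
{code}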

 Datanode startup doesn't honor volumes.tolerated 
 -

 Key: HDFS-1592
 URL: https://issues.apache.org/jira/browse/HDFS-1592
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.204.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.204.0, 0.23.0

 Attachments: HDFS-1592-1.patch, HDFS-1592-rel20.patch


 Datanode startup doesn't honor volumes.tolerated for hadoop 20 version.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-1934) Fix NullPointerException when certain File APIs return null

2011-05-12 Thread Bharath Mundlapudi (JIRA)
Fix NullPointerException when certain File APIs return null
---

 Key: HDFS-1934
 URL: https://issues.apache.org/jira/browse/HDFS-1934
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.205.0, 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.205.0, 0.23.0


While testing Disk Fail Inplace, we encountered an NPE from this part of the 
code: 

File[] files = dir.listFiles();
for (File f : files) {
...
}

This is somewhat of an API issue. When a disk is bad (or the name is not a 
directory), this API (listFiles, list) returns null rather than throwing an 
exception, and the 'for loop' then throws an NPE. The same applies to the 
dir.list() API.

Fix all the places where the null condition is not checked.
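
A minimal null-safe pattern for this, sketched here only as an illustration 
(listDirNonNull is a hypothetical helper name, not something from the patch):

{code}
// Sketch: wrap File.listFiles() so a bad disk (or a non-directory path)
// surfaces as an IOException instead of a NullPointerException in the
// caller's for loop.
import java.io.File;
import java.io.IOException;

class ListFilesSketch {
  static File[] listDirNonNull(File dir) throws IOException {
    File[] files = dir.listFiles();
    if (files == null) {
      throw new IOException("Cannot list contents of " + dir
          + " (bad disk or not a directory)");
    }
    return files;
  }

  static void process(File dir) throws IOException {
    for (File f : listDirNonNull(dir)) {
      System.out.println(f);   // placeholder for the real per-file work
    }
  }
}
{code}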
 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1592) Datanode startup doesn't honor volumes.tolerated

2011-05-11 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1592:
-

Attachment: HDFS-1592-1.patch

Attaching the patch for 0.23 version.

 Datanode startup doesn't honor volumes.tolerated 
 -

 Key: HDFS-1592
 URL: https://issues.apache.org/jira/browse/HDFS-1592
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.204.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.20.204.0, 0.23.0

 Attachments: HDFS-1592-1.patch, HDFS-1592-rel20.patch


 Datanode startup doesn't honor volumes.tolerated for hadoop 20 version.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-1905) Improve the usability of namenode -format

2011-05-09 Thread Bharath Mundlapudi (JIRA)
Improve the usability of namenode -format 
--

 Key: HDFS-1905
 URL: https://issues.apache.org/jira/browse/HDFS-1905
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Priority: Minor
 Fix For: 0.23.0


While setting up 0.23 version based cluster, i ran into this issue. When i 
issue a format namenode command, which got changed in 23, it should let the 
user know to 

./hdfs namenode -format

I get the following error msg, still its not clear what and how user should use 
this command.

11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: 
Format must be provided with clusterid
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689)
 
The usability of this command can be improved.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-1905) Improve the usability of namenode -format

2011-05-09 Thread Bharath Mundlapudi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharath Mundlapudi updated HDFS-1905:
-

Description: 
While setting up 0.23 version based cluster, i ran into this issue. When i 
issue a format namenode command, which got changed in 23, it should let the 
user know to how to use this command in case where complete options were not 
specified.

./hdfs namenode -format

I get the following error msg, still its not clear what and how user should use 
this command.

11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: 
Format must be provided with clusterid
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689)
 
The usability of this command can be improved.


  was:
While setting up 0.23 version based cluster, i ran into this issue. When i 
issue a format namenode command, which got changed in 23, it should let the 
user know to 

./hdfs namenode -format

I get the following error msg, still its not clear what and how user should use 
this command.

11/05/09 15:36:25 ERROR namenode.NameNode: java.lang.IllegalArgumentException: 
Format must be provided with clusterid
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689)
 
The usability of this command can be improved.



 Improve the usability of namenode -format 
 --

 Key: HDFS-1905
 URL: https://issues.apache.org/jira/browse/HDFS-1905
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
Priority: Minor
 Fix For: 0.23.0


 While setting up 0.23 version based cluster, i ran into this issue. When i 
 issue a format namenode command, which got changed in 23, it should let the 
 user know to how to use this command in case where complete options were not 
 specified.
 ./hdfs namenode -format
 I get the following error msg, still its not clear what and how user should 
 use this command.
 11/05/09 15:36:25 ERROR namenode.NameNode: 
 java.lang.IllegalArgumentException: Format must be provided with clusterid
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1483)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1623)
   at 
 org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1689)
  
 The usability of this command can be improved.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-664) Add a way to efficiently replace a disk in a live datanode

2011-05-06 Thread Bharath Mundlapudi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13029786#comment-13029786
 ] 

Bharath Mundlapudi commented on HDFS-664:
-

Is this Jira similar to this:
https://issues.apache.org/jira/browse/HDFS-1362



 Add a way to efficiently replace a disk in a live datanode
 --

 Key: HDFS-664
 URL: https://issues.apache.org/jira/browse/HDFS-664
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: data-node
Affects Versions: 0.22.0
Reporter: Steve Loughran
 Attachments: HDFS-664.0-20-3-rc2.patch.1, HDFS-664.patch


 In clusters where the datanode disks are hot swappable, you need to be able 
 to swap out a disk on a live datanode without taking down the datanode. You 
 don't want to decommission the whole node as that is overkill. On a system 
 with 4 1TB HDDs, giving 3 TB of datanode storage, a decommissioning and 
 restart will consume up to 6 TB of bandwidth. If a single disk were swapped 
 in then there would only be 1TB of data to recover over the network. More 
 importantly, if that data could be moved to free space on the same machine, 
 the recommissioning could take place at disk rates, not network speeds. 
 # Maybe have a way of decommissioning a single disk on the DN; the files 
 could be moved to space on the other disks or the other machines in the rack.
 # There may not be time to use that option, in which case pulling out the 
 disk would be done with no warning, a new disk inserted.
 # The DN needs to see that a disk has been replaced (or react to some ops 
 request telling it this), and start using the new disk again - pushing back 
 data and rebuilding the balance. 
 To complicate the process, assume there is a live TT on the system, running 
 jobs against the data. The TT would probably need to be paused while the work 
 takes place, any ongoing work handled somehow. Halting the TT and then 
 restarting it after the replacement disk went in is probably simplest. 
 The more disks you add to a node, the more this scenario becomes a need.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

