[jira] [Comment Edited] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14713409#comment-14713409 ] Yi Liu edited comment on HDFS-8946 at 8/27/15 1:12 PM: --- # Add {{chooseStorage4Block}} to choose a good storage of the given type from the datanode. It checks the state of the storage and whether there is enough space to place the block on the datanode; these conditions are the same as the original ones. Since the datanode only cares about the storage type when placing a block, this method returns the first storage of the given type it sees. {{isGoodTarget}} can then be removed, since we check all its conditions in {{chooseStorage4Block}}. # No need to shuffle the storages of the datanode, and no need to iterate over all its storages in {{chooseLocalStorage}} and {{chooseRandom}}, since we can choose a storage of the given type with a single call to {{chooseStorage4Block}}. # -{{addToExcludedNodes}} is redundant after finding a good storage, since we already add the datanode to excludedNodes at the beginning of checking the node.- # -{{numOfAvailableNodes \-= newExcludedNodes;}} in {{chooseRandom}} is redundant, since we already do {{numOfAvailableNodes--}} at the beginning of checking the node.- (we can't remove {{addToExcludedNodes}}, since it's overridden by subclasses) # In {{DatanodeDescriptor#chooseStorage4Block}}, the {{t == null}} check is unnecessary, since {{t}} can never be null; it was added in HDFS-8863. was (Author: hitliuyi): # Add {{chooseStorage4Block}} to choose a good storage of the given type from the datanode. It checks the state of the storage and whether there is enough space to place the block on the datanode; these conditions are the same as the original ones. Since the datanode only cares about the storage type when placing a block, this method returns the first storage of the given type it sees. {{isGoodTarget}} can then be removed, since we check all its conditions in {{chooseStorage4Block}}. 
# No need to shuffle the storages of the datanode, and no need to iterate over all its storages in {{chooseLocalStorage}} and {{chooseRandom}}, since we can choose a storage of the given type with a single call to {{chooseStorage4Block}}. # {{addToExcludedNodes}} is redundant after finding a good storage, since we already add the datanode to excludedNodes at the beginning of checking the node. # {{numOfAvailableNodes \-= newExcludedNodes;}} in {{chooseRandom}} is redundant, since we already do {{numOfAvailableNodes--}} at the beginning of checking the node. # In {{DatanodeDescriptor#chooseStorage4Block}}, the {{t == null}} check is unnecessary, since {{t}} can never be null; it was added in HDFS-8863. Improve choosing datanode storage for block placement - Key: HDFS-8946 URL: https://issues.apache.org/jira/browse/HDFS-8946 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8946.001.patch, HDFS-8946.002.patch This JIRA is to: Improve choosing datanode storage for block placement: In {{BlockPlacementPolicyDefault}} ({{chooseLocalStorage}}, {{chooseRandom}}), we have the following logic to choose a datanode storage to place a block. For a given storage type, we iterate over the storages of the datanode. But the datanode only cares about the storage type. In the loop, we check according to the storage type and return the first storage if the storages of that type on the datanode meet the requirement. So we can remove the iteration over storages and just do one check to find a good storage of the given type; this is more efficient when the storages of that type on the datanode don't meet the requirement, since we don't need to loop over all storages doing the same check. Besides, there is no need to shuffle the storages, since we only need to check according to the storage type on the datanode once. This also improves the logic and makes it clearer. 
{code}
if (excludedNodes.add(localMachine) // was not in the excluded list
    && isGoodDatanode(localDatanode, maxNodesPerRack, false,
        results, avoidStaleNodes)) {
  for (Iterator<Map.Entry<StorageType, Integer>> iter = storageTypes
      .entrySet().iterator(); iter.hasNext(); ) {
    Map.Entry<StorageType, Integer> entry = iter.next();
    for (DatanodeStorageInfo localStorage : DFSUtil.shuffle(
        localDatanode.getStorageInfos())) {
      StorageType type = entry.getKey();
      if (addIfIsGoodTarget(localStorage, excludedNodes, blocksize,
          results, type) >= 0) {
        int num = entry.getValue();
        ...
{code}
(current logic above) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
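The idea in the comment above can be sketched in isolation. The following is a minimal, self-contained model, not the actual HDFS sources; the {{Storage}} class and its fields are invented for illustration. It shows what a {{chooseStorage4Block}}-style check does: scan the datanode's storages at most once and return the first storage of the requested type with enough remaining space, with no shuffling and no per-storage re-check.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the idea behind chooseStorage4Block:
// return the first storage of the requested type with enough remaining
// capacity, instead of shuffling and checking every storage.
public class StorageChooserSketch {
    enum StorageType { DISK, SSD }

    static final class Storage {
        final StorageType type;
        final long remaining; // bytes left on this storage

        Storage(StorageType type, long remaining) {
            this.type = type;
            this.remaining = remaining;
        }
    }

    // Returns the first storage of the given type that can hold blockSize
    // bytes, or null if none fits; the list is scanned at most once.
    static Storage chooseStorage4Block(List<Storage> storages,
                                       StorageType type, long blockSize) {
        for (Storage s : storages) {
            if (s.type == type && s.remaining >= blockSize) {
                return s;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Storage> storages = Arrays.asList(
            new Storage(StorageType.SSD, 10L),
            new Storage(StorageType.DISK, 500L));
        Storage chosen = chooseStorage4Block(storages, StorageType.DISK, 128L);
        System.out.println(chosen == null ? "none" : chosen.type);
    }
}
```

If no storage of the type fits, the caller gets null after one pass, which is exactly the efficiency argument made in the comment.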
[jira] [Commented] (HDFS-7899) Improve EOF error message
[ https://issues.apache.org/jira/browse/HDFS-7899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716630#comment-14716630 ] Jagadesh Kiran N commented on HDFS-7899: As I observed, all the callers use it for response processing only, hence changing it in the utility itself. Please review and let me know. Improve EOF error message - Key: HDFS-7899 URL: https://issues.apache.org/jira/browse/HDFS-7899 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Harsh J Assignee: Jagadesh Kiran N Priority: Minor Attachments: HDFS-7899-00.patch Currently, a DN disconnection for reasons other than connection timeout or refused messages, such as an EOF message as a result of rejection or another network fault, reports in this manner:
{code}
WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /x.x.x.x: for block, add to deadNodes and continue. java.io.EOFException: Premature EOF: no length prefix available
java.io.EOFException: Premature EOF: no length prefix available
    at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171)
    at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:392)
    at org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:137)
    at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1103)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:538)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:750)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:794)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:602)
{code}
This is not very clear to a user (the warning is at the hdfs-client). It could likely be improved with a more diagnosable message, or at least the direct reason rather than just an EOF. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
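The kind of improvement the issue asks for can be sketched as follows. This is a hypothetical helper, not the actual patch; the method name and wording are invented, and only the pattern of wrapping the bare EOF in a diagnosable message is the point.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical sketch: wrap the bare "Premature EOF" in a message that
// names the peer and the likely causes, so the client-side WARN becomes
// diagnosable. Names here are illustrative, not the actual HDFS code.
public class EofMessageSketch {
    static IOException prematureEof(String peer) {
        return new EOFException("Premature EOF from " + peer
            + ": no length prefix available. The DataNode may have "
            + "closed the connection (e.g. it rejected the operation, "
            + "restarted, or a network fault occurred).");
    }

    public static void main(String[] args) {
        // The client would log this instead of the bare EOFException.
        System.out.println(prematureEof("/x.x.x.x:50010").getMessage());
    }
}
```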
[jira] [Commented] (HDFS-2390) dfsadmin -setBalancerBandwidth doesnot validate -ve value
[ https://issues.apache.org/jira/browse/HDFS-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716628#comment-14716628 ] Hudson commented on HDFS-2390: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk-Java8 #310 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/310/]) HDFS-2390. dfsadmin -setBalancerBandwidth does not validate -ve value. Contributed by Gautam Gopalakrishnan. (harsh: rev 0bf285413f8fcaadbb2d5817fe8090f5fb0d37d9) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSAdminWithHA.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt dfsadmin -setBalancerBandwidth doesnot validate -ve value - Key: HDFS-2390 URL: https://issues.apache.org/jira/browse/HDFS-2390 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Affects Versions: 2.7.1 Reporter: Rajit Saha Assignee: Gautam Gopalakrishnan Priority: Minor Fix For: 2.8.0 Attachments: HDFS-2390-1.patch, HDFS-2390-2.patch, HDFS-2390-3.patch, HDFS-2390-4.patch $ hadoop dfsadmin -setBalancerBandwidth -1 does not throw any message that it is invalid, although in the DN log we are not getting DNA_BALANCERBANDWIDTHUPDATE. I think it should throw some message that negative numbers are not valid, as it does for decimal numbers or non-numbers, e.g.: $ hadoop dfsadmin -setBalancerBandwidth 12.34 NumberFormatException: For input string: "12.34" Usage: java DFSAdmin [-setBalancerBandwidth <bandwidth in bytes per second>] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
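The intent of the fix (reject negative values up front with an explicit message) can be sketched like this. The class and method names are invented for illustration and do not appear in DFSAdmin; non-numeric input is already covered by the NumberFormatException from parsing.

```java
// Hedged sketch of the validation the fix adds conceptually: reject
// negative bandwidth values with a clear message instead of silently
// accepting them. This mirrors the behavior, not the actual DFSAdmin code.
public class BandwidthArgSketch {
    // Returns the parsed bandwidth, or throws with an explicit message for
    // negative input. Non-numbers already fail with NumberFormatException.
    static long parseBandwidth(String arg) {
        long bandwidth = Long.parseLong(arg);
        if (bandwidth < 0) {
            throw new IllegalArgumentException(
                "Bandwidth should be a non-negative integer: " + arg);
        }
        return bandwidth;
    }

    public static void main(String[] args) {
        System.out.println(parseBandwidth("1048576")); // 1 MB/s, accepted
    }
}
```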
[jira] [Updated] (HDFS-7899) Improve EOF error message
[ https://issues.apache.org/jira/browse/HDFS-7899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jagadesh Kiran N updated HDFS-7899: --- Attachment: HDFS-7899-00.patch Attached the patch, please review. Improve EOF error message - Key: HDFS-7899 URL: https://issues.apache.org/jira/browse/HDFS-7899 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Harsh J Assignee: Jagadesh Kiran N Priority: Minor Attachments: HDFS-7899-00.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8977) NNBench result wrong when number of reducers greater than 1
[ https://issues.apache.org/jira/browse/HDFS-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated HDFS-8977: --- Attachment: 0001-HDFS-8977.patch Attaching a patch for initial review. NNBench result wrong when number of reducers greater than 1 --- Key: HDFS-8977 URL: https://issues.apache.org/jira/browse/HDFS-8977 Project: Hadoop HDFS Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Priority: Minor Attachments: 0001-HDFS-8977.patch Currently NNBench#analyzeResults considers only the part- file for analysis:
{code}
TPS: Create/Write/Close: 0
Avg exec time (ms): Create/Write/Close: Infinity
Avg Lat (ms): Create/Write: Infinity
Avg Lat (ms): Close: NaN
{code}
It should consider all part files for the output, or disable the reducers option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
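The fix's idea, aggregating over every reducer's part-* output rather than a single part file, can be sketched in miniature. The "key: value" line format below is invented for the example and is not NNBench's real output format.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of the fix's idea: aggregate metrics from every
// reducer's part-* output rather than only one part file. The line
// format ("successfulOps: N") is invented for this example.
public class NNBenchAggregateSketch {
    static long sumSuccessfulOps(List<List<String>> partFiles) {
        long total = 0;
        for (List<String> part : partFiles) {        // every part-* file
            for (String line : part) {
                if (line.startsWith("successfulOps:")) {
                    total += Long.parseLong(line.split(":")[1].trim());
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Two reducers, two part files; the totals must be summed.
        List<List<String>> parts = Arrays.asList(
            Arrays.asList("successfulOps: 40"),
            Arrays.asList("successfulOps: 60"));
        System.out.println(sumSuccessfulOps(parts)); // 100
    }
}
```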
[jira] [Updated] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-8946: - Description: This JIRA is to: Improve choosing datanode storage for block placement: In {{BlockPlacementPolicyDefault}} ({{chooseLocalStorage}}, {{chooseRandom}}), we have the following logic to choose a datanode storage to place a block. For a given storage type, we iterate over the storages of the datanode. But the datanode only cares about the storage type. In the loop, we check according to the storage type and return the first storage if the storages of that type on the datanode meet the requirement. So we can remove the iteration over storages and just do one check to find a good storage of the given type; this is more efficient when the storages of that type on the datanode don't meet the requirement, since we don't need to loop over all storages doing the same check. Besides, there is no need to shuffle the storages, since we only need to check according to the storage type on the datanode once. This also improves the logic and makes it clearer.
{code}
if (excludedNodes.add(localMachine) // was not in the excluded list
    && isGoodDatanode(localDatanode, maxNodesPerRack, false,
        results, avoidStaleNodes)) {
  for (Iterator<Map.Entry<StorageType, Integer>> iter = storageTypes
      .entrySet().iterator(); iter.hasNext(); ) {
    Map.Entry<StorageType, Integer> entry = iter.next();
    for (DatanodeStorageInfo localStorage : DFSUtil.shuffle(
        localDatanode.getStorageInfos())) {
      StorageType type = entry.getKey();
      if (addIfIsGoodTarget(localStorage, excludedNodes, blocksize,
          results, type) >= 0) {
        int num = entry.getValue();
        ...
{code}
(current logic above) was: This JIRA is to: *1.* Improve choosing datanode storage for block placement: In {{BlockPlacementPolicyDefault}} ({{chooseLocalStorage}}, {{chooseRandom}}), we have the following logic to choose a datanode storage to place a block. For a given storage type, we iterate over the storages of the datanode. But the datanode only cares about the storage type. In the loop, we check according to the storage type and return the first storage if the storages of that type on the datanode meet the requirement. So we can remove the iteration over storages and just do one check to find a good storage of the given type; this is more efficient when the storages of that type on the datanode don't meet the requirement, since we don't need to loop over all storages doing the same check. Besides, there is no need to shuffle the storages, since we only need to check according to the storage type on the datanode once. This also improves the logic and makes it clearer.
{code}
if (excludedNodes.add(localMachine) // was not in the excluded list
    && isGoodDatanode(localDatanode, maxNodesPerRack, false,
        results, avoidStaleNodes)) {
  for (Iterator<Map.Entry<StorageType, Integer>> iter = storageTypes
      .entrySet().iterator(); iter.hasNext(); ) {
    Map.Entry<StorageType, Integer> entry = iter.next();
    for (DatanodeStorageInfo localStorage : DFSUtil.shuffle(
        localDatanode.getStorageInfos())) {
      StorageType type = entry.getKey();
      if (addIfIsGoodTarget(localStorage, excludedNodes, blocksize,
          results, type) >= 0) {
        int num = entry.getValue();
        ...
{code}
(current logic above) *2.* Improve the logic and remove some duplicated code; for example, in {{chooseLocalStorage}} and {{chooseRandom}}, we add the node to excludeNodes before the {{for}}, and we do it again if we find it's a good target. {{numOfAvailableNodes -= newExcludedNodes}} is duplicated too. Improve choosing datanode storage for block placement - Key: HDFS-8946 URL: https://issues.apache.org/jira/browse/HDFS-8946 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8946.001.patch, HDFS-8946.002.patch
[jira] [Assigned] (HDFS-328) fs -setrep should have better error message
[ https://issues.apache.org/jira/browse/HDFS-328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton reassigned HDFS-328: - Assignee: Daniel Templeton (was: Ravi Phulari) No recent activity, reassigning. fs -setrep should have better error message - Key: HDFS-328 URL: https://issues.apache.org/jira/browse/HDFS-328 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Daniel Templeton Labels: newbie When the replication # is larger than dfs.replication.max (defined in conf), fs -setrep shows a meaningless error message. For example,
{noformat}
// dfs.replication.max is 512
$ hadoop fs -setrep 1000 r.txt
setrep: java.io.IOException: file /user/tsz/r.txt.
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
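A clearer check of the kind this issue asks for might look like the following sketch. The method and message are hypothetical, not the actual NameNode code; the point is that the error names both the requested value and the configured limit instead of just repeating the file name.

```java
// Hypothetical sketch: compare the requested replication against the
// configured maximum and fail with a message naming both values, instead
// of a bare IOException that only mentions the file.
public class SetrepCheckSketch {
    static void checkReplication(short requested, short max, String path) {
        if (requested > max) {
            throw new IllegalArgumentException("Requested replication "
                + requested + " for " + path
                + " exceeds maximum of " + max
                + " (dfs.replication.max)");
        }
    }

    public static void main(String[] args) {
        // Within the limit: passes silently.
        checkReplication((short) 3, (short) 512, "/user/tsz/r.txt");
        System.out.println("ok");
    }
}
```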
[jira] [Commented] (HDFS-2390) dfsadmin -setBalancerBandwidth doesnot validate -ve value
[ https://issues.apache.org/jira/browse/HDFS-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716776#comment-14716776 ] Hudson commented on HDFS-2390: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #302 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/302/]) HDFS-2390. dfsadmin -setBalancerBandwidth does not validate -ve value. Contributed by Gautam Gopalakrishnan. (harsh: rev 0bf285413f8fcaadbb2d5817fe8090f5fb0d37d9) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSAdminWithHA.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java dfsadmin -setBalancerBandwidth doesnot validate -ve value - Key: HDFS-2390 URL: https://issues.apache.org/jira/browse/HDFS-2390 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Affects Versions: 2.7.1 Reporter: Rajit Saha Assignee: Gautam Gopalakrishnan Priority: Minor Fix For: 2.8.0 Attachments: HDFS-2390-1.patch, HDFS-2390-2.patch, HDFS-2390-3.patch, HDFS-2390-4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-4412) Support HDFS IO throttling
[ https://issues.apache.org/jira/browse/HDFS-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716577#comment-14716577 ] Yong Zhang commented on HDFS-4412: -- Why not try IO throttling in a fire mode, like HADOOP-9640? Support HDFS IO throttling -- Key: HDFS-4412 URL: https://issues.apache.org/jira/browse/HDFS-4412 Project: Hadoop HDFS Issue Type: New Feature Reporter: Zhenxiao Luo When applications upload/download files from/to HDFS clusters, it would be nice if the IO could be throttled so that they won't go beyond the specified maximum bandwidth. Two options to implement this IO throttling: #1. IO throttling happens at the FSDataInputStream and FSDataOutputStream level. Add an IO throttler to FSDataInputStream/FSDataOutputStream, and whenever a read/write happens, throttle it first (if a throttler is set), then do the actual read/write. We may need to add new FileSystem APIs that take an IO throttler as an input parameter. #2. IO throttling happens at the application level. Instead of changing FSDataInputStream/FSDataOutputStream, all IO throttling is done at the application level. In this approach, the FileSystem API remains unchanged. Either way, an IO throttler interface is needed, which has a: public void throttle(long numOfBytes); The current DataTransferThrottler could be an implementation of this IO throttler interface. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
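The interface the description proposes, with its single {{throttle(long numOfBytes)}} method, can be paired with a simple sleep-based implementation as a sketch. Hadoop's DataTransferThrottler plays this role in practice; the standalone version below is illustrative only, and its names are invented.

```java
// A minimal sketch of the proposed IO throttler interface with a simple
// sleep-based implementation: callers report how many bytes they moved,
// and the throttler sleeps so the average rate stays under the cap.
public class ThrottlerSketch {
    interface IOThrottler {
        void throttle(long numOfBytes);
    }

    static final class SimpleThrottler implements IOThrottler {
        private final long bytesPerSec;
        private long bytesSoFar;
        private final long startMs = System.currentTimeMillis();

        SimpleThrottler(long bytesPerSec) { this.bytesPerSec = bytesPerSec; }

        @Override
        public void throttle(long numOfBytes) {
            bytesSoFar += numOfBytes;
            // Time by which bytesSoFar bytes are "allowed" to have moved.
            long expectedMs = bytesSoFar * 1000 / bytesPerSec;
            long actualMs = System.currentTimeMillis() - startMs;
            if (actualMs < expectedMs) {
                try {
                    Thread.sleep(expectedMs - actualMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }

    public static void main(String[] args) {
        SimpleThrottler t = new SimpleThrottler(1_000_000); // 1 MB/s cap
        t.throttle(1000); // tiny amount: negligible wait at this rate
        System.out.println("throttled");
    }
}
```

Either placement option in the description (stream level or application level) could call {{throttle}} around each read/write with the number of bytes transferred.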
[jira] [Commented] (HDFS-4412) Support HDFS IO throttling
[ https://issues.apache.org/jira/browse/HDFS-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716580#comment-14716580 ] Yong Zhang commented on HDFS-4412: -- sorry, fair mode Support HDFS IO throttling -- Key: HDFS-4412 URL: https://issues.apache.org/jira/browse/HDFS-4412 Project: Hadoop HDFS Issue Type: New Feature Reporter: Zhenxiao Luo -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8973) NameNode exit without any exception log
[ https://issues.apache.org/jira/browse/HDFS-8973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716600#comment-14716600 ] He Xiaoqiao commented on HDFS-8973: --- Thanks Kanaka for your comments. The following are the main GC messages from the .out file before the process exit. Actually, GC works well as usual, according to the following info and the monitoring system. At 19:35:02.508, when 'log4j:ERROR Failed to flush writer' appears, no more logs are output to the .log file, but GC info continues to print for about 5 minutes until 19:40:10, when the namenode process exits. At that time, there was enough memory space for the JVM to work. {code:borderStyle=solid}
2015-08-26T19:34:30.537+0800: [GC [ParNew: 8315771K->63022K(9292032K), 0.1909130 secs] 96423904K->88172502K(133185344K), 0.1910150 secs] [Times: user=3.37 sys=0.01, real=0.19 secs]
2015-08-26T19:34:42.296+0800: [GC [ParNew: 8322670K->71664K(9292032K), 0.2214550 secs] 96432150K->88183374K(133185344K), 0.2215720 secs] [Times: user=3.92 sys=0.01, real=0.22 secs]
2015-08-26T19:34:52.412+0800: [GC [ParNew: 8331312K->82431K(9292032K), 0.2173850 secs] 96443022K->88195492K(133185344K), 0.2174950 secs] [Times: user=3.86 sys=0.00, real=0.22 secs]
2015-08-26T19:35:02.508+0800: [GC [ParNew: 8342079K->101837K(9292032K), 0.1873830 secs] 96455140K->88216487K(133185344K), 0.1874800 secs] [Times: user=3.26 sys=0.02, real=0.18 secs]
log4j:ERROR Failed to flush writer, java.io.IOException: Bad file descriptor at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:318) at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291) at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295) at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141) at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229) at org.apache.log4j.helpers.QuietWriter.flush(QuietWriter.java:59) at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:324) at 
org.apache.log4j.RollingFileAppender.subAppend(RollingFileAppender.java:276) at org.apache.log4j.WriterAppender.append(WriterAppender.java:162) at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251) at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66) at org.apache.log4j.Category.callAppenders(Category.java:206) at org.apache.log4j.Category.forcedLog(Category.java:391) at org.apache.log4j.Category.log(Category.java:856) at org.apache.commons.logging.impl.Log4JLogger.info(Log4JLogger.java:176) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.logAddStoredBlock(BlockManager.java:2391) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addStoredBlock(BlockManager.java:2312) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processAndHandleReportedBlock(BlockManager.java:2919) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.addBlock(BlockManager.java:2894) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.processIncrementalBlockReport(BlockManager.java:2976) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.processIncrementalBlockReport(FSNamesystem.java:5432) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.blockReceivedAndDeleted(NameNodeRpcServer.java:1061) at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.blockReceivedAndDeleted(DatanodeProtocolServerSideTranslatorPB.java:209) at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:28065) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009) at java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
2015-08-26T19:35:14.959+0800: [GC [ParNew: 8361485K->93419K(9292032K), 0.1904630 secs] 96476135K->88211796K(133185344K), 0.1905540 secs] [Times: user=3.38 sys=0.00, real=0.19 secs]
2015-08-26T19:35:25.424+0800: [GC [ParNew: 8353067K->54117K(9292032K), 0.1892230 secs] 96471444K->88174133K(133185344K), 0.1893260 secs] [Times: user=3.31 sys=0.01, real=0.19 secs]
2015-08-26T19:35:36.512+0800: [GC [ParNew: 8313765K->55946K(9292032K), 0.1901160 secs] 96433781K->88177578K(133185344K), 0.1902050 secs]
[jira] [Updated] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Liu updated HDFS-8946: - Attachment: HDFS-8946.002.patch Improve choosing datanode storage for block placement - Key: HDFS-8946 URL: https://issues.apache.org/jira/browse/HDFS-8946 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8946.001.patch, HDFS-8946.002.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-2390) dfsadmin -setBalancerBandwidth doesnot validate -ve value
[ https://issues.apache.org/jira/browse/HDFS-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716715#comment-14716715 ] Hudson commented on HDFS-2390: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2259 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2259/]) HDFS-2390. dfsadmin -setBalancerBandwidth does not validate -ve value. Contributed by Gautam Gopalakrishnan. (harsh: rev 0bf285413f8fcaadbb2d5817fe8090f5fb0d37d9) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSAdminWithHA.java dfsadmin -setBalancerBandwidth doesnot validate -ve value - Key: HDFS-2390 URL: https://issues.apache.org/jira/browse/HDFS-2390 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Affects Versions: 2.7.1 Reporter: Rajit Saha Assignee: Gautam Gopalakrishnan Priority: Minor Fix For: 2.8.0 Attachments: HDFS-2390-1.patch, HDFS-2390-2.patch, HDFS-2390-3.patch, HDFS-2390-4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-7899) Improve EOF error message
[ https://issues.apache.org/jira/browse/HDFS-7899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jagadesh Kiran N updated HDFS-7899: --- Status: Patch Available (was: Open) Improve EOF error message - Key: HDFS-7899 URL: https://issues.apache.org/jira/browse/HDFS-7899 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Harsh J Assignee: Jagadesh Kiran N Priority: Minor Attachments: HDFS-7899-00.patch Currently, a DN disconnection for reasons other than connection timeout or refused messages, such as an EOF message as a result of rejection or other network fault, reports in this manner:
{code}
WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /x.x.x.x: for block, add to deadNodes and continue. java.io.EOFException: Premature EOF: no length prefix available
java.io.EOFException: Premature EOF: no length prefix available
        at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171)
        at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:392)
        at org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:137)
        at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1103)
        at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:538)
        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:750)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:794)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:602)
{code}
This is not very clear to a user (the WARN is logged at the hdfs-client). It could likely be improved with a more diagnosable message, or at least the direct reason rather than a bare EOF. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8841) Catch throwable return null
[ https://issues.apache.org/jira/browse/HDFS-8841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jagadesh Kiran N updated HDFS-8841: --- Assignee: (was: Jagadesh Kiran N) Catch throwable return null --- Key: HDFS-8841 URL: https://issues.apache.org/jira/browse/HDFS-8841 Project: Hadoop HDFS Issue Type: Bug Reporter: songwanging Priority: Minor The method {{map}} of class \hadoop-2.7.1-src\hadoop-tools\hadoop-extras\src\main\java\org\apache\hadoop\tools\DistCpV1.java contains this code:
{code}
public void map(LongWritable key, FilePair value,
    OutputCollector<WritableComparable<?>, Text> out, Reporter reporter)
    throws IOException {
  ...
  } catch (Throwable ex) {
    // ignore, we are just cleaning up
    LOG.debug("Ignoring cleanup exception", ex);
  }
  ...
}
{code}
Throwable is the parent type of Exception and Error, so catching Throwable means catching both Exceptions and Errors. An Exception is something you can recover from (like IOException); an Error is something more serious that you usually can't recover from easily (like ClassNotFoundError), so it doesn't make much sense to catch an Error. We should catch Exception instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
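The suggested change narrows the catch from Throwable to Exception so that Errors keep propagating. A minimal illustration of the difference, using a hypothetical cleanup helper rather than the actual DistCpV1 code:

```java
public class QuietCleanup {
    // Close a resource during cleanup. Catching Exception (instead of
    // Throwable) means recoverable failures such as IOException are
    // swallowed, while Errors like OutOfMemoryError still propagate.
    public static boolean closeQuietly(AutoCloseable resource) {
        try {
            resource.close();
            return true;
        } catch (Exception ex) { // was: catch (Throwable ex)
            // ignore, we are just cleaning up
            return false;
        }
    }
}
```

A lambda throwing an Error escapes this method, which is exactly the behavior the issue argues for.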
[jira] [Commented] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716632#comment-14716632 ] Yi Liu commented on HDFS-8946: -- Update the patch to fix the test failure. The reason is: we can't remove addToExcludedNodes, since it's overridden by subclasses. Improve choosing datanode storage for block placement - Key: HDFS-8946 URL: https://issues.apache.org/jira/browse/HDFS-8946 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8946.001.patch, HDFS-8946.002.patch This JIRA is to improve choosing the datanode storage for block placement: In {{BlockPlacementPolicyDefault}} ({{chooseLocalStorage}}, {{chooseRandom}}), we have the following logic to choose a datanode storage to place a block. For a given storage type, we iterate over the storages of the datanode. But the datanode only cares about the storage type. In the loop, we check according to the storage type and return the first storage if the storages of that type on the datanode fit the requirement. So we can remove the iteration over storages and just do a single check to find a good storage of the given type; this is more efficient when the storages of that type on the datanode don't fit the requirement, since we don't need to loop over all storages and repeat the same check. Besides, there is no need to shuffle the storages, since we only need to check according to the storage type on the datanode once. This also improves the logic and makes it clearer.
{code}
if (excludedNodes.add(localMachine) // was not in the excluded list
    && isGoodDatanode(localDatanode, maxNodesPerRack, false, results,
        avoidStaleNodes)) {
  for (Iterator<Map.Entry<StorageType, Integer>> iter = storageTypes
      .entrySet().iterator(); iter.hasNext(); ) {
    Map.Entry<StorageType, Integer> entry = iter.next();
    for (DatanodeStorageInfo localStorage : DFSUtil.shuffle(
        localDatanode.getStorageInfos())) {
      StorageType type = entry.getKey();
      if (addIfIsGoodTarget(localStorage, excludedNodes, blocksize,
          results, type) >= 0) {
        int num = entry.getValue();
        ...
{code}
(current logic above) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
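A simplified, self-contained sketch of the single-pass idea this issue proposes — pick the first storage of the requested type with room for the block, without shuffling or a nested per-type scan. The real {{chooseStorage4Block}} in the patch also checks storage state; the types and names below are illustrative, not the actual Hadoop classes:

```java
import java.util.List;

public class StoragePicker {
    public enum StorageType { DISK, SSD, ARCHIVE }

    public static class StorageInfo {
        public final StorageType type;
        public final long remaining; // bytes still free on this storage
        public StorageInfo(StorageType type, long remaining) {
            this.type = type;
            this.remaining = remaining;
        }
    }

    // Return the first storage of the requested type that can hold the
    // block, or null if none fits. One linear pass per request -- the
    // optimization described in the issue.
    public static StorageInfo chooseStorage4Block(
            List<StorageInfo> storages, StorageType type, long blockSize) {
        for (StorageInfo s : storages) {
            if (s.type == type && s.remaining >= blockSize) {
                return s;
            }
        }
        return null;
    }
}
```

Returning null when no storage of the type fits lets the caller exclude the whole datanode at once instead of re-checking each storage.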
[jira] [Comment Edited] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14713409#comment-14713409 ] Yi Liu edited comment on HDFS-8946 at 8/27/15 1:12 PM: --- # Add {{chooseStorage4Block}} to choose a good storage of a given type from the datanode. It checks the state of the storage, and whether there is enough space to place the block on the datanode. These conditions are the same as the original ones. Since the datanode only cares about the storage type when placing a block, this method returns the first storage of the given type we see. Then {{isGoodTarget}} can be removed, since we check all the conditions in {{chooseStorage4Block}}. # No need to shuffle the storages of the datanode and no need to iterate over all storages of the datanode in {{chooseLocalStorage}} and {{chooseRandom}}, since we can choose the storage of the given type using {{chooseStorage4Block}} once. # -{{addToExcludedNodes}} is redundant after finding a good storage, since we already add the datanode to excludedNodes at the beginning of checking the node.- # -{{numOfAvailableNodes \-= newExcludedNodes;}} in {{chooseRandom}} is redundant, since we already do {{numOfAvailableNodes\-\-}} at the beginning of checking the node.- (we can't remove {{addToExcludedNodes}}, since it's overridden by subclasses) # In {{DatanodeDescriptor#chooseStorage4Block}}, the {{t == null}} check is unnecessary, since it is never null; it was added in HDFS-8863. was (Author: hitliuyi): # Add {{chooseStorage4Block}} to choose a good storage of a given type from the datanode. It checks the state of the storage, and whether there is enough space to place the block on the datanode. These conditions are the same as the original ones. Since the datanode only cares about the storage type when placing a block, this method returns the first storage of the given type we see. Then {{isGoodTarget}} can be removed, since we check all the conditions in {{chooseStorage4Block}}.
# No need to shuffle the storages of the datanode and no need to iterate over all storages of the datanode in {{chooseLocalStorage}} and {{chooseRandom}}, since we can choose the storage of the given type using {{chooseStorage4Block}} once. # -{{addToExcludedNodes}} is redundant after finding a good storage, since we already add the datanode to excludedNodes at the beginning of checking the node.- # -{{numOfAvailableNodes \-= newExcludedNodes;}} in {{chooseRandom}} is redundant, since we already do {{numOfAvailableNodes--}} at the beginning of checking the node.- (we can't remove {{addToExcludedNodes}}, since it's overridden by subclasses) # In {{DatanodeDescriptor#chooseStorage4Block}}, the {{t == null}} check is unnecessary, since it is never null; it was added in HDFS-8863. Improve choosing datanode storage for block placement - Key: HDFS-8946 URL: https://issues.apache.org/jira/browse/HDFS-8946 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8946.001.patch, HDFS-8946.002.patch This JIRA is to improve choosing the datanode storage for block placement: In {{BlockPlacementPolicyDefault}} ({{chooseLocalStorage}}, {{chooseRandom}}), we have the following logic to choose a datanode storage to place a block. For a given storage type, we iterate over the storages of the datanode. But the datanode only cares about the storage type. In the loop, we check according to the storage type and return the first storage if the storages of that type on the datanode fit the requirement. So we can remove the iteration over storages and just do a single check to find a good storage of the given type; this is more efficient when the storages of that type on the datanode don't fit the requirement, since we don't need to loop over all storages and repeat the same check. Besides, there is no need to shuffle the storages, since we only need to check according to the storage type on the datanode once. This also improves the logic and makes it clearer.
{code}
if (excludedNodes.add(localMachine) // was not in the excluded list
    && isGoodDatanode(localDatanode, maxNodesPerRack, false, results,
        avoidStaleNodes)) {
  for (Iterator<Map.Entry<StorageType, Integer>> iter = storageTypes
      .entrySet().iterator(); iter.hasNext(); ) {
    Map.Entry<StorageType, Integer> entry = iter.next();
    for (DatanodeStorageInfo localStorage : DFSUtil.shuffle(
        localDatanode.getStorageInfos())) {
      StorageType type = entry.getKey();
      if (addIfIsGoodTarget(localStorage, excludedNodes, blocksize,
          results, type) >= 0) {
        int num = entry.getValue();
        ...
{code}
(current logic above) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8961) Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager
[ https://issues.apache.org/jira/browse/HDFS-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717038#comment-14717038 ] Hudson commented on HDFS-8961: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #316 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/316/]) HDFS-8961. Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager. Contributed by Mingliang Liu. (wheat9: rev 1e5f69e85c035f9507e8b788df0b3ce20290a770) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml * hadoop-hdfs-project/hadoop-hdfs-client/dev-support/findbugsExcludeFile.xml Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager Key: HDFS-8961 URL: https://issues.apache.org/jira/browse/HDFS-8961 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Mingliang Liu Fix For: 2.8.0 Attachments: HDFS-8961.000.patch There are two clauses in {{hadoop-hdfs}} to filter out the findbugs warnings in {{org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager}}:
{code}
<Match>
  <Class name="org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager" />
  <Method name="allocSlot" />
  <Bug pattern="UL_UNRELEASED_LOCK_EXCEPTION_PATH" />
</Match>
<Match>
  <Class name="org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager" />
  <Method name="allocSlot" />
  <Bug pattern="UL_UNRELEASED_LOCK" />
</Match>
{code}
These two warnings show up in the Jenkins run as these classes are moved into the {{hadoop-hdfs-client}} module. We either need to fix the code or move these clauses to the {{hadoop-hdfs-client}} module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8962) Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf
[ https://issues.apache.org/jira/browse/HDFS-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717039#comment-14717039 ] Hudson commented on HDFS-8962: -- FAILURE: Integrated in Hadoop-Yarn-trunk-Java8 #316 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk-Java8/316/]) HDFS-8962. Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf. Contributed by Mingliang Liu. (wheat9: rev 7e971b7315fa2942b4db7ba11ed513766957b777) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestCachingStrategy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitLocalRead.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitCache.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/DfsClientConf.java Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf Key: HDFS-8962 URL: https://issues.apache.org/jira/browse/HDFS-8962 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Mingliang Liu Assignee: Mingliang Liu Fix For: 2.8.0 Attachments: HDFS-8962.000.patch, HDFS-8962.001.patch, HDFS-8962.002.patch This is a follow up of HDFS-8803. HDFS-8803 exposes multiple checkstyles and whitespace warnings in the Jenkins run. These warnings should be fixed. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8970) Clean up checkstyle warnings in shortcircuit package
[ https://issues.apache.org/jira/browse/HDFS-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-8970: Description: We moved the {{shortcircuit}} package from {{hadoop-hdfs}} to {{hadoop-hdfs-client}} module in JIRA [HDFS-8934|https://issues.apache.org/jira/browse/HDFS-8934] and [HDFS-8951|https://issues.apache.org/jira/browse/HDFS-8951]. There are some checkstyle issues which are not fixed in those commits since they only tracked the effort of moving. This jira tracks the effort of fixing the checkstyle warnings. was:We moved the {{ShortCircuitShm}} class into the {{hdfs-client}} module in HDFS-8934. There are some checkstyle issues which are not fixed in that jira since it only tracked the effort of moving. We need to fix the checkstyle issues in {{ShortCircuitShm}} related classes. Clean up checkstyle warnings in shortcircuit package Key: HDFS-8970 URL: https://issues.apache.org/jira/browse/HDFS-8970 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Mingliang Liu Assignee: Mingliang Liu We moved the {{shortcircuit}} package from {{hadoop-hdfs}} to {{hadoop-hdfs-client}} module in JIRA [HDFS-8934|https://issues.apache.org/jira/browse/HDFS-8934] and [HDFS-8951|https://issues.apache.org/jira/browse/HDFS-8951]. There are some checkstyle issues which are not fixed in those commits since they only tracked the effort of moving. This jira tracks the effort of fixing the checkstyle warnings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8979) Clean up checkstyle warnings in o.a.h.hdfs.client package
[ https://issues.apache.org/jira/browse/HDFS-8979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717144#comment-14717144 ] Mingliang Liu commented on HDFS-8979: - We can fix this after moving {{BlockReader}} to client module. Clean up checkstyle warnings in o.a.h.hdfs.client package - Key: HDFS-8979 URL: https://issues.apache.org/jira/browse/HDFS-8979 Project: Hadoop HDFS Issue Type: Task Reporter: Mingliang Liu Assignee: Mingliang Liu This jira tracks the effort of cleaning up checkstyle warnings in {{org.apache.hadoop.hdfs.client}} package. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8938) Refactor BlockManager in blockmanagement
[ https://issues.apache.org/jira/browse/HDFS-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-8938: Attachment: HDFS-8938.006.patch Refactor BlockManager in blockmanagement Key: HDFS-8938 URL: https://issues.apache.org/jira/browse/HDFS-8938 Project: Hadoop HDFS Issue Type: Task Components: build Reporter: Mingliang Liu Assignee: Mingliang Liu Attachments: HDFS-8938.000.patch, HDFS-8938.001.patch, HDFS-8938.002.patch, HDFS-8938.003.patch, HDFS-8938.004.patch, HDFS-8938.005.patch, HDFS-8938.006.patch This jira tracks the effort of refactoring inner classes {{BlockManager$BlockToMarkCorrupt}} and {{BlockManager$ReplicationWork}} in {{hdfs.server.blockmanagement}} package. As the line number of {{BlockManager}} is getting larger than 2000, we can move those two inner classes out of the it. Meanwhile, the logic in method {{computeReplicationWorkForBlocks}} can be simplified if we extract code sections to _schedule replication_ and to _validate replication work_ to private helper methods respectively. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8978) Erasure coding: fix 2 failed tests of DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8978: Status: Open (was: Patch Available) Erasure coding: fix 2 failed tests of DFSStripedOutputStream Key: HDFS-8978 URL: https://issues.apache.org/jira/browse/HDFS-8978 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8978-HDFS-7285.01.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8962) Fix checkstyle and whitespace issues in HDFS-8803
[ https://issues.apache.org/jira/browse/HDFS-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716914#comment-14716914 ] Haohui Mai commented on HDFS-8962: -- +1. I'll commit it shortly. Fix checkstyle and whitespace issues in HDFS-8803 - Key: HDFS-8962 URL: https://issues.apache.org/jira/browse/HDFS-8962 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Mingliang Liu Assignee: Mingliang Liu Attachments: HDFS-8962.000.patch, HDFS-8962.001.patch, HDFS-8962.002.patch This jira tracks the effort of fixing whitespace and checkstyle issues after moving {{DfsClientConf}} to the hdfs-client module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8925) Move BlockReader to hdfs-client
[ https://issues.apache.org/jira/browse/HDFS-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716972#comment-14716972 ] Hadoop QA commented on HDFS-8925: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752584/HDFS-8925.000.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 1e5f69e | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12167/console | This message was automatically generated. Move BlockReader to hdfs-client --- Key: HDFS-8925 URL: https://issues.apache.org/jira/browse/HDFS-8925 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Mingliang Liu Assignee: Mingliang Liu Attachments: HDFS-8925.000.patch This jira tracks the effort of moving the {{BlockReader}} class into the hdfs-client module. We also move {{BlockReaderLocal}} class which implements the {{BlockReader}} interface to {{hdfs-client}} module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8967) Create a BlockManagerLock class to represent the lock used in the BlockManager
[ https://issues.apache.org/jira/browse/HDFS-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717070#comment-14717070 ] Kihwal Lee commented on HDFS-8967: -- [~daryn] has a similar implementation being tested. I will tell him to review it. Create a BlockManagerLock class to represent the lock used in the BlockManager -- Key: HDFS-8967 URL: https://issues.apache.org/jira/browse/HDFS-8967 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8967.000.patch This jira proposes to create a {{BlockManagerLock}} class to represent the lock used in {{BlockManager}}. Currently it directly points to the {{FSNamesystem}} lock thus there are no functionality changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HDFS-8970) Clean up checkstyle warnings in shortcircuit package
[ https://issues.apache.org/jira/browse/HDFS-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai resolved HDFS-8970. -- Resolution: Duplicate The granularity of cleaning things up might be too small -- let's combine the effort with HDFS-8979. Clean up checkstyle warnings in shortcircuit package Key: HDFS-8970 URL: https://issues.apache.org/jira/browse/HDFS-8970 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Mingliang Liu Assignee: Mingliang Liu We moved the {{shortcircuit}} package from {{hadoop-hdfs}} to {{hadoop-hdfs-client}} module in JIRA [HDFS-8934|https://issues.apache.org/jira/browse/HDFS-8934] and [HDFS-8951|https://issues.apache.org/jira/browse/HDFS-8951]. There are some checkstyle issues which are not fixed in those commits since they only tracked the effort of moving. This jira tracks the effort of fixing the checkstyle warnings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7285) Erasure Coding Support inside HDFS
[ https://issues.apache.org/jira/browse/HDFS-7285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717085#comment-14717085 ] Zhe Zhang commented on HDFS-7285: - [~drankye] Thanks for raising the question. Since we have reached the conclusion to allow git merge in the workflow, I have merged trunk changes into the main HDFS-7285 branch. Going forward let's just use {{HDFS-7285}} branch for all commits. I have actually deleted the intermediate {{HDFS-7285-merge}} branch. [~vinayrpet] Do you think we should delete the {{HDFS-7285-REBASE}} branch too, given that the result has been compared against and incorporated into the current {{HDFS-7285}} branch? Erasure Coding Support inside HDFS -- Key: HDFS-7285 URL: https://issues.apache.org/jira/browse/HDFS-7285 Project: Hadoop HDFS Issue Type: New Feature Reporter: Weihua Jiang Assignee: Zhe Zhang Attachments: Compare-consolidated-20150824.diff, Consolidated-20150707.patch, Consolidated-20150806.patch, Consolidated-20150810.patch, ECAnalyzer.py, ECParser.py, HDFS-7285-initial-PoC.patch, HDFS-7285-merge-consolidated-01.patch, HDFS-7285-merge-consolidated-trunk-01.patch, HDFS-7285-merge-consolidated.trunk.03.patch, HDFS-7285-merge-consolidated.trunk.04.patch, HDFS-EC-Merge-PoC-20150624.patch, HDFS-EC-merge-consolidated-01.patch, HDFS-bistriped.patch, HDFSErasureCodingDesign-20141028.pdf, HDFSErasureCodingDesign-20141217.pdf, HDFSErasureCodingDesign-20150204.pdf, HDFSErasureCodingDesign-20150206.pdf, HDFSErasureCodingPhaseITestPlan.pdf, HDFSErasureCodingSystemTestPlan-20150824.pdf, fsimage-analysis-20150105.pdf Erasure Coding (EC) can greatly reduce the storage overhead without sacrifice of data reliability, comparing to the existing HDFS 3-replica approach. For example, if we use a 10+4 Reed Solomon coding, we can allow loss of 4 blocks, with storage overhead only being 40%. This makes EC a quite attractive alternative for big data storage, particularly for cold data. 
Facebook had a related open source project called HDFS-RAID. It used to be one of the contributed packages in HDFS but was removed in Hadoop 2.0 for maintenance reasons. The drawbacks are: 1) it sits on top of HDFS and depends on MapReduce to do encoding and decoding tasks; 2) it can only be used for cold files that are not intended to be appended anymore; 3) the pure Java EC coding implementation is extremely slow in practical use. Because of these, it might not be a good idea to just bring HDFS-RAID back. We (Intel and Cloudera) are working on a design to build EC into HDFS that gets rid of any external dependencies, making it self-contained and independently maintained. This design lays the EC feature on top of the storage type support and is intended to be compatible with existing HDFS features such as caching, snapshots, encryption, and high availability. This design will also support different EC coding schemes, implementations and policies for different deployment scenarios. By utilizing advanced libraries (e.g. the Intel ISA-L library), an implementation can greatly improve the performance of EC encoding/decoding and make the EC solution even more attractive. We will post the design document soon. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
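The overhead figures quoted in this issue follow from a line of arithmetic: 3x replication stores two extra copies of every block (200% overhead), while a 10+4 Reed-Solomon scheme stores 4 parity blocks per 10 data blocks (40% overhead). A tiny sketch of that calculation (class and method names are illustrative):

```java
public class EcOverhead {
    // Extra storage as a fraction of the raw data size.
    public static double replicationOverhead(int replicas) {
        return replicas - 1; // 3-replica -> 2.0, i.e. 200% overhead
    }

    public static double ecOverhead(int dataBlocks, int parityBlocks) {
        return (double) parityBlocks / dataBlocks; // 10+4 -> 0.4, i.e. 40%
    }
}
```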
[jira] [Updated] (HDFS-8970) Clean up checkstyle warnings in HDFS-8934
[ https://issues.apache.org/jira/browse/HDFS-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-8970: Summary: Clean up checkstyle warnings in HDFS-8934 (was: Clean up checkstyle issues in HDFS-8934) Clean up checkstyle warnings in HDFS-8934 - Key: HDFS-8970 URL: https://issues.apache.org/jira/browse/HDFS-8970 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Mingliang Liu Assignee: Mingliang Liu We moved the {{ShortCircuitShm}} class into the {{hdfs-client}} module in HDFS-8934. There are some checkstyle issues which are not fixed in that jira since it only tracked the effort of moving. We need to fix the checkstyle issues in {{ShortCircuitShm}} related classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8960) DFS client says no more good datanodes being available to try on a single drive failure
[ https://issues.apache.org/jira/browse/HDFS-8960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717137#comment-14717137 ] Yongjun Zhang commented on HDFS-8960: - Thanks, would you please provide the DN log on r12s8 too? DFS client says no more good datanodes being available to try on a single drive failure - Key: HDFS-8960 URL: https://issues.apache.org/jira/browse/HDFS-8960 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.7.1 Environment: openjdk version 1.8.0_45-internal OpenJDK Runtime Environment (build 1.8.0_45-internal-b14) OpenJDK 64-Bit Server VM (build 25.45-b02, mixed mode) Reporter: Benoit Sigoure Attachments: blk_1073817519_77099.log, r12s13-datanode.log, r12s16-datanode.log Since we upgraded to 2.7.1 we regularly see single-drive failures cause widespread problems at the HBase level (with the default 3x replication target). Here's an example. This HBase RegionServer is r12s16 (172.24.32.16) and is writing its WAL to [172.24.32.16:10110, 172.24.32.8:10110, 172.24.32.13:10110] as can be seen by the following occasional messages: {code} 2015-08-23 06:28:40,272 INFO [sync.3] wal.FSHLog: Slow sync cost: 123 ms, current pipeline: [172.24.32.16:10110, 172.24.32.8:10110, 172.24.32.13:10110] {code} A bit later, the second node in the pipeline above is going to experience an HDD failure. {code} 2015-08-23 07:21:58,720 WARN [DataStreamer for file /hbase/WALs/r12s16.sjc.aristanetworks.com,9104,1439917659071/r12s16.sjc.aristanetworks.com%2C9104%2C1439917659071.default.1440314434998 block BP-1466258523-172.24.32.1-1437768622582:blk_1073817519_77099] hdfs.DFSClient: Error Recovery for block BP-1466258523-172.24.32.1-1437768622582:blk_1073817519_77099 in pipeline 172.24.32.16:10110, 172.24.32.13:10110, 172.24.32.8:10110: bad datanode 172.24.32.8:10110 {code} And then HBase will go like omg I can't write to my WAL, let me commit suicide. 
{code}
2015-08-23 07:22:26,060 FATAL [regionserver/r12s16.sjc.aristanetworks.com/172.24.32.16:9104.append-pool1-t1] wal.FSHLog: Could not append. Requesting close of wal
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[172.24.32.16:10110, 172.24.32.13:10110], original=[172.24.32.16:10110, 172.24.32.13:10110]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:969)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1035)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1184)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:933)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:487)
{code}
Whereas this should be mostly a non-event, as the DFS client should just drop the bad replica from the write pipeline. This is a small cluster but it has 16 DNs, so the failed DN in the pipeline should be easily replaced. I didn't set {{dfs.client.block.write.replace-datanode-on-failure.policy}} (so it's still {{DEFAULT}}) and didn't set {{dfs.client.block.write.replace-datanode-on-failure.enable}} (so it's still {{true}}). I don't see anything noteworthy in the NN log around the time of the failure; it just seems like the DFS client gave up, or threw an exception back to HBase that it wasn't throwing before, or something else, and that made this single-drive failure lethal. We've occasionally been unlucky enough to have a single-drive failure cause multiple RegionServers to commit suicide because they had their WALs on that drive.
We upgraded from 2.7.0 about a month ago, and I'm not sure whether we were seeing this with 2.7 or not – prior to that we were running in a quite different environment, but this is a fairly new deployment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
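For reference, the two client-side settings discussed in this report live in the client's hdfs-site.xml. The values shown are the defaults the reporter says were in effect; this fragment only illustrates the keys and is not a suggested fix:

```xml
<!-- Client-side pipeline-recovery settings (hdfs-site.xml on the client). -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>DEFAULT</value>
</property>
```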
[jira] [Created] (HDFS-8979) Clean up checkstyle warnings in o.a.h.hdfs.client package
Mingliang Liu created HDFS-8979: --- Summary: Clean up checkstyle warnings in o.a.h.hdfs.client package Key: HDFS-8979 URL: https://issues.apache.org/jira/browse/HDFS-8979 Project: Hadoop HDFS Issue Type: Task Reporter: Mingliang Liu Assignee: Mingliang Liu This jira tracks the effort of cleaning up checkstyle warnings in {{org.apache.hadoop.hdfs.client}} package. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8833) Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones
[ https://issues.apache.org/jira/browse/HDFS-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717142#comment-14717142 ] Zhe Zhang commented on HDFS-8833: - Thanks for the comment Kai. Great to know that the branch is working reliably. Although the size of the patch is large, most of it is refactoring (changing CLI, RPC etc.). There are only a few non-trivial changes as [summarized | https://issues.apache.org/jira/browse/HDFS-8833?focusedCommentId=14700453page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14700453] above. I think it's worthwhile to finalize the *semantics* of using EC while it's still in a feature branch. This change also affects the fsimage format (whether ECPolicy ID is stored per file), which we should also finalize before merging to trunk. Erasure coding: store EC schema and cell size in INodeFile and eliminate notion of EC zones --- Key: HDFS-8833 URL: https://issues.apache.org/jira/browse/HDFS-8833 Project: Hadoop HDFS Issue Type: Sub-task Components: namenode Affects Versions: HDFS-7285 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8833-HDFS-7285-merge.00.patch, HDFS-8833-HDFS-7285-merge.01.patch, HDFS-8833-HDFS-7285.02.patch We have [discussed | https://issues.apache.org/jira/browse/HDFS-7285?focusedCommentId=14357754page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14357754] storing EC schema with files instead of EC zones and recently revisited the discussion under HDFS-8059. As a recap, the _zone_ concept has severe limitations including renaming and nested configuration. Those limitations are valid in encryption for security reasons and it doesn't make sense to carry them over in EC. This JIRA aims to store EC schema and cell size on {{INodeFile}} level. For simplicity, we should first implement it as an xattr and consider memory optimizations (such as moving it to file header) as a follow-on. 
We should also disable changing EC policy on a non-empty file / dir in the first phase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
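The proposal above stores the EC schema and cell size at the {{INodeFile}} level as an xattr. As a rough illustration of that idea only (the xattr key name, value layout, and class below are hypothetical stand-ins, not the actual HDFS implementation in the patch):

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/** Toy model of keeping an EC policy ID and cell size as a per-file xattr. */
public class EcXAttrSketch {
  // Hypothetical xattr key; the real key name is decided in the patch.
  static final String EC_XATTR = "hdfs.erasurecoding.policy";

  private final Map<String, byte[]> xattrsByPath = new HashMap<>();

  void setEcPolicy(String path, byte policyId, int cellSize) {
    // Pack the policy ID (1 byte) and cell size (4 bytes) into the xattr value.
    xattrsByPath.put(path + '#' + EC_XATTR,
        ByteBuffer.allocate(5).put(policyId).putInt(cellSize).array());
  }

  /** Returns {policyId, cellSize}, or null for files with no EC xattr. */
  int[] getEcPolicy(String path) {
    byte[] value = xattrsByPath.get(path + '#' + EC_XATTR);
    if (value == null) {
      return null;
    }
    ByteBuffer buf = ByteBuffer.wrap(value);
    return new int[] { buf.get(), buf.getInt() };
  }
}
```

Since the value is a few bytes per file rather than a zone attached to a directory, rename and nesting impose no constraints, which is the point of dropping the zone concept.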
[jira] [Updated] (HDFS-8967) Create a BlockManagerLock class to represent the lock used in the BlockManager
[ https://issues.apache.org/jira/browse/HDFS-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8967: - Attachment: HDFS-8967.001.patch Create a BlockManagerLock class to represent the lock used in the BlockManager -- Key: HDFS-8967 URL: https://issues.apache.org/jira/browse/HDFS-8967 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Attachments: HDFS-8967.000.patch, HDFS-8967.001.patch This jira proposes to create a {{BlockManagerLock}} class to represent the lock used in {{BlockManager}}. Currently it directly points to the {{FSNamesystem}} lock thus there are no functionality changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8970) Clean up checkstyle warnings in shortcircuit package
[ https://issues.apache.org/jira/browse/HDFS-8970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-8970: Summary: Clean up checkstyle warnings in shortcircuit package (was: Clean up checkstyle warnings in HDFS-8934) Clean up checkstyle warnings in shortcircuit package Key: HDFS-8970 URL: https://issues.apache.org/jira/browse/HDFS-8970 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Mingliang Liu Assignee: Mingliang Liu We moved the {{ShortCircuitShm}} class into the {{hdfs-client}} module in HDFS-8934. There are some checkstyle issues which are not fixed in that jira since it only tracked the effort of moving. We need to fix the checkstyle issues in {{ShortCircuitShm}} related classes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8950) NameNode refresh doesn't remove DataNodes
[ https://issues.apache.org/jira/browse/HDFS-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717154#comment-14717154 ] Hadoop QA commented on HDFS-8950: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 16m 9s | Findbugs (version ) appears to be broken on trunk. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | javac | 8m 15s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 1s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 32s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 40s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 2m 33s | The patch appears to introduce 4 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 15s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 73m 20s | Tests failed in hadoop-hdfs. 
| | | | 116m 46s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.server.blockmanagement.TestNodeCount | | | hadoop.hdfs.TestBalancerBandwidth | | | hadoop.hdfs.server.blockmanagement.TestHostFileManager | | Timed out tests | org.apache.hadoop.hdfs.TestDFSClientRetries | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752784/HDFS-8950.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0bf2854 | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/12166/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12166/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12166/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12166/console | This message was automatically generated. NameNode refresh doesn't remove DataNodes - Key: HDFS-8950 URL: https://issues.apache.org/jira/browse/HDFS-8950 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Daniel Templeton Assignee: Daniel Templeton Fix For: 2.8.0 Attachments: HDFS-8950.001.patch, HDFS-8950.002.patch, HDFS-8950.003.patch, HDFS-8950.004.patch If you remove a DN from NN's allowed host list (HDFS was HA) and then do NN refresh, it doesn't remove it actually and the NN UI keeps showing that node. It may try to allocate some blocks to that DN as well during an MR job. This issue is independent from DN decommission. To reproduce: 1. Add a DN to dfs_hosts_allow 2. Refresh NN 3. Start DN. Now NN starts seeing DN. 4. Stop DN 5. Remove DN from dfs_hosts_allow 6. 
Refresh NN - NN is still reporting DN as being used by HDFS. This is different from decom because there DN is added to exclude list in addition to being removed from allowed list, and in that case everything works correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8978) Erasure coding: fix 2 failed tests of DFSStripedOutputStream
Walter Su created HDFS-8978: --- Summary: Erasure coding: fix 2 failed tests of DFSStripedOutputStream Key: HDFS-8978 URL: https://issues.apache.org/jira/browse/HDFS-8978 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Walter Su Assignee: Walter Su Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8969) Inconsistent synchronization of FSEditLog.editLogStream;
[ https://issues.apache.org/jira/browse/HDFS-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-8969: --- Status: Patch Available (was: Open) Inconsistent synchronization of FSEditLog.editLogStream; Key: HDFS-8969 URL: https://issues.apache.org/jira/browse/HDFS-8969 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.8.0 Reporter: Anu Engineer Assignee: Anu Engineer Attachments: HDFS-8969.001.patch Fix Findbug warnings -- This message was sent by Atlassian JIRA (v6.3.4#6332)
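For context, the Findbugs warning named in the summary (inconsistent synchronization of a field such as {{editLogStream}}) fires when a field is accessed under a lock on some code paths but without it on others. A simplified illustration of the pattern and the usual fix (this is not the actual {{FSEditLog}} code):

```java
public class EditLogSketch {
  private final StringBuilder editLogStream = new StringBuilder();

  // BAD: reads editLogStream without the monitor guarding the writes below.
  // This mixed access is what Findbugs reports as inconsistent synchronization.
  int unsafeLength() {
    return editLogStream.length();
  }

  // The fix: every access to the shared field takes the same lock.
  synchronized void logEdit(String op) {
    editLogStream.append(op).append(';');
  }

  synchronized int length() {
    return editLogStream.length();
  }
}
```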
[jira] [Commented] (HDFS-8961) Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager
[ https://issues.apache.org/jira/browse/HDFS-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716980#comment-14716980 ] Hudson commented on HDFS-8961: -- FAILURE: Integrated in Hadoop-trunk-Commit #8356 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8356/]) HDFS-8961. Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager. Contributed by Mingliang Liu. (wheat9: rev 1e5f69e85c035f9507e8b788df0b3ce20290a770) * hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-client/dev-support/findbugsExcludeFile.xml Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager Key: HDFS-8961 URL: https://issues.apache.org/jira/browse/HDFS-8961 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Mingliang Liu Fix For: 2.8.0 Attachments: HDFS-8961.000.patch There are two clauses in {{hadoop-hdfs}} to filter out the findbugs warnings in {{org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager}}:
{code}
<Match>
  <Class name="org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager" />
  <Method name="allocSlot" />
  <Bug pattern="UL_UNRELEASED_LOCK_EXCEPTION_PATH" />
</Match>
<Match>
  <Class name="org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager" />
  <Method name="allocSlot" />
  <Bug pattern="UL_UNRELEASED_LOCK" />
</Match>
{code}
These two warnings show up in the Jenkins run as these classes are moved into the {{hadoop-hdfs-client}} module. We either need to fix the code or move these clauses to the {{hadoop-hdfs-client}} module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8978) Erasure coding: fix 2 failed tests of DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8978: Attachment: HDFS-8978-HDFS-7285.01.patch 1. Fix {{TestDFSStripedOutputStreamWithFailure}}. Right now StripedOutputStream doesn't handle remote exceptions, so I just throw them out. 2. Fix a race condition: after the file is closed, there is a race between the block reader and the incremental block reports (IBR) of some parity blocks. The test case verified a parity block before that block was reported, so it failed. Added {{StripedFileTestUtil.waitBlockGroupsReported(..)}}, similar to {{DFSTestUtil.waitReplication(..)}}.
{noformat}
2015-08-27 17:01:14,158 [Thread-1] INFO hdfs.TestDFSStripedOutputStreamWithFailure (TestDFSStripedOutputStreamWithFailure.java:checkData(444)) - i,j=0, 0, numCellInBlock=1, blockSize=65536, lb=LocatedBlock{BP-1919703598-9.96.1.34-1440666072467:blk_-9223372036854775792_1001; getBlockSize()=65536; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[127.0.0.1:45887,DS-0c4ce2aa-40a8-4ed8-9941-489205a59299,DISK]]}
2015-08-27 17:01:14,161 [Thread-1] INFO hdfs.TestDFSStripedOutputStreamWithFailure (TestDFSStripedOutputStreamWithFailure.java:checkData(444)) - i,j=1, 1, numCellInBlock=1, blockSize=65536, lb=LocatedBlock{BP-1919703598-9.96.1.34-1440666072467:blk_-9223372036854775791_1001; getBlockSize()=65536; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[127.0.0.1:39496,DS-e4002b94-cc0a-4020-bb6a-d995a45e31fa,DISK]]}
2015-08-27 17:01:14,164 [Thread-1] INFO hdfs.TestDFSStripedOutputStreamWithFailure (TestDFSStripedOutputStreamWithFailure.java:checkData(444)) - i,j=2, 2, numCellInBlock=1, blockSize=65535, lb=LocatedBlock{BP-1919703598-9.96.1.34-1440666072467:blk_-9223372036854775790_1001; getBlockSize()=65535; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[127.0.0.1:39584,DS-1b7892ee-f465-4755-87a2-f87c8f323c08,DISK]]}
2015-08-27 17:01:14,166 [Thread-1] INFO hdfs.TestDFSStripedOutputStreamWithFailure
(TestDFSStripedOutputStreamWithFailure.java:checkData(444)) - i,j=3, 3, numCellInBlock=0, blockSize=0, lb=null 2015-08-27 17:01:14,166 [Thread-1] INFO hdfs.TestDFSStripedOutputStreamWithFailure (TestDFSStripedOutputStreamWithFailure.java:checkData(444)) - i,j=4, 4, numCellInBlock=0, blockSize=0, lb=null 2015-08-27 17:01:14,166 [Thread-1] INFO hdfs.TestDFSStripedOutputStreamWithFailure (TestDFSStripedOutputStreamWithFailure.java:checkData(444)) - i,j=5, 5, numCellInBlock=0, blockSize=0, lb=null 2015-08-27 17:01:14,166 [Thread-1] INFO hdfs.TestDFSStripedOutputStreamWithFailure (TestDFSStripedOutputStreamWithFailure.java:checkData(444)) - i,j=6, 0, numCellInBlock=1, blockSize=65536, lb=LocatedBlock{BP-1919703598-9.96.1.34-1440666072467:blk_-9223372036854775786_1001; getBlockSize()=65536; corrupt=false; offset=0; locs=[DatanodeInfoWithStorage[127.0.0.1:38728,DS-aa540f42-bb6f-4923-a951-5196d4c2a810,DISK]]} 2015-08-27 17:01:14,169 [Thread-1] INFO hdfs.TestDFSStripedOutputStreamWithFailure (TestDFSStripedOutputStreamWithFailure.java:checkData(444)) - i,j=7, 0, numCellInBlock=1, blockSize=65536, lb=null 2015-08-27 17:01:14,169 [Thread-1] INFO hdfs.TestDFSStripedOutputStreamWithFailure (TestDFSStripedOutputStreamWithFailure.java:checkData(444)) - i,j=8, 0, numCellInBlock=1, blockSize=65536, lb=null i=2, killedDnIndex=1: arrays first differed at element [1]; Expected :127 Actual :0 Click to see difference at org.junit.internal.ComparisonCriteria.arrayEquals(ComparisonCriteria.java:50) at org.junit.Assert.internalArrayEquals(Assert.java:473) at org.junit.Assert.assertArrayEquals(Assert.java:294) at org.apache.hadoop.hdfs.TestDFSStripedOutputStream.verifyParity(TestDFSStripedOutputStream.java:288) at org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.checkData(TestDFSStripedOutputStreamWithFailure.java:487) at org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.runTest(TestDFSStripedOutputStreamWithFailure.java:340) at 
org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.runTest(TestDFSStripedOutputStreamWithFailure.java:268) at org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure.runTest(TestDFSStripedOutputStreamWithFailure.java:146) at org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure$TestBase.run(TestDFSStripedOutputStreamWithFailure.java:520) at org.apache.hadoop.hdfs.TestDFSStripedOutputStreamWithFailure$TestBase.test0(TestDFSStripedOutputStreamWithFailure.java:523) {noformat} 3. disable block recovery for some test cases. Just for correctness. Erasure coding: fix 2 failed tests of DFSStripedOutputStream Key: HDFS-8978 URL: https://issues.apache.org/jira/browse/HDFS-8978 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Walter Su
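{{StripedFileTestUtil.waitBlockGroupsReported(..)}}, like {{DFSTestUtil.waitReplication(..)}}, boils down to polling a condition with a timeout until the expected block reports arrive. A generic sketch of that pattern (the helper below is illustrative, not the actual test-util code):

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitUtil {
  /**
   * Poll until the condition holds or the timeout expires, the same pattern
   * waitReplication / waitBlockGroupsReported use against block reports.
   */
  public static void waitFor(BooleanSupplier condition, long intervalMs,
      long timeoutMs) throws InterruptedException, TimeoutException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (!condition.getAsBoolean()) {
      if (System.currentTimeMillis() > deadline) {
        throw new TimeoutException("condition not met within " + timeoutMs + " ms");
      }
      Thread.sleep(intervalMs);
    }
  }
}
```

Verifying parity blocks only after such a wait returns removes the race between the test's block reader and the IBR of the parity blocks.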
[jira] [Commented] (HDFS-2390) dfsadmin -setBalancerBandwidth doesnot validate -ve value
[ https://issues.apache.org/jira/browse/HDFS-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716808#comment-14716808 ] Hudson commented on HDFS-2390: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2240 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2240/]) HDFS-2390. dfsadmin -setBalancerBandwidth does not validate -ve value. Contributed by Gautam Gopalakrishnan. (harsh: rev 0bf285413f8fcaadbb2d5817fe8090f5fb0d37d9) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSAdmin.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSAdminWithHA.java dfsadmin -setBalancerBandwidth doesnot validate -ve value - Key: HDFS-2390 URL: https://issues.apache.org/jira/browse/HDFS-2390 Project: Hadoop HDFS Issue Type: Improvement Components: balancer mover Affects Versions: 2.7.1 Reporter: Rajit Saha Assignee: Gautam Gopalakrishnan Priority: Minor Fix For: 2.8.0 Attachments: HDFS-2390-1.patch, HDFS-2390-2.patch, HDFS-2390-3.patch, HDFS-2390-4.patch $ hadoop dfsadmin -setBalancerBandwidth -1 does not throw any message that it is invalid although in DN log we are not getting DNA_BALANCERBANDWIDTHUPDATE. I think it should throw some message that -ve numbers are not valid , as it does for decimal numbers or non-numbers like - $ hadoop dfsadmin -setBalancerBandwidth 12.34 NumberFormatException: For input string: 12.34 Usage: java DFSAdmin [-setBalancerBandwidth bandwidth in bytes per second] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
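The fix requested above amounts to validating the parsed bandwidth before sending the command to the datanodes. A minimal sketch of such validation (class and method names are hypothetical, not the actual {{DFSAdmin}} code):

```java
public class BalancerBandwidthArg {
  /** Parses a dfsadmin-style bandwidth argument, rejecting non-integers and negatives. */
  public static long parse(String arg) {
    final long bandwidth;
    try {
      // Long.parseLong already rejects decimals like "12.34" and non-numbers.
      bandwidth = Long.parseLong(arg);
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(
          "Bandwidth must be an integer number of bytes per second: " + arg);
    }
    if (bandwidth < 0) {
      // The missing check the JIRA asks for: negative values get a clear message.
      throw new IllegalArgumentException("Bandwidth cannot be negative: " + arg);
    }
    return bandwidth;
  }
}
```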
[jira] [Updated] (HDFS-8969) Inconsistent synchronization of FSEditLog.editLogStream;
[ https://issues.apache.org/jira/browse/HDFS-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-8969: --- Attachment: HDFS-8969.001.patch Inconsistent synchronization of FSEditLog.editLogStream; Key: HDFS-8969 URL: https://issues.apache.org/jira/browse/HDFS-8969 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.8.0 Reporter: Anu Engineer Assignee: Anu Engineer Attachments: HDFS-8969.001.patch Fix Findbug warnings -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8961) Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager
[ https://issues.apache.org/jira/browse/HDFS-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716960#comment-14716960 ] Haohui Mai commented on HDFS-8961: -- +1. I'll commit it shortly. Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager Key: HDFS-8961 URL: https://issues.apache.org/jira/browse/HDFS-8961 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Mingliang Liu Attachments: HDFS-8961.000.patch There are two clauses in {{hadoop-hdfs}} to filter out the findbugs warnings in {{org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager}}:
{code}
<Match>
  <Class name="org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager" />
  <Method name="allocSlot" />
  <Bug pattern="UL_UNRELEASED_LOCK_EXCEPTION_PATH" />
</Match>
<Match>
  <Class name="org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager" />
  <Method name="allocSlot" />
  <Bug pattern="UL_UNRELEASED_LOCK" />
</Match>
{code}
These two warnings show up in the Jenkins run as these classes are moved into the {{hadoop-hdfs-client}} module. We either need to fix the code or move these clauses to the {{hadoop-hdfs-client}} module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8969) Fix findbug issues
[ https://issues.apache.org/jira/browse/HDFS-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716957#comment-14716957 ] Brahma Reddy Battula commented on HDFS-8969: [~anu] thanks a lot for your patch..Patch looks good to me..+1 ( non binding).. Fix findbug issues -- Key: HDFS-8969 URL: https://issues.apache.org/jira/browse/HDFS-8969 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.8.0 Reporter: Anu Engineer Assignee: Anu Engineer Attachments: HDFS-8969.001.patch Fix Findbug warnings -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8946) Improve choosing datanode storage for block placement
[ https://issues.apache.org/jira/browse/HDFS-8946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717005#comment-14717005 ] Hadoop QA commented on HDFS-8946: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 24s | Pre-patch trunk has 4 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 40s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 52s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 23s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 19s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 27s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 26s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 7s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 161m 10s | Tests passed in hadoop-hdfs. 
| | | | 205m 25s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752741/HDFS-8946.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0bf2854 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/12162/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12162/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12162/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12162/console | This message was automatically generated. Improve choosing datanode storage for block placement - Key: HDFS-8946 URL: https://issues.apache.org/jira/browse/HDFS-8946 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Yi Liu Assignee: Yi Liu Attachments: HDFS-8946.001.patch, HDFS-8946.002.patch This JIRA is to improve choosing a datanode storage for block placement. In {{BlockPlacementPolicyDefault}} ({{chooseLocalStorage}}, {{chooseRandom}}), we have the following logic to choose a datanode storage to place a block. For a given storage type, we iterate over all storages of the datanode, but the datanode only cares about the storage type: in the loop we check the storage type and return the first storage of that type that meets the requirement. So we can remove the iteration over storages and find a good storage of the given type in a single step. This is more efficient when no storage of that type on the datanode meets the requirement, since we no longer loop over all storages repeating the same check.
Besides, there is no need to shuffle the storages, since we only need to check against the storage type on the datanode once. This also makes the logic clearer.
{code}
    if (excludedNodes.add(localMachine) // was not in the excluded list
        && isGoodDatanode(localDatanode, maxNodesPerRack, false, results,
            avoidStaleNodes)) {
      for (Iterator<Map.Entry<StorageType, Integer>> iter = storageTypes
          .entrySet().iterator(); iter.hasNext(); ) {
        Map.Entry<StorageType, Integer> entry = iter.next();
        for (DatanodeStorageInfo localStorage : DFSUtil.shuffle(
            localDatanode.getStorageInfos())) {
          StorageType type = entry.getKey();
          if (addIfIsGoodTarget(localStorage, excludedNodes, blocksize,
              results, type) >= 0) {
            int num = entry.getValue();
            ...
{code}
(current logic above) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
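The proposed {{chooseStorage4Block}} replaces the shuffle-and-scan above with a single pass that returns the first acceptable storage of the requested type. A simplified, self-contained sketch of that idea (the types below are toy stand-ins for {{DatanodeStorageInfo}} and friends, not the real HDFS API):

```java
import java.util.List;

public class ChooseStorageSketch {
  // Toy stand-ins for StorageType / DatanodeStorageInfo (illustration only).
  enum StorageType { DISK, SSD, ARCHIVE }

  record Storage(StorageType type, boolean failed, long remaining) {}

  /**
   * Sketch of chooseStorage4Block: return the first non-failed storage of the
   * given type with enough remaining space, or null if none qualifies. No
   * shuffling and no per-storage repeat of the full checks is needed, because
   * the datanode only cares about the storage type and available space.
   */
  static Storage chooseStorage4Block(List<Storage> storages, StorageType t,
      long blockSize) {
    for (Storage s : storages) {
      if (!s.failed() && s.type() == t && s.remaining() >= blockSize) {
        return s;
      }
    }
    return null;
  }
}
```

With this helper the caller asks once per storage type, instead of iterating every storage and re-running the same goodness checks.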
[jira] [Commented] (HDFS-8969) Fix findbug issues
[ https://issues.apache.org/jira/browse/HDFS-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716981#comment-14716981 ] Haohui Mai commented on HDFS-8969: -- +1 pending jenkins. Fix findbug issues -- Key: HDFS-8969 URL: https://issues.apache.org/jira/browse/HDFS-8969 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.8.0 Reporter: Anu Engineer Assignee: Anu Engineer Attachments: HDFS-8969.001.patch Fix Findbug warnings -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8962) Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf
[ https://issues.apache.org/jira/browse/HDFS-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8962: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) I've committed the patch to trunk and branch-2. Thanks [~liuml07] for the contribution. Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf Key: HDFS-8962 URL: https://issues.apache.org/jira/browse/HDFS-8962 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Mingliang Liu Assignee: Mingliang Liu Fix For: 2.8.0 Attachments: HDFS-8962.000.patch, HDFS-8962.001.patch, HDFS-8962.002.patch This is a follow up of HDFS-8803. HDFS-8803 exposes multiple checkstyles and whitespace warnings in the Jenkins run. These warnings should be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8969) Inconsistent synchronization of FSEditLog.editLogStream
[ https://issues.apache.org/jira/browse/HDFS-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-8969: --- Summary: Inconsistent synchronization of FSEditLog.editLogStream (was: Inconsistent synchronization of FSEditLog.editLogStream;) Inconsistent synchronization of FSEditLog.editLogStream --- Key: HDFS-8969 URL: https://issues.apache.org/jira/browse/HDFS-8969 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.8.0 Reporter: Anu Engineer Assignee: Anu Engineer Attachments: HDFS-8969.001.patch Fix Findbug warnings -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8969) Inconsistent synchronization of FSEditLog.editLogStream;
[ https://issues.apache.org/jira/browse/HDFS-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-8969: --- Description: Fix Findbug warnings (was: Fix Findbug warning on Synchronization.) Inconsistent synchronization of FSEditLog.editLogStream; Key: HDFS-8969 URL: https://issues.apache.org/jira/browse/HDFS-8969 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.8.0 Reporter: Anu Engineer Assignee: Anu Engineer Fix Findbug warnings -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8969) Fix findbug issues
[ https://issues.apache.org/jira/browse/HDFS-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716859#comment-14716859 ] Anu Engineer commented on HDFS-8969: [~brahmareddy] I have fixed both issues in the patch Fix findbug issues -- Key: HDFS-8969 URL: https://issues.apache.org/jira/browse/HDFS-8969 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.8.0 Reporter: Anu Engineer Assignee: Anu Engineer Attachments: HDFS-8969.001.patch Fix Findbug warnings -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8978) Erasure coding: fix 2 failed tests of DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716873#comment-14716873 ] Walter Su commented on HDFS-8978: - And I forgot to remove setBlockId in UCFeature. I'll include it in 02 patch tomorrow. Canceled the patch. Erasure coding: fix 2 failed tests of DFSStripedOutputStream Key: HDFS-8978 URL: https://issues.apache.org/jira/browse/HDFS-8978 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8978-HDFS-7285.01.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8975) Erasure coding : Fix random failure in TestSafeModeWithStripedFile
[ https://issues.apache.org/jira/browse/HDFS-8975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716891#comment-14716891 ] Hadoop QA commented on HDFS-8975: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 5m 42s | Findbugs (version ) appears to be broken on HDFS-7285. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 1 new or modified test files. | | {color:green}+1{color} | javac | 7m 45s | There were no new javac warning messages. | | {color:red}-1{color} | release audit | 0m 13s | The applied patch generated 1 release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 33s | There were no new checkstyle issues. | | {color:red}-1{color} | whitespace | 0m 0s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. | | {color:green}+1{color} | install | 1m 36s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 35s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 2m 39s | The patch appears to introduce 4 new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 1m 11s | Pre-build of native portion | | {color:red}-1{color} | hdfs tests | 223m 11s | Tests failed in hadoop-hdfs. 
| | | | 243m 28s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-hdfs | | Failed unit tests | hadoop.hdfs.TestReplaceDatanodeOnFailure | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | | | hadoop.hdfs.server.blockmanagement.TestNodeCount | | | hadoop.hdfs.TestWriteStripedFileWithFailure | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752720/HDFS-8975-HDFS-7285-01.patch | | Optional Tests | javac unit findbugs checkstyle | | git revision | HDFS-7285 / 164cbe6 | | Release Audit | https://builds.apache.org/job/PreCommit-HDFS-Build/12160/artifact/patchprocess/patchReleaseAuditProblems.txt | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/12160/artifact/patchprocess/whitespace.txt | | Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/12160/artifact/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12160/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12160/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12160/console | This message was automatically generated. 
Erasure coding : Fix random failure in TestSafeModeWithStripedFile -- Key: HDFS-8975 URL: https://issues.apache.org/jira/browse/HDFS-8975 Project: Hadoop HDFS Issue Type: Sub-task Reporter: J.Andreina Assignee: J.Andreina Attachments: HDFS-8975-HDFS-7285-01.patch TestSafeModeWithStripedFile#testStripedFile0 fails randomly because in the code below a DN is restarted, and the block report is triggered even before the DN registers with the NN, so the following operations fail: i) counting safe blocks in safemode (which randomly comes back as 0, even though one block is safe); ii) checking whether the NN is in safemode.
{code}
cluster.restartDataNode(dnprops.remove(0));
cluster.triggerBlockReports();
assertEquals(1, NameNodeAdapter.getSafeModeSafeBlocks(nn));
{code}
{code}
dnProperty = dnprops.remove(0);
restartDN(dnProperty, nameNodeAddress);
assertFalse(nn.isInSafeMode());
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8950) NameNode refresh doesn't remove DataNodes
[ https://issues.apache.org/jira/browse/HDFS-8950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-8950: --- Attachment: HDFS-8950.004.patch Here's another pass that replaces most of the Whitebox stuff with a refactored refresh() method in HostFileManager. Please review when you have a chance. NameNode refresh doesn't remove DataNodes - Key: HDFS-8950 URL: https://issues.apache.org/jira/browse/HDFS-8950 Project: Hadoop HDFS Issue Type: Bug Components: datanode, namenode Affects Versions: 2.6.0 Reporter: Daniel Templeton Assignee: Daniel Templeton Fix For: 2.8.0 Attachments: HDFS-8950.001.patch, HDFS-8950.002.patch, HDFS-8950.003.patch, HDFS-8950.004.patch If you remove a DN from NN's allowed host list (HDFS was HA) and then do NN refresh, it doesn't remove it actually and the NN UI keeps showing that node. It may try to allocate some blocks to that DN as well during an MR job. This issue is independent from DN decommission. To reproduce: 1. Add a DN to dfs_hosts_allow 2. Refresh NN 3. Start DN. Now NN starts seeing DN. 4. Stop DN 5. Remove DN from dfs_hosts_allow 6. Refresh NN - NN is still reporting DN as being used by HDFS. This is different from decom because there DN is added to exclude list in addition to being removed from allowed list, and in that case everything works correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8962) Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf
[ https://issues.apache.org/jira/browse/HDFS-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716963#comment-14716963 ] Hudson commented on HDFS-8962: -- FAILURE: Integrated in Hadoop-trunk-Commit #8355 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8355/]) HDFS-8962. Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf. Contributed by Mingliang Liu. (wheat9: rev 7e971b7315fa2942b4db7ba11ed513766957b777) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestCachingStrategy.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitLocalRead.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitCache.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/DfsClientConf.java Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf Key: HDFS-8962 URL: https://issues.apache.org/jira/browse/HDFS-8962 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Mingliang Liu Assignee: Mingliang Liu Fix For: 2.8.0 Attachments: HDFS-8962.000.patch, HDFS-8962.001.patch, HDFS-8962.002.patch This is a follow up of HDFS-8803. HDFS-8803 exposes multiple checkstyles and whitespace warnings in the Jenkins run. These warnings should be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7899) Improve EOF error message
[ https://issues.apache.org/jira/browse/HDFS-7899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716974#comment-14716974 ] Jagadesh Kiran N commented on HDFS-7899: The change only modifies the exception message, hence no new tests are included. The trunk failure is not related to the patch changes. Improve EOF error message - Key: HDFS-7899 URL: https://issues.apache.org/jira/browse/HDFS-7899 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Harsh J Assignee: Jagadesh Kiran N Priority: Minor Attachments: HDFS-7899-00.patch Currently, a DN disconnection for reasons other than connection timeout or refused messages, such as an EOF message as a result of rejection or other network fault, reports in this manner:
{code}
WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /x.x.x.x: for block, add to deadNodes and continue. java.io.EOFException: Premature EOF: no length prefix available
java.io.EOFException: Premature EOF: no length prefix available
 at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171)
 at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:392)
 at org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:137)
 at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1103)
 at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:538)
 at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:750)
 at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:794)
 at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:602)
{code}
This is not very clear to a user (the warning is logged at the hdfs-client). It could likely be improved with a more diagnosable message, or at least the direct reason rather than a bare EOF. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-7899) Improve EOF error message
[ https://issues.apache.org/jira/browse/HDFS-7899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14716793#comment-14716793 ] Hadoop QA commented on HDFS-7899: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 16s | Pre-patch trunk has 2 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 54s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 10m 2s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 1m 30s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 30s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 1m 58s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | native | 3m 13s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 0m 28s | Tests passed in hadoop-hdfs-client. 
| | | | 44m 52s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752739/HDFS-7899-00.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0bf2854 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/12163/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs-client.html | | hadoop-hdfs-client test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12163/artifact/patchprocess/testrun_hadoop-hdfs-client.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12163/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf900.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12163/console | This message was automatically generated. Improve EOF error message - Key: HDFS-7899 URL: https://issues.apache.org/jira/browse/HDFS-7899 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.6.0 Reporter: Harsh J Assignee: Jagadesh Kiran N Priority: Minor Attachments: HDFS-7899-00.patch Currently, a DN disconnection for reasons other than connection timeout or refused messages, such as an EOF message as a result of rejection or other network fault, reports in this manner: {code} WARN org.apache.hadoop.hdfs.DFSClient: Failed to connect to /x.x.x.x: for block, add to deadNodes and continue. 
java.io.EOFException: Premature EOF: no length prefix available
java.io.EOFException: Premature EOF: no length prefix available
 at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171)
 at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:392)
 at org.apache.hadoop.hdfs.BlockReaderFactory.newBlockReader(BlockReaderFactory.java:137)
 at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:1103)
 at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:538)
 at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:750)
 at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:794)
 at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:602)
{code}
This is not very clear to a user (the warning is logged at the hdfs-client). It could likely be improved with a more diagnosable message, or at least the direct reason rather than a bare EOF. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8977) NNBench result wrong when number of reducers greater than 1
[ https://issues.apache.org/jira/browse/HDFS-8977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bibin A Chundatt updated HDFS-8977: --- Priority: Major (was: Minor) NNBench result wrong when number of reducers greater than 1 --- Key: HDFS-8977 URL: https://issues.apache.org/jira/browse/HDFS-8977 Project: Hadoop HDFS Issue Type: Bug Reporter: Bibin A Chundatt Assignee: Bibin A Chundatt Attachments: 0001-HDFS-8977.patch Currently NNBench#analyzeResults considers only the part- file for analysis:
{code}
TPS: Create/Write/Close: 0
Avg exec time (ms): Create/Write/Close: Infinity
Avg Lat (ms): Create/Write: Infinity
Avg Lat (ms): Close: NaN
{code}
It should consider all part files for the output, or disable the reducers option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
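Aggregating over every reducer output rather than a single part file can be sketched as below. The `metric=value` line format, file layout, and all names are hypothetical stand-ins, not NNBench's actual output format:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sum a metric across every "part-*" file in the job output directory,
// instead of reading only the first reducer's output.
public class PartFileSum {
  public static long sumMetric(Path dir, String metric) throws IOException {
    List<Path> parts;
    try (Stream<Path> listing = Files.list(dir)) {
      parts = listing
          .filter(f -> f.getFileName().toString().startsWith("part-"))
          .collect(Collectors.toList());
    }
    long total = 0;
    for (Path p : parts) {
      for (String line : Files.readAllLines(p)) {
        if (line.startsWith(metric + "=")) {
          total += Long.parseLong(line.substring(metric.length() + 1));
        }
      }
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("nnbench");
    Files.write(dir.resolve("part-00000"), List.of("ops=3"));
    Files.write(dir.resolve("part-00001"), List.of("ops=4"));
    System.out.println(sumMetric(dir, "ops"));  // prints 7
  }
}
```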
[jira] [Updated] (HDFS-8969) Fix findbug issues
[ https://issues.apache.org/jira/browse/HDFS-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anu Engineer updated HDFS-8969: --- Summary: Fix findbug issues (was: Inconsistent synchronization of FSEditLog.editLogStream) Fix findbug issues -- Key: HDFS-8969 URL: https://issues.apache.org/jira/browse/HDFS-8969 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.8.0 Reporter: Anu Engineer Assignee: Anu Engineer Attachments: HDFS-8969.001.patch Fix Findbug warnings -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8962) Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf
[ https://issues.apache.org/jira/browse/HDFS-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8962: - Summary: Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf (was: Fix checkstyle and whitespace issues in HDFS-8803) Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf Key: HDFS-8962 URL: https://issues.apache.org/jira/browse/HDFS-8962 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Mingliang Liu Assignee: Mingliang Liu Attachments: HDFS-8962.000.patch, HDFS-8962.001.patch, HDFS-8962.002.patch This jira tracks the effort of fixing whitespace and checkstyle issues after moving {{DfsClientConf}} to the hdfs-client module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8961) Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager
[ https://issues.apache.org/jira/browse/HDFS-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8961: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) I've committed the patch to trunk and branch-2. Thanks [~liuml07] for the contribution. Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager Key: HDFS-8961 URL: https://issues.apache.org/jira/browse/HDFS-8961 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Mingliang Liu Fix For: 2.8.0 Attachments: HDFS-8961.000.patch There are two clauses in {{hadoop-hdfs}} to filter out the findbugs warnings in {{org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager}}:
{code}
<Match>
  <Class name="org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager" />
  <Method name="allocSlot" />
  <Bug pattern="UL_UNRELEASED_LOCK_EXCEPTION_PATH" />
</Match>
<Match>
  <Class name="org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager" />
  <Method name="allocSlot" />
  <Bug pattern="UL_UNRELEASED_LOCK" />
</Match>
{code}
These two warnings show up in the Jenkins run as these classes are moved into the {{hadoop-hdfs-client}} module. We either need to fix the code or move these clauses to the {{hadoop-hdfs-client}} module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8978) Erasure coding: fix 2 failed tests of DFSStripedOutputStream
[ https://issues.apache.org/jira/browse/HDFS-8978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Walter Su updated HDFS-8978: Status: Patch Available (was: Open) Erasure coding: fix 2 failed tests of DFSStripedOutputStream Key: HDFS-8978 URL: https://issues.apache.org/jira/browse/HDFS-8978 Project: Hadoop HDFS Issue Type: Sub-task Components: test Reporter: Walter Su Assignee: Walter Su Priority: Minor Attachments: HDFS-8978-HDFS-7285.01.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8962) Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf
[ https://issues.apache.org/jira/browse/HDFS-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8962: - Description: This is a follow up of HDFS-8803. HDFS-8803 exposes multiple checkstyles and whitespace warnings in the Jenkins run. These warnings should be fixed. (was: This jira tracks the effort of fixing whitespace and checkstyle issues after moving {{DfsClientConf}} to the hdfs-client module.) Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf Key: HDFS-8962 URL: https://issues.apache.org/jira/browse/HDFS-8962 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Mingliang Liu Assignee: Mingliang Liu Attachments: HDFS-8962.000.patch, HDFS-8962.001.patch, HDFS-8962.002.patch This is a follow up of HDFS-8803. HDFS-8803 exposes multiple checkstyles and whitespace warnings in the Jenkins run. These warnings should be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8829) DataNode sets SO_RCVBUF explicitly is disabling tcp auto-tuning
[ https://issues.apache.org/jira/browse/HDFS-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717260#comment-14717260 ] Colin Patrick McCabe commented on HDFS-8829: {{DFSClient#dataSocketRecvBufferSize}} seems not to be used anywhere. Since the main performance benefit you saw was with setting the DataNode receive buffer size, how about just creating {{dfs.datanode.transfer.socket.send.buffer.size}} and {{dfs.datanode.transfer.socket.recv.buffer.size}} that are explicitly just for {{DataTransferProtocol}} on the {{DataNode}}? Then we can avoid making any changes to the DFSClient in this patch. Also, "This may effect TCP connection throughput." should be "This may affect TCP connection throughput." DataNode sets SO_RCVBUF explicitly is disabling tcp auto-tuning --- Key: HDFS-8829 URL: https://issues.apache.org/jira/browse/HDFS-8829 Project: Hadoop HDFS Issue Type: Improvement Components: datanode Affects Versions: 2.3.0, 2.6.0 Reporter: He Tianyi Assignee: He Tianyi Attachments: HDFS-8829.0001.patch, HDFS-8829.0002.patch, HDFS-8829.0003.patch
{code:java}
private void initDataXceiver(Configuration conf) throws IOException {
  // find free port or use privileged port provided
  TcpPeerServer tcpPeerServer;
  if (secureResources != null) {
    tcpPeerServer = new TcpPeerServer(secureResources);
  } else {
    tcpPeerServer = new TcpPeerServer(dnConf.socketWriteTimeout,
        DataNode.getStreamingAddr(conf));
  }
  tcpPeerServer.setReceiveBufferSize(HdfsConstants.DEFAULT_DATA_SOCKET_SIZE);
{code}
The last line sets SO_RCVBUF explicitly, thus disabling tcp auto-tuning on some systems. Shall we make this behavior configurable? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
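The configurable behavior under discussion can be sketched with a plain ServerSocket: only set the buffer when an explicit size is configured, otherwise leave the socket untouched so the kernel can auto-tune SO_RCVBUF. The config handling shown is an assumption for illustration, not the patch itself:

```java
import java.io.IOException;
import java.net.ServerSocket;

// Sketch: a configured value <= 0 means "leave the OS default alone",
// which is what preserves TCP receive-buffer auto-tuning on Linux.
public class RecvBufConfig {
  public static ServerSocket open(int configuredRecvBuf) throws IOException {
    ServerSocket server = new ServerSocket();
    if (configuredRecvBuf > 0) {
      // Explicit size: disables auto-tuning for this socket.
      server.setReceiveBufferSize(configuredRecvBuf);
    }
    // else: keep the kernel default and let auto-tuning work
    return server;
  }

  public static void main(String[] args) throws IOException {
    try (ServerSocket s = open(0)) {
      // Auto-tuned default; the exact value is OS-dependent but positive.
      System.out.println(s.getReceiveBufferSize() > 0);
    }
  }
}
```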
[jira] [Commented] (HDFS-8964) Provide max TxId when validating in-progress edit log files
[ https://issues.apache.org/jira/browse/HDFS-8964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717279#comment-14717279 ] Colin Patrick McCabe commented on HDFS-8964: The overall idea looks good. Rather than special-casing negative values for maxTxId, we can just pass in {{Long.MAX_VALUE}} in cases where we really don't want a limit. Of course we also need to plug this into the relevant places in NameNode and JournalNode. Provide max TxId when validating in-progress edit log files --- Key: HDFS-8964 URL: https://issues.apache.org/jira/browse/HDFS-8964 Project: Hadoop HDFS Issue Type: Bug Components: journal-node, namenode Affects Versions: 2.7.1 Reporter: Zhe Zhang Assignee: Zhe Zhang Attachments: HDFS-8964.00.patch NN/JN validates in-progress edit log files in multiple scenarios, via {{EditLogFile#validateLog}}. The method scans through the edit log file to find the last transaction ID. However, an in-progress edit log file could be actively written to, which creates a race condition and causes incorrect data to be read (and later we attempt to interpret the data as ops). Currently {{validateLog}} is used in 3 places: # NN {{getEditsFromTxid}} # JN {{getEditLogManifest}} # NN/JN {{recoverUnfinalizedSegments}} In the first two scenarios we should provide a maximum TxId to validate in the in-progress file. The 3rd scenario won't cause a race condition because only non-current in-progress edit log files are validated. {{validateLog}} is actually only used with in-progress files, and could use a better name and Javadoc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
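The proposed maxTxId limit can be illustrated with a simplified scan, where txids stand in for parsed edit ops and {{Long.MAX_VALUE}} preserves the old unlimited behavior (a sketch of the idea, not {{EditLogFile#validateLog}} itself):

```java
import java.util.List;

// Stop scanning at maxTxId so bytes written concurrently past that point
// cannot be misread as ops; callers that truly want no limit pass
// Long.MAX_VALUE instead of a special-cased negative value.
public class EditLogScan {
  public static long lastTxId(List<Long> txids, long maxTxId) {
    long last = -1;  // stand-in for HdfsConstants.INVALID_TXID
    for (long txid : txids) {
      if (txid > maxTxId) {
        break;  // beyond the requested range: possibly mid-write, stop here
      }
      last = txid;
    }
    return last;
  }

  public static void main(String[] args) {
    List<Long> txids = List.of(1L, 2L, 3L, 4L, 5L);
    System.out.println(lastTxId(txids, 3));              // prints 3
    System.out.println(lastTxId(txids, Long.MAX_VALUE)); // prints 5
  }
}
```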
[jira] [Commented] (HDFS-8971) Remove guards when calling LOG.debug() and LOG.trace() in shortcircuit package
[ https://issues.apache.org/jira/browse/HDFS-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717430#comment-14717430 ] Mingliang Liu commented on HDFS-8971: - We can fix this after moving {{BlockReader}} to the client module. Remove guards when calling LOG.debug() and LOG.trace() in shortcircuit package -- Key: HDFS-8971 URL: https://issues.apache.org/jira/browse/HDFS-8971 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Mingliang Liu Assignee: Mingliang Liu We moved the {{shortcircuit}} package from {{hadoop-hdfs}} to the {{hadoop-hdfs-client}} module in JIRA [HDFS-8934|https://issues.apache.org/jira/browse/HDFS-8934] and [HDFS-8951|https://issues.apache.org/jira/browse/HDFS-8951]. Meanwhile, we also replaced the _log4j_ log with the _slf4j_ logger. There was existing code in the {{shortcircuit}} package to guard the log when calling {{LOG.debug()}} and {{LOG.trace()}}, e.g. in {{ShortCircuitCache.java}} we have code like this:
{code}
if (LOG.isTraceEnabled()) {
  LOG.trace(this + ": found waitable for " + key);
}
{code}
In _slf4j_, this kind of guard is not necessary. We should clean up the code by removing these guards. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8971) Remove guards when calling LOG.debug() and LOG.trace() in client package
[ https://issues.apache.org/jira/browse/HDFS-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-8971: Summary: Remove guards when calling LOG.debug() and LOG.trace() in client package (was: Remove guards when calling LOG.debug() and LOG.trace() in shortcircuit package) Remove guards when calling LOG.debug() and LOG.trace() in client package Key: HDFS-8971 URL: https://issues.apache.org/jira/browse/HDFS-8971 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Mingliang Liu Assignee: Mingliang Liu We moved the {{shortcircuit}} package from {{hadoop-hdfs}} to the {{hadoop-hdfs-client}} module in JIRA [HDFS-8934|https://issues.apache.org/jira/browse/HDFS-8934] and [HDFS-8951|https://issues.apache.org/jira/browse/HDFS-8951]. Meanwhile, we also replaced the _log4j_ log with the _slf4j_ logger. There was existing code in the {{shortcircuit}} package to guard the log when calling {{LOG.debug()}} and {{LOG.trace()}}, e.g. in {{ShortCircuitCache.java}} we have code like this:
{code}
if (LOG.isTraceEnabled()) {
  LOG.trace(this + ": found waitable for " + key);
}
{code}
In _slf4j_, this kind of guard is not necessary. We should clean up the code by removing these guards. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
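Why the guard becomes unnecessary with slf4j-style parameterized logging can be shown with a minimal toy logger: the message is only formatted when the level is enabled, so the caller pays no formatting cost otherwise. This is an illustration of the lazy-formatting idea, not slf4j's implementation:

```java
// Toy logger with an slf4j-style "{}" placeholder API: arguments are
// substituted only when the level is enabled, so callers can write
// log.trace("{}: found waitable for {}", this, key) without a guard.
public class LazyLogger {
  private final boolean traceEnabled;
  public int formatCount = 0;  // counts how often a message was actually built

  public LazyLogger(boolean traceEnabled) {
    this.traceEnabled = traceEnabled;
  }

  public void trace(String pattern, Object... args) {
    if (!traceEnabled) {
      return;  // cheap early exit: no string concatenation happened
    }
    formatCount++;
    for (Object a : args) {
      pattern = pattern.replaceFirst("\\{\\}", String.valueOf(a));
    }
    System.out.println(pattern);
  }

  public static void main(String[] args) {
    LazyLogger off = new LazyLogger(false);
    off.trace("{}: found waitable for {}", "cache", "key1");  // formats nothing
    System.out.println(off.formatCount);  // prints 0
  }
}
```

The guard is still worthwhile in slf4j when computing an *argument* is expensive; for plain placeholder substitution it can simply be removed, as this JIRA proposes.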
[jira] [Commented] (HDFS-8961) Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager
[ https://issues.apache.org/jira/browse/HDFS-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717238#comment-14717238 ] Hudson commented on HDFS-8961: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1044 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1044/]) HDFS-8961. Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager. Contributed by Mingliang Liu. (wheat9: rev 1e5f69e85c035f9507e8b788df0b3ce20290a770) * hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-client/dev-support/findbugsExcludeFile.xml Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager Key: HDFS-8961 URL: https://issues.apache.org/jira/browse/HDFS-8961 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Mingliang Liu Fix For: 2.8.0 Attachments: HDFS-8961.000.patch There are two clauses in {{hadoop-hdfs}} to filter out the findbugs warnings in {{org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager}}:
{code}
<Match>
  <Class name="org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager" />
  <Method name="allocSlot" />
  <Bug pattern="UL_UNRELEASED_LOCK_EXCEPTION_PATH" />
</Match>
<Match>
  <Class name="org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager" />
  <Method name="allocSlot" />
  <Bug pattern="UL_UNRELEASED_LOCK" />
</Match>
{code}
These two warnings show up in the Jenkins run as these classes are moved into the {{hadoop-hdfs-client}} module. We either need to fix the code or move these clauses to the {{hadoop-hdfs-client}} module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8962) Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf
[ https://issues.apache.org/jira/browse/HDFS-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717239#comment-14717239 ] Hudson commented on HDFS-8962: -- FAILURE: Integrated in Hadoop-Yarn-trunk #1044 (See [https://builds.apache.org/job/Hadoop-Yarn-trunk/1044/]) HDFS-8962. Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf. Contributed by Mingliang Liu. (wheat9: rev 7e971b7315fa2942b4db7ba11ed513766957b777) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestCachingStrategy.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitLocalRead.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitCache.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/DfsClientConf.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf Key: HDFS-8962 URL: https://issues.apache.org/jira/browse/HDFS-8962 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Mingliang Liu Assignee: Mingliang Liu Fix For: 2.8.0 Attachments: HDFS-8962.000.patch, HDFS-8962.001.patch, HDFS-8962.002.patch This is a follow up of HDFS-8803. HDFS-8803 exposes multiple checkstyles and whitespace warnings in the Jenkins run. These warnings should be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8961) Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager
[ https://issues.apache.org/jira/browse/HDFS-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717250#comment-14717250 ] Hudson commented on HDFS-8961: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #311 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/311/]) HDFS-8961. Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager. Contributed by Mingliang Liu. (wheat9: rev 1e5f69e85c035f9507e8b788df0b3ce20290a770) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-client/dev-support/findbugsExcludeFile.xml * hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager Key: HDFS-8961 URL: https://issues.apache.org/jira/browse/HDFS-8961 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Mingliang Liu Fix For: 2.8.0 Attachments: HDFS-8961.000.patch There are two clauses in {{hadoop-hdfs}} to filter out the findbugs warnings in {{org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager}}:
{code}
<Match>
  <Class name="org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager" />
  <Method name="allocSlot" />
  <Bug pattern="UL_UNRELEASED_LOCK_EXCEPTION_PATH" />
</Match>
<Match>
  <Class name="org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager" />
  <Method name="allocSlot" />
  <Bug pattern="UL_UNRELEASED_LOCK" />
</Match>
{code}
These two warnings show up in the Jenkins run as these classes are moved into the {{hadoop-hdfs-client}} module. We either need to fix the code or move these clauses to the {{hadoop-hdfs-client}} module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8962) Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf
[ https://issues.apache.org/jira/browse/HDFS-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717251#comment-14717251 ] Hudson commented on HDFS-8962: -- SUCCESS: Integrated in Hadoop-Mapreduce-trunk-Java8 #311 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Java8/311/]) HDFS-8962. Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf. Contributed by Mingliang Liu. (wheat9: rev 7e971b7315fa2942b4db7ba11ed513766957b777) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitCache.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitLocalRead.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestCachingStrategy.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/DfsClientConf.java Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf Key: HDFS-8962 URL: https://issues.apache.org/jira/browse/HDFS-8962 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Mingliang Liu Assignee: Mingliang Liu Fix For: 2.8.0 Attachments: HDFS-8962.000.patch, HDFS-8962.001.patch, HDFS-8962.002.patch This is a follow up of HDFS-8803. HDFS-8803 exposes multiple checkstyles and whitespace warnings in the Jenkins run. These warnings should be fixed. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8969) Fix findbug issues
[ https://issues.apache.org/jira/browse/HDFS-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717299#comment-14717299 ] Hadoop QA commented on HDFS-8969: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | pre-patch | 17m 22s | Pre-patch trunk has 4 extant Findbugs (version 3.0.0) warnings. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:red}-1{color} | tests included | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | | {color:green}+1{color} | javac | 7m 41s | There were no new javac warning messages. | | {color:green}+1{color} | javadoc | 9m 50s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 24s | The applied patch does not increase the total number of release audit warnings. | | {color:green}+1{color} | checkstyle | 0m 45s | There were no new checkstyle issues. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:green}+1{color} | install | 1m 28s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 34s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 2m 30s | The patch does not introduce any new Findbugs (version 3.0.0) warnings, and fixes 2 pre-existing warnings. | | {color:green}+1{color} | native | 3m 9s | Pre-build of native portion | | {color:green}+1{color} | hdfs tests | 161m 10s | Tests passed in hadoop-hdfs. 
| | | | 204m 57s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12752779/HDFS-8969.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 0bf2854 | | Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/12165/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html | | hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12165/artifact/patchprocess/testrun_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12165/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12165/console | This message was automatically generated. Fix findbug issues -- Key: HDFS-8969 URL: https://issues.apache.org/jira/browse/HDFS-8969 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.8.0 Reporter: Anu Engineer Assignee: Anu Engineer Attachments: HDFS-8969.001.patch Fix Findbug warnings -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8962) Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf
[ https://issues.apache.org/jira/browse/HDFS-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717311#comment-14717311 ] Hudson commented on HDFS-8962: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2260 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2260/]) HDFS-8962. Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf. Contributed by Mingliang Liu. (wheat9: rev 7e971b7315fa2942b4db7ba11ed513766957b777) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitLocalRead.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestCachingStrategy.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/DfsClientConf.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitCache.java Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf Key: HDFS-8962 URL: https://issues.apache.org/jira/browse/HDFS-8962 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Mingliang Liu Assignee: Mingliang Liu Fix For: 2.8.0 Attachments: HDFS-8962.000.patch, HDFS-8962.001.patch, HDFS-8962.002.patch This is a follow up of HDFS-8803. HDFS-8803 exposes multiple checkstyles and whitespace warnings in the Jenkins run. These warnings should be fixed. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8925) Move BlockReader to hdfs-client
[ https://issues.apache.org/jira/browse/HDFS-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-8925: Attachment: HDFS-8925.001.patch Move BlockReader to hdfs-client --- Key: HDFS-8925 URL: https://issues.apache.org/jira/browse/HDFS-8925 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Mingliang Liu Assignee: Mingliang Liu Attachments: HDFS-8925.000.patch, HDFS-8925.001.patch This jira tracks the effort of moving the {{BlockReader}} class into the hdfs-client module. We also move the {{BlockReaderLocal}} class, which implements the {{BlockReader}} interface, to the {{hdfs-client}} module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8960) DFS client says no more good datanodes being available to try on a single drive failure
[ https://issues.apache.org/jira/browse/HDFS-8960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717242#comment-14717242 ] Benoit Sigoure commented on HDFS-8960: -- No, these were lost in the HDD failure. Do you see any sign of pipeline recovery even trying to happen here? I'm kinda confused because none of the logs I've looked at show any indication of it happening. Am I missing something? DFS client says no more good datanodes being available to try on a single drive failure - Key: HDFS-8960 URL: https://issues.apache.org/jira/browse/HDFS-8960 Project: Hadoop HDFS Issue Type: Bug Components: hdfs-client Affects Versions: 2.7.1 Environment: openjdk version 1.8.0_45-internal OpenJDK Runtime Environment (build 1.8.0_45-internal-b14) OpenJDK 64-Bit Server VM (build 25.45-b02, mixed mode) Reporter: Benoit Sigoure Attachments: blk_1073817519_77099.log, r12s13-datanode.log, r12s16-datanode.log Since we upgraded to 2.7.1 we regularly see single-drive failures cause widespread problems at the HBase level (with the default 3x replication target). Here's an example. This HBase RegionServer is r12s16 (172.24.32.16) and is writing its WAL to [172.24.32.16:10110, 172.24.32.8:10110, 172.24.32.13:10110] as can be seen by the following occasional messages: {code} 2015-08-23 06:28:40,272 INFO [sync.3] wal.FSHLog: Slow sync cost: 123 ms, current pipeline: [172.24.32.16:10110, 172.24.32.8:10110, 172.24.32.13:10110] {code} A bit later, the second node in the pipeline above is going to experience an HDD failure. 
{code} 2015-08-23 07:21:58,720 WARN [DataStreamer for file /hbase/WALs/r12s16.sjc.aristanetworks.com,9104,1439917659071/r12s16.sjc.aristanetworks.com%2C9104%2C1439917659071.default.1440314434998 block BP-1466258523-172.24.32.1-1437768622582:blk_1073817519_77099] hdfs.DFSClient: Error Recovery for block BP-1466258523-172.24.32.1-1437768622582:blk_1073817519_77099 in pipeline 172.24.32.16:10110, 172.24.32.13:10110, 172.24.32.8:10110: bad datanode 172.24.32.8:10110 {code} And then HBase will go like omg I can't write to my WAL, let me commit suicide. {code} 2015-08-23 07:22:26,060 FATAL [regionserver/r12s16.sjc.aristanetworks.com/172.24.32.16:9104.append-pool1-t1] wal.FSHLog: Could not append. Requesting close of wal java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[172.24.32.16:10110, 172.24.32.13:10110], original=[172.24.32.16:10110, 172.24.32.13:10110]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration. at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:969) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1035) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1184) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:933) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:487) {code} Whereas this should be mostly a non-event as the DFS client should just drop the bad replica from the write pipeline. This is a small cluster but has 16 DNs so the failed DN in the pipeline should be easily replaced. 
I didn't set {{dfs.client.block.write.replace-datanode-on-failure.policy}} (so it's still {{DEFAULT}}) and didn't set {{dfs.client.block.write.replace-datanode-on-failure.enable}} (so it's still {{true}}). I don't see anything noteworthy in the NN log around the time of the failure; it just seems like the DFS client gave up, or threw an exception back to HBase that it wasn't throwing before, or something else, and that made this single drive failure lethal. We've occasionally been unlucky enough to have a single-drive failure cause multiple RegionServers to commit suicide because they had their WALs on that drive. We upgraded from 2.7.0 about a month ago, and I'm not sure whether we were seeing this with 2.7 or not – prior to that we were running in a quite different environment, but this is a fairly new deployment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
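For reference, the two client-side settings discussed above are configured in the client's hdfs-site.xml. A minimal sketch showing them at the values the reporter describes as the defaults (values taken from the comment above, not verified against any particular release):

```xml
<!-- Client-side pipeline-recovery settings mentioned in HDFS-8960. -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>DEFAULT</value>
</property>
```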
[jira] [Commented] (HDFS-8961) Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager
[ https://issues.apache.org/jira/browse/HDFS-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717310#comment-14717310 ] Hudson commented on HDFS-8961: -- FAILURE: Integrated in Hadoop-Mapreduce-trunk #2260 (See [https://builds.apache.org/job/Hadoop-Mapreduce-trunk/2260/]) HDFS-8961. Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager. Contributed by Mingliang Liu. (wheat9: rev 1e5f69e85c035f9507e8b788df0b3ce20290a770) * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml * hadoop-hdfs-project/hadoop-hdfs-client/dev-support/findbugsExcludeFile.xml Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager Key: HDFS-8961 URL: https://issues.apache.org/jira/browse/HDFS-8961 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Mingliang Liu Fix For: 2.8.0 Attachments: HDFS-8961.000.patch There are two clauses in {{hadoop-hdfs}} to filter out the findbugs warnings in {{org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager}}: {code}
<Match>
  <Class name="org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager" />
  <Method name="allocSlot" />
  <Bug pattern="UL_UNRELEASED_LOCK_EXCEPTION_PATH" />
</Match>
<Match>
  <Class name="org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager" />
  <Method name="allocSlot" />
  <Bug pattern="UL_UNRELEASED_LOCK" />
</Match>
{code} These two warnings show up in the Jenkins run as these classes are moved into the {{hadoop-hdfs-client}} module. We either need to fix the code or move these clauses to the {{hadoop-hdfs-client}} module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8961) Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager
[ https://issues.apache.org/jira/browse/HDFS-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717412#comment-14717412 ] Hudson commented on HDFS-8961: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2241 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2241/]) HDFS-8961. Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager. Contributed by Mingliang Liu. (wheat9: rev 1e5f69e85c035f9507e8b788df0b3ce20290a770) * hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml * hadoop-hdfs-project/hadoop-hdfs-client/dev-support/findbugsExcludeFile.xml * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager Key: HDFS-8961 URL: https://issues.apache.org/jira/browse/HDFS-8961 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Mingliang Liu Fix For: 2.8.0 Attachments: HDFS-8961.000.patch There are two clauses in {{hadoop-hdfs}} to filter out the findbugs warnings in {{org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager}}: {code}
<Match>
  <Class name="org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager" />
  <Method name="allocSlot" />
  <Bug pattern="UL_UNRELEASED_LOCK_EXCEPTION_PATH" />
</Match>
<Match>
  <Class name="org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager" />
  <Method name="allocSlot" />
  <Bug pattern="UL_UNRELEASED_LOCK" />
</Match>
{code} These two warnings show up in the Jenkins run as these classes are moved into the {{hadoop-hdfs-client}} module. We either need to fix the code or move these clauses to the {{hadoop-hdfs-client}} module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8969) Clean up findbugs warnings for HDFS-8823 and HDFS-8932
[ https://issues.apache.org/jira/browse/HDFS-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717432#comment-14717432 ] Hudson commented on HDFS-8969: -- FAILURE: Integrated in Hadoop-trunk-Commit #8358 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8358/]) HDFS-8969. Clean up findbugs warnings for HDFS-8823 and HDFS-8932. Contributed by Anu Engineer. (wheat9: rev f97a0f8c2cdad0668a3892319f6969fafc2f04cd) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java Clean up findbugs warnings for HDFS-8823 and HDFS-8932 -- Key: HDFS-8969 URL: https://issues.apache.org/jira/browse/HDFS-8969 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.8.0 Reporter: Anu Engineer Assignee: Anu Engineer Fix For: 2.8.0 Attachments: HDFS-8969.001.patch Fix Findbug warnings -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8823) Move replication factor into individual blocks
[ https://issues.apache.org/jira/browse/HDFS-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717434#comment-14717434 ] Hudson commented on HDFS-8823: -- FAILURE: Integrated in Hadoop-trunk-Commit #8358 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8358/]) HDFS-8969. Clean up findbugs warnings for HDFS-8823 and HDFS-8932. Contributed by Anu Engineer. (wheat9: rev f97a0f8c2cdad0668a3892319f6969fafc2f04cd) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java Move replication factor into individual blocks -- Key: HDFS-8823 URL: https://issues.apache.org/jira/browse/HDFS-8823 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.8.0 Attachments: HDFS-8823.000.patch, HDFS-8823.001.patch, HDFS-8823.002.patch, HDFS-8823.003.patch, HDFS-8823.004.patch, HDFS-8823.005.patch, HDFS-8823.006.patch This jira proposes to record the replication factor in the {{BlockInfo}} class. The changes have two advantages: * Decoupling the namespace and the block management layer. It is a prerequisite step to move block management off the heap or to a separate process. * Increased flexibility on replicating blocks. Currently the replication factors of all blocks have to be the same. The replication factors of these blocks are equal to the highest replication factor across all snapshots. The changes will allow blocks in a file to have different replication factor, potentially saving some space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8971) Remove guards when calling LOG.debug() and LOG.trace() in client package
[ https://issues.apache.org/jira/browse/HDFS-8971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mingliang Liu updated HDFS-8971: Description: We moved the {{shortcircuit}} package from {{hadoop-hdfs}} to {{hadoop-hdfs-client}} module in JIRA [HDFS-8934|https://issues.apache.org/jira/browse/HDFS-8934] and [HDFS-8951|https://issues.apache.org/jira/browse/HDFS-8951], and {{BlockReader}} in [HDFS-8925|https://issues.apache.org/jira/browse/HDFS-8925]. Meanwhile, we also replaced the _log4j_ logger with the _slf4j_ logger. There is existing code in the client package that guards the log when calling {{LOG.debug()}} and {{LOG.trace()}}, e.g. in {{ShortCircuitCache.java}} we have code like this: {code}
if (LOG.isTraceEnabled()) {
  LOG.trace(this + ": found waitable for " + key);
}
{code} In _slf4j_, this kind of guard is not necessary. We should clean the code by removing the guard from the client package. was: We moved the {{shortcircuit}} package from {{hadoop-hdfs}} to {{hadoop-hdfs-client}} module in JIRA [HDFS-8934|https://issues.apache.org/jira/browse/HDFS-8934] and [HDFS-8951|https://issues.apache.org/jira/browse/HDFS-8951]. Meanwhile, we also replaced the _log4j_ logger with the _slf4j_ logger. There is existing code in the {{shortcircuit}} package that guards the log when calling {{LOG.debug()}} and {{LOG.trace()}}, e.g. in {{ShortCircuitCache.java}} we have code like this: {code}
if (LOG.isTraceEnabled()) {
  LOG.trace(this + ": found waitable for " + key);
}
{code} In _slf4j_, this kind of guard is not necessary. We should clean the code by removing the guard. 
Remove guards when calling LOG.debug() and LOG.trace() in client package Key: HDFS-8971 URL: https://issues.apache.org/jira/browse/HDFS-8971 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Mingliang Liu Assignee: Mingliang Liu We moved the {{shortcircuit}} package from {{hadoop-hdfs}} to {{hadoop-hdfs-client}} module in JIRA [HDFS-8934|https://issues.apache.org/jira/browse/HDFS-8934] and [HDFS-8951|https://issues.apache.org/jira/browse/HDFS-8951], and {{BlockReader}} in [HDFS-8925|https://issues.apache.org/jira/browse/HDFS-8925]. Meanwhile, we also replaced the _log4j_ logger with the _slf4j_ logger. There is existing code in the client package that guards the log when calling {{LOG.debug()}} and {{LOG.trace()}}, e.g. in {{ShortCircuitCache.java}} we have code like this: {code}
if (LOG.isTraceEnabled()) {
  LOG.trace(this + ": found waitable for " + key);
}
{code} In _slf4j_, this kind of guard is not necessary. We should clean the code by removing the guard from the client package. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
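The cleanup is mechanical because slf4j's parameterized logging methods only format their arguments when the level is enabled, so the explicit guard adds nothing for simple messages. A minimal self-contained sketch of why the guard becomes redundant, using a stand-in logger class rather than the real slf4j API:

```java
import java.text.MessageFormat;

// Stand-in for an slf4j-style logger (NOT the real slf4j classes):
// trace(pattern, args) formats its arguments only when tracing is enabled,
// so callers need no isTraceEnabled() guard around simple messages.
class SketchLogger {
    private final boolean traceEnabled;
    int messagesFormatted = 0; // counts how many message strings were actually built

    SketchLogger(boolean traceEnabled) { this.traceEnabled = traceEnabled; }

    boolean isTraceEnabled() { return traceEnabled; }

    void trace(String pattern, Object... args) {
        if (!traceEnabled) return;          // the level check lives inside the logger
        messagesFormatted++;
        System.out.println(MessageFormat.format(pattern, args));
    }
}

public class GuardDemo {
    public static void main(String[] args) {
        SketchLogger log = new SketchLogger(false);

        // Old style: the guard avoids eager string concatenation.
        if (log.isTraceEnabled()) {
            log.trace("cache: found waitable for " + "key1");
        }

        // New style: no guard; nothing is formatted while trace is disabled.
        log.trace("{0}: found waitable for {1}", "cache", "key1");
        System.out.println("messages formatted: " + log.messagesFormatted);
    }
}
```

Note the real slf4j placeholder syntax is `{}` rather than `MessageFormat`'s `{0}`; the stand-in only illustrates the deferred-formatting behavior that makes the guards removable.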
[jira] [Commented] (HDFS-8962) Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf
[ https://issues.apache.org/jira/browse/HDFS-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717379#comment-14717379 ] Hudson commented on HDFS-8962: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #303 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/303/]) HDFS-8962. Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf. Contributed by Mingliang Liu. (wheat9: rev 7e971b7315fa2942b4db7ba11ed513766957b777) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitLocalRead.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/DfsClientConf.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestCachingStrategy.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitCache.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf Key: HDFS-8962 URL: https://issues.apache.org/jira/browse/HDFS-8962 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Mingliang Liu Assignee: Mingliang Liu Fix For: 2.8.0 Attachments: HDFS-8962.000.patch, HDFS-8962.001.patch, HDFS-8962.002.patch This is a follow up of HDFS-8803. HDFS-8803 exposes multiple checkstyles and whitespace warnings in the Jenkins run. These warnings should be fixed. 
[jira] [Commented] (HDFS-8961) Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager
[ https://issues.apache.org/jira/browse/HDFS-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717378#comment-14717378 ] Hudson commented on HDFS-8961: -- FAILURE: Integrated in Hadoop-Hdfs-trunk-Java8 #303 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk-Java8/303/]) HDFS-8961. Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager. Contributed by Mingliang Liu. (wheat9: rev 1e5f69e85c035f9507e8b788df0b3ce20290a770) * hadoop-hdfs-project/hadoop-hdfs-client/dev-support/findbugsExcludeFile.xml * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/dev-support/findbugsExcludeFile.xml Investigate lock issue in o.a.h.hdfs.shortcircuit.DfsClientShmManager.EndpointShmManager Key: HDFS-8961 URL: https://issues.apache.org/jira/browse/HDFS-8961 Project: Hadoop HDFS Issue Type: Bug Reporter: Haohui Mai Assignee: Mingliang Liu Fix For: 2.8.0 Attachments: HDFS-8961.000.patch There are two clauses in {{hadoop-hdfs}} to filter out the findbugs warnings in {{org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager}}: {code}
<Match>
  <Class name="org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager" />
  <Method name="allocSlot" />
  <Bug pattern="UL_UNRELEASED_LOCK_EXCEPTION_PATH" />
</Match>
<Match>
  <Class name="org.apache.hadoop.hdfs.shortcircuit.DfsClientShmManager$EndpointShmManager" />
  <Method name="allocSlot" />
  <Bug pattern="UL_UNRELEASED_LOCK" />
</Match>
{code} These two warnings show up in the Jenkins run as these classes are moved into the {{hadoop-hdfs-client}} module. We either need to fix the code or move these clauses to the {{hadoop-hdfs-client}} module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8969) Clean up findbugs warnings for HDFS-8823 and HDFS-8932
[ https://issues.apache.org/jira/browse/HDFS-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8969: - Summary: Clean up findbugs warnings for HDFS-8823 and HDFS-8932 (was: Fix findbug issues) Clean up findbugs warnings for HDFS-8823 and HDFS-8932 -- Key: HDFS-8969 URL: https://issues.apache.org/jira/browse/HDFS-8969 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.8.0 Reporter: Anu Engineer Assignee: Anu Engineer Attachments: HDFS-8969.001.patch Fix Findbug warnings -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8932) NPE thrown in NameNode when try to get TotalSyncCount metric before editLogStream initialization
[ https://issues.apache.org/jira/browse/HDFS-8932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717433#comment-14717433 ] Hudson commented on HDFS-8932: -- FAILURE: Integrated in Hadoop-trunk-Commit #8358 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/8358/]) HDFS-8969. Clean up findbugs warnings for HDFS-8823 and HDFS-8932. Contributed by Anu Engineer. (wheat9: rev f97a0f8c2cdad0668a3892319f6969fafc2f04cd) * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java NPE thrown in NameNode when try to get TotalSyncCount metric before editLogStream initialization -- Key: HDFS-8932 URL: https://issues.apache.org/jira/browse/HDFS-8932 Project: Hadoop HDFS Issue Type: Bug Reporter: Surendra Singh Lilhore Assignee: Surendra Singh Lilhore Fix For: 2.8.0 Attachments: HDFS-8932.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8969) Fix findbug issues
[ https://issues.apache.org/jira/browse/HDFS-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717352#comment-14717352 ] Anu Engineer commented on HDFS-8969: Working as expected. The patch does not introduce any new Findbugs (version 3.0.0) warnings and fixes 2 pre-existing warnings. This patch did not modify any tests, so the Hadoop QA test results are as expected. Fix findbug issues -- Key: HDFS-8969 URL: https://issues.apache.org/jira/browse/HDFS-8969 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.8.0 Reporter: Anu Engineer Assignee: Anu Engineer Attachments: HDFS-8969.001.patch Fix Findbug warnings -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8965) Harden edit log reading code against out of memory errors
[ https://issues.apache.org/jira/browse/HDFS-8965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717372#comment-14717372 ] Colin Patrick McCabe commented on HDFS-8965: Thanks for the review. bq. A comment mentions a 1-byte txid, I think you meant opid. Having the expected format in the comment is super helpful too, thanks. Mind replicating it for the other two readers too? sure bq. Why move the temp array into a member? It'll probably be stack allocated and thus no GC pressure, 4KB should fit too. Hmm. I'm not sure if it will be stack allocated or not. The compiler would have to do some pretty fancy escape analysis since we're passing the array to many functions, any of which might hold a reference. It seems easier just to keep the buffer around, since we're using it a lot. bq. Great opportunity to fix the LayoutVersion.EDITS_CHESKUM typo ok bq. LengthPrefixedReader#scanOp, shouldn't this be basically readOp but without the second pass to readFields? Not sure why it doesn't validate the max edit size or the checksum. The TODO is relevant. Yeah, you're right... we should be validating checksums here. Fixed. It looks like this was actually a regression in edit log validation introduced by HDFS-6038. Previously, the JN did validate checksums when attempting to skip to a specific op, and after that change, it didn't. bq. Can we get a unit test for this situation? Yeah, let me add a unit test that scanOp validates checksums. It should be possible to do as a true unit test too. bq. Is some code sharing is possible among the three readOp's for getting the FSEditLogOp, this bit? I'm on the fence about this since I think it might make it harder to read since the control flow would be passing from base-derived-base again. 
Harden edit log reading code against out of memory errors - Key: HDFS-8965 URL: https://issues.apache.org/jira/browse/HDFS-8965 Project: Hadoop HDFS Issue Type: Improvement Affects Versions: 2.0.0-alpha Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Attachments: HDFS-8965.001.patch, HDFS-8965.002.patch, HDFS-8965.003.patch We should harden the edit log reading code against out of memory errors. Now that each op has a length prefix and a checksum, we can validate the checksum before trying to load the Op data. This should avoid out of memory errors when trying to load garbage data as Op data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
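The hardening idea from the HDFS-8965 description can be sketched independently of the HDFS classes: with a length-prefixed, checksummed record, a reader can sanity-check the length and verify the checksum over the raw bytes before handing anything to a parser, so garbage data cannot trigger huge allocations. A hypothetical sketch (this is not the actual FSEditLogOp wire format) using CRC32:

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical record layout (NOT HDFS's actual edit log format):
// [4-byte length][payload bytes][8-byte CRC32 of the payload].
public class ChecksummedRecord {
    static final int MAX_RECORD_SIZE = 4 * 1024 * 1024; // reject absurd lengths early

    static byte[] write(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length + 8);
        buf.putInt(payload.length).put(payload).putLong(crc.getValue());
        return buf.array();
    }

    // Validate the length bound and checksum BEFORE any parsing happens,
    // so a corrupt length field cannot cause an out-of-memory allocation
    // and corrupt payload bytes are rejected cheaply.
    static byte[] read(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw);
        int len = buf.getInt();
        if (len < 0 || len > MAX_RECORD_SIZE || len > buf.remaining() - 8) {
            throw new IllegalArgumentException("implausible record length: " + len);
        }
        byte[] payload = new byte[len];
        buf.get(payload);
        CRC32 crc = new CRC32();
        crc.update(payload, 0, len);
        if (crc.getValue() != buf.getLong()) {
            throw new IllegalArgumentException("checksum mismatch");
        }
        return payload; // only now is it safe to parse the payload
    }
}
```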
[jira] [Commented] (HDFS-8962) Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf
[ https://issues.apache.org/jira/browse/HDFS-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717413#comment-14717413 ] Hudson commented on HDFS-8962: -- FAILURE: Integrated in Hadoop-Hdfs-trunk #2241 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2241/]) HDFS-8962. Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf. Contributed by Mingliang Liu. (wheat9: rev 7e971b7315fa2942b4db7ba11ed513766957b777) * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDataTransferProtocol.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataXceiver.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitCache.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/impl/DfsClientConf.java * hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/client/HdfsClientConfigKeys.java * hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/shortcircuit/TestShortCircuitLocalRead.java * hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestCachingStrategy.java * hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt Clean up checkstyle warnings in o.a.h.hdfs.DfsClientConf Key: HDFS-8962 URL: https://issues.apache.org/jira/browse/HDFS-8962 Project: Hadoop HDFS Issue Type: Sub-task Components: build Reporter: Mingliang Liu Assignee: Mingliang Liu Fix For: 2.8.0 Attachments: HDFS-8962.000.patch, HDFS-8962.001.patch, HDFS-8962.002.patch This is a follow up of HDFS-8803. HDFS-8803 exposes multiple checkstyles and whitespace warnings in the Jenkins run. These warnings should be fixed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8823) Move replication factor into individual blocks
[ https://issues.apache.org/jira/browse/HDFS-8823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14717422#comment-14717422 ] Haohui Mai commented on HDFS-8823: -- It has been addressed in HDFS-8929. Thanks. Move replication factor into individual blocks -- Key: HDFS-8823 URL: https://issues.apache.org/jira/browse/HDFS-8823 Project: Hadoop HDFS Issue Type: Sub-task Reporter: Haohui Mai Assignee: Haohui Mai Fix For: 2.8.0 Attachments: HDFS-8823.000.patch, HDFS-8823.001.patch, HDFS-8823.002.patch, HDFS-8823.003.patch, HDFS-8823.004.patch, HDFS-8823.005.patch, HDFS-8823.006.patch This jira proposes to record the replication factor in the {{BlockInfo}} class. The changes have two advantages: * Decoupling the namespace and the block management layer. It is a prerequisite step to move block management off the heap or to a separate process. * Increased flexibility on replicating blocks. Currently the replication factors of all blocks have to be the same. The replication factors of these blocks are equal to the highest replication factor across all snapshots. The changes will allow blocks in a file to have different replication factor, potentially saving some space. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8969) Clean up findbugs warnings for HDFS-8823 and HDFS-8932
[ https://issues.apache.org/jira/browse/HDFS-8969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haohui Mai updated HDFS-8969: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) I've committed the patch to trunk and branch-2. Thanks [~anu] for the contribution. Clean up findbugs warnings for HDFS-8823 and HDFS-8932 -- Key: HDFS-8969 URL: https://issues.apache.org/jira/browse/HDFS-8969 Project: Hadoop HDFS Issue Type: Bug Components: HDFS Affects Versions: 2.8.0 Reporter: Anu Engineer Assignee: Anu Engineer Fix For: 2.8.0 Attachments: HDFS-8969.001.patch Fix Findbug warnings -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HDFS-8981) Adding revision to data node jmx getVersion() method
Siqi Li created HDFS-8981: - Summary: Adding revision to data node jmx getVersion() method Key: HDFS-8981 URL: https://issues.apache.org/jira/browse/HDFS-8981 Project: Hadoop HDFS Issue Type: Bug Reporter: Siqi Li Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-328) fs -setrep should have better error message
[ https://issues.apache.org/jira/browse/HDFS-328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Templeton updated HDFS-328: -- Attachment: HDFS-328.001.patch Adding patch that includes a change to the BlockManager.verifyReplication() method as well as adding new -setrep tests. fs -setrep should have better error message - Key: HDFS-328 URL: https://issues.apache.org/jira/browse/HDFS-328 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Reporter: Tsz Wo Nicholas Sze Assignee: Daniel Templeton Labels: newbie Attachments: HDFS-328.001.patch When the replication # is larger than dfs.replication.max (defined in conf), fs -setrep shows a meaningless error message. For example, {noformat} //dfs.replication.max is 512 $ hadoop fs -setrep 1000 r.txt setrep: java.io.IOException: file /user/tsz/r.txt. {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
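A clearer message mostly just needs to carry the values that made the request invalid. A hypothetical sketch of the kind of check and wording the patch is after (the method name mirrors BlockManager.verifyReplication() from the comment above, but the body and message are illustrative, not the actual patch code):

```java
import java.io.IOException;

public class ReplicationCheck {
    // Illustrative only: the real check lives in BlockManager.verifyReplication().
    // Instead of a bare "java.io.IOException: file <path>", report the requested
    // value, the configured limit, and the setting that controls it.
    static void verifyReplication(String path, short replication, short maxReplication)
            throws IOException {
        if (replication > maxReplication) {
            throw new IOException("Requested replication factor of " + replication
                + " for " + path + " exceeds the maximum of " + maxReplication
                + " (dfs.replication.max)");
        }
    }
}
```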
[jira] [Updated] (HDFS-8981) Adding revision to data node jmx getVersion() method
[ https://issues.apache.org/jira/browse/HDFS-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated HDFS-8981: -- Attachment: HDFS-8981.v1.patch Adding revision to data node jmx getVersion() method Key: HDFS-8981 URL: https://issues.apache.org/jira/browse/HDFS-8981 Project: Hadoop HDFS Issue Type: Bug Reporter: Siqi Li Priority: Minor Attachments: HDFS-8981.v1.patch to be consistent with namenode jmx, datanode jmx should also output revision number -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8981) Adding revision to data node jmx getVersion() method
[ https://issues.apache.org/jira/browse/HDFS-8981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siqi Li updated HDFS-8981: -- Assignee: Siqi Li Status: Patch Available (was: Open) Adding revision to data node jmx getVersion() method Key: HDFS-8981 URL: https://issues.apache.org/jira/browse/HDFS-8981 Project: Hadoop HDFS Issue Type: Bug Reporter: Siqi Li Assignee: Siqi Li Priority: Minor Attachments: HDFS-8981.v1.patch to be consistent with namenode jmx, datanode jmx should also output revision number -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8967) Create a BlockManagerLock class to represent the lock used in the BlockManager
[ https://issues.apache.org/jira/browse/HDFS-8967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717547#comment-14717547 ]

Hadoop QA commented on HDFS-8967:
-

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | pre-patch | 17m 31s | Pre-patch trunk has 4 extant Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 3 new or modified test files. |
| {color:green}+1{color} | javac | 7m 49s | There were no new javac warning messages. |
| {color:green}+1{color} | javadoc | 10m 3s | There were no new javadoc warning messages. |
| {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. |
| {color:red}-1{color} | checkstyle | 1m 21s | The applied patch generated 2 new checkstyle issues (total was 546, now 546). |
| {color:red}-1{color} | whitespace | 0m 2s | The patch has 1 line(s) that end in whitespace. Use git apply --whitespace=fix. |
| {color:green}+1{color} | install | 1m 30s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 33s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 2m 29s | The patch does not introduce any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | native | 3m 14s | Pre-build of native portion |
| {color:red}-1{color} | hdfs tests | 159m 31s | Tests failed in hadoop-hdfs. |
| | | | 204m 29s | |

|| Reason || Tests ||
| Failed unit tests | hadoop.hdfs.server.namenode.TestFSNamesystem |
| | hadoop.hdfs.TestRollingUpgrade |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12752810/HDFS-8967.001.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 90fe7bc |
| Pre-patch Findbugs warnings | https://builds.apache.org/job/PreCommit-HDFS-Build/12168/artifact/patchprocess/trunkFindbugsWarningshadoop-hdfs.html |
| checkstyle | https://builds.apache.org/job/PreCommit-HDFS-Build/12168/artifact/patchprocess/diffcheckstylehadoop-hdfs.txt |
| whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/12168/artifact/patchprocess/whitespace.txt |
| hadoop-hdfs test log | https://builds.apache.org/job/PreCommit-HDFS-Build/12168/artifact/patchprocess/testrun_hadoop-hdfs.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/12168/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf904.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12168/console |

This message was automatically generated.

Create a BlockManagerLock class to represent the lock used in the BlockManager
--
Key: HDFS-8967
URL: https://issues.apache.org/jira/browse/HDFS-8967
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai
Attachments: HDFS-8967.000.patch, HDFS-8967.001.patch

This jira proposes to create a {{BlockManagerLock}} class to represent the lock used in {{BlockManager}}. Currently it directly points to the {{FSNamesystem}} lock, so there are no functionality changes.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
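The attached patches are not reproduced in this digest; the following is only a minimal sketch of the shape such a class could take, assuming it simply delegates to an externally supplied read-write lock (standing in for the {{FSNamesystem}} lock). All names and method choices here are illustrative, not the HDFS-8967 patch:

```java
// Hedged sketch of a BlockManagerLock that wraps a delegate lock, so the
// BlockManager can later get its own lock without changing call sites.
// The delegate here is a plain ReentrantReadWriteLock stand-in for the
// FSNamesystem lock; the API surface is an assumption.
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class BlockManagerLock {
  private final ReentrantReadWriteLock delegate;

  public BlockManagerLock(ReentrantReadWriteLock delegate) {
    this.delegate = delegate;
  }

  public void readLock()    { delegate.readLock().lock(); }
  public void readUnlock()  { delegate.readLock().unlock(); }
  public void writeLock()   { delegate.writeLock().lock(); }
  public void writeUnlock() { delegate.writeLock().unlock(); }

  public boolean isWriteLockedByCurrentThread() {
    return delegate.isWriteLockedByCurrentThread();
  }
}
```

Because every operation forwards to the delegate, swapping the {{FSNamesystem}} lock for a dedicated lock later would be a one-line change at the construction site.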
[jira] [Updated] (HDFS-8155) Support OAuth2 in WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-8155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan updated HDFS-8155:
--
Attachment: HDFS-8155.003.patch

New patch addressing checkstyle and ChrisN's review.

bq. ConfRefreshTokenBasedAccessTokenProvider, ConfRefreshTokenBasedAccessTokenProvider: There are no timeouts specified in the calls to the refresh URL. Timeouts can be controlled by calling client.setConnectTimeout and client.setReadTimeout.
Done.

bq. AccessTokenProvider: Optional - consider extending Configured so that it inherits the implementations of getConf and setConf for free.
Configured sets the conf as part of the constructor, which breaks the way ATP implementations set their values. I kept it as Configurable just to avoid code churn.

bq. WebHDFS.md: Typo: toekns instead of tokens
Done.

bq. Please address the javac and checkstyle warnings.
Done. My local test-patch run is happy (at last).

Support OAuth2 in WebHDFS
-
Key: HDFS-8155
URL: https://issues.apache.org/jira/browse/HDFS-8155
Project: Hadoop HDFS
Issue Type: New Feature
Components: webhdfs
Reporter: Jakob Homan
Assignee: Jakob Homan
Attachments: HDFS-8155-1.patch, HDFS-8155.002.patch, HDFS-8155.003.patch

WebHDFS should be able to accept OAuth2 credentials.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
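The review point about timeouts on the token-refresh call can be illustrated with the JDK's own HTTP client (the actual patch uses a client whose {{setConnectTimeout}}/{{setReadTimeout}} calls are named in the quote above). The URL, timeout values, and class name below are made-up assumptions for the sketch:

```java
// Illustrative sketch, not the HDFS-8155 patch: bound both the connect
// and the read phase of an OAuth2 token-refresh request so a slow or
// unreachable authorization server cannot hang the client indefinitely.
import java.net.HttpURLConnection;
import java.net.URL;

public class TokenRefresh {
  static HttpURLConnection open(String refreshUrl, int connectMs, int readMs)
      throws Exception {
    HttpURLConnection conn =
        (HttpURLConnection) new URL(refreshUrl).openConnection();
    conn.setConnectTimeout(connectMs); // fail fast if the server is unreachable
    conn.setReadTimeout(readMs);       // bound the wait for the token response
    conn.setRequestMethod("POST");     // token refresh is a POST per RFC 6749
    return conn;
  }
}
```

Note that {{openConnection()}} does no network I/O; the timeouts only take effect when the connection is actually used.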
[jira] [Updated] (HDFS-328) fs -setrep should have better error message
[ https://issues.apache.org/jira/browse/HDFS-328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Templeton updated HDFS-328:
--
Attachment: (was: HDFS-328.001.patch)

fs -setrep should have better error message
-
Key: HDFS-328
URL: https://issues.apache.org/jira/browse/HDFS-328
Project: Hadoop HDFS
Issue Type: Improvement
Components: namenode
Reporter: Tsz Wo Nicholas Sze
Assignee: Daniel Templeton
Labels: newbie

When the replication # is larger than dfs.replication.max (defined in conf), fs -setrep shows a meaningless error message. For example:
{noformat}
//dfs.replication.max is 512
$ hadoop fs -setrep 1000 r.txt
setrep: java.io.IOException: file /user/tsz/r.txt.
{noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
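The kind of check that could produce a clearer message is simple to sketch: compare the requested replication against the configured maximum and name both numbers in the exception. The class, method, and message wording below are illustrative assumptions, not the eventual fix:

```java
// Hedged sketch of a clearer setrep validation message: instead of
// "java.io.IOException: file /user/tsz/r.txt.", say what was requested,
// what the limit is, and which config key sets it.
import java.io.IOException;

public class SetrepCheck {
  static void validate(String path, short requested, short maxReplication)
      throws IOException {
    if (requested > maxReplication) {
      throw new IOException("Requested replication factor " + requested
          + " for " + path + " exceeds the configured maximum "
          + maxReplication + " (dfs.replication.max)");
    }
  }
}
```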
[jira] [Updated] (HDFS-8155) Support OAuth2 in WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-8155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan updated HDFS-8155:
--
Status: Patch Available (was: Open)

Support OAuth2 in WebHDFS
-
Key: HDFS-8155
URL: https://issues.apache.org/jira/browse/HDFS-8155
Project: Hadoop HDFS
Issue Type: New Feature
Components: webhdfs
Reporter: Jakob Homan
Assignee: Jakob Homan
Attachments: HDFS-8155-1.patch, HDFS-8155.002.patch, HDFS-8155.003.patch

WebHDFS should be able to accept OAuth2 credentials.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HDFS-8938) Refactor BlockManager in blockmanagement
[ https://issues.apache.org/jira/browse/HDFS-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717533#comment-14717533 ]

Hadoop QA commented on HDFS-8938:
-

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12752818/HDFS-8938.006.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / a9c8ea7 |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/12171/console |

This message was automatically generated.

Refactor BlockManager in blockmanagement
Key: HDFS-8938
URL: https://issues.apache.org/jira/browse/HDFS-8938
Project: Hadoop HDFS
Issue Type: Task
Components: build
Reporter: Mingliang Liu
Assignee: Mingliang Liu
Attachments: HDFS-8938.000.patch, HDFS-8938.001.patch, HDFS-8938.002.patch, HDFS-8938.003.patch, HDFS-8938.004.patch, HDFS-8938.005.patch, HDFS-8938.006.patch

This jira tracks the effort of refactoring the inner classes {{BlockManager$BlockToMarkCorrupt}} and {{BlockManager$ReplicationWork}} in the {{hdfs.server.blockmanagement}} package. As {{BlockManager}} has grown to more than 2,000 lines, we can move those two inner classes out of it. Meanwhile, the logic in {{computeReplicationWorkForBlocks}} can be simplified if we extract the code sections that _schedule replication_ and _validate replication work_ into private helper methods.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
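The extract-method shape described for {{computeReplicationWorkForBlocks}} can be sketched in miniature. The types and bodies below are stand-ins (the real method works on block and datanode structures, not strings); only the structure of the refactor is the point:

```java
// Sketch of the refactor HDFS-8938 describes: the public method becomes a
// thin sequence of two private helpers, one to schedule replication work
// and one to validate it. All types here are illustrative stand-ins.
import java.util.ArrayList;
import java.util.List;

public class ReplicationWorkSketch {
  static class Work {
    final String block;
    boolean valid;
    Work(String block) { this.block = block; }
  }

  static List<Work> computeReplicationWorkForBlocks(List<String> blocks) {
    List<Work> scheduled = scheduleReplication(blocks);
    validateReplicationWork(scheduled);
    return scheduled;
  }

  // Helper 1: decide what needs replicating (stand-in logic).
  private static List<Work> scheduleReplication(List<String> blocks) {
    List<Work> out = new ArrayList<>();
    for (String b : blocks) out.add(new Work(b));
    return out;
  }

  // Helper 2: re-check each scheduled item before handing it out.
  private static void validateReplicationWork(List<Work> scheduled) {
    for (Work w : scheduled) w.valid = !w.block.isEmpty();
  }
}
```

Each helper then has a single responsibility and can be read, tested, and changed independently of the other.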
[jira] [Updated] (HDFS-8155) Support OAuth2 in WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-8155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan updated HDFS-8155:
--
Status: Open (was: Patch Available)

Support OAuth2 in WebHDFS
-
Key: HDFS-8155
URL: https://issues.apache.org/jira/browse/HDFS-8155
Project: Hadoop HDFS
Issue Type: New Feature
Components: webhdfs
Reporter: Jakob Homan
Assignee: Jakob Homan
Attachments: HDFS-8155-1.patch, HDFS-8155.002.patch, HDFS-8155.003.patch, HDFS-8155.004.patch

WebHDFS should be able to accept OAuth2 credentials.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8155) Support OAuth2 in WebHDFS
[ https://issues.apache.org/jira/browse/HDFS-8155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan updated HDFS-8155:
--
Status: Patch Available (was: Open)

Support OAuth2 in WebHDFS
-
Key: HDFS-8155
URL: https://issues.apache.org/jira/browse/HDFS-8155
Project: Hadoop HDFS
Issue Type: New Feature
Components: webhdfs
Reporter: Jakob Homan
Assignee: Jakob Homan
Attachments: HDFS-8155-1.patch, HDFS-8155.002.patch, HDFS-8155.003.patch, HDFS-8155.004.patch

WebHDFS should be able to accept OAuth2 credentials.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)