[jira] [Updated] (HDFS-8901) Use ByteBuffer in striping positional read
[ https://issues.apache.org/jira/browse/HDFS-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SammiChen updated HDFS-8901: Attachment: HDFS-8901.v16.patch Fixes one issue reported by the style checker; the other reported issue is a false alarm. > Use ByteBuffer in striping positional read > -- > > Key: HDFS-8901 > URL: https://issues.apache.org/jira/browse/HDFS-8901 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Kai Zheng >Assignee: SammiChen > Attachments: HDFS-8901-v10.patch, HDFS-8901-v2.patch, > HDFS-8901-v3.patch, HDFS-8901-v4.patch, HDFS-8901-v5.patch, > HDFS-8901-v6.patch, HDFS-8901-v7.patch, HDFS-8901-v8.patch, > HDFS-8901-v9.patch, HDFS-8901.v11.patch, HDFS-8901.v12.patch, > HDFS-8901.v13.patch, HDFS-8901.v14.patch, HDFS-8901.v15.patch, > HDFS-8901.v16.patch, initial-poc.patch > > > The native erasure coder prefers direct ByteBuffers for performance > reasons. To prepare for that, this change uses ByteBuffer throughout the > code implementing striping positional read. It also avoids > unnecessary data copying between striping read chunk buffers and decode input > buffers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
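The description's point about direct ByteBuffers can be illustrated with plain java.nio; this sketch is not the HDFS-8901 patch, and all class and method names in it are invented. A positional read into a direct buffer lets the channel fill it without staging the bytes through an intermediate on-heap byte[] copy:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class DirectPread {
    // Positional read into a (direct) ByteBuffer: the channel fills the
    // buffer directly, which is the extra on-heap copy a byte[]-based
    // positional read would otherwise require.
    static int pread(FileChannel ch, ByteBuffer buf, long position) throws IOException {
        int total = 0;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, position + total);
            if (n < 0) break;  // EOF
            total += n;
        }
        return total;
    }

    // Demo: read 4 bytes at offset 3 from a temp file holding "0123456789".
    static String demo() {
        try {
            Path p = Files.createTempFile("stripe", ".dat");
            try {
                Files.write(p, "0123456789".getBytes(StandardCharsets.US_ASCII));
                try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
                    ByteBuffer buf = ByteBuffer.allocateDirect(4);
                    int n = pread(ch, buf, 3);
                    buf.flip();
                    byte[] out = new byte[buf.remaining()];
                    buf.get(out);
                    return n + ":" + new String(out, StandardCharsets.US_ASCII);
                }
            } finally {
                Files.delete(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());  // 4:3456
    }
}
```

The same shape applies to a striped read: each chunk buffer can be a slice of one direct buffer handed straight to the decoder, avoiding the copy the description mentions.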
[jira] [Updated] (HDFS-4210) NameNode Format should not fail for DNS resolution on minority of JournalNode
[ https://issues.apache.org/jira/browse/HDFS-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Zhuge updated HDFS-4210: - Attachment: HDFS-4210.002.patch Patch 002: * Throw {{UnknownHostException}} earlier in {{getLoggerAddresses}} when a JN hostname cannot be resolved * Use JUnit rule ExpectedException > NameNode Format should not fail for DNS resolution on minority of JournalNode > - > > Key: HDFS-4210 > URL: https://issues.apache.org/jira/browse/HDFS-4210 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, journal-node, namenode >Affects Versions: 2.6.0 >Reporter: Damien Hardy >Assignee: John Zhuge >Priority: Trivial > Labels: BB2015-05-TBR > Attachments: HDFS-4210.001.patch, HDFS-4210.002.patch > > > Setting : > qjournal://cdh4master01:8485;cdh4master02:8485;cdh4worker03:8485/hdfscluster > cdh4master01 and cdh4master02 JournalNode up and running, > cdh4worker03 not yet provisioned (no DNS entry) > With : > `hadoop namenode -format` fails with : > 12/11/19 14:42:42 FATAL namenode.NameNode: Exception in namenode join > java.lang.IllegalArgumentException: Unable to construct journal, > qjournal://cdh4master01:8485;cdh4master02:8485;cdh4worker03:8485/hdfscluster > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1235) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:226) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournalsForWrite(FSEditLog.java:193) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:745) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1099) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204) > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) > at > 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) > at java.lang.reflect.Constructor.newInstance(Constructor.java:513) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1233) > ... 5 more > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.getName(IPCLoggerChannelMetrics.java:107) > at > org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.create(IPCLoggerChannelMetrics.java:91) > at > org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.(IPCLoggerChannel.java:161) > at > org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$1.createLogger(IPCLoggerChannel.java:141) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:353) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:135) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.(QuorumJournalManager.java:104) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.(QuorumJournalManager.java:93) > ... 10 more > I suggest that if a quorum is up, the format should not fail.
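Patch 002's approach, throwing {{UnknownHostException}} as soon as a JournalNode hostname fails to resolve rather than letting the unresolved address surface later as an NPE, can be sketched with plain JDK calls. The class and method names below are illustrative stand-ins, not the actual QuorumJournalManager code:

```java
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LoggerAddresses {
    // Illustrative stand-in for getLoggerAddresses(): resolve each
    // "host:port" entry up front and fail fast with UnknownHostException,
    // instead of passing an unresolved address downstream where it
    // eventually triggers a NullPointerException.
    static List<InetSocketAddress> resolveAll(List<String> hostPorts)
            throws UnknownHostException {
        List<InetSocketAddress> out = new ArrayList<>();
        for (String hp : hostPorts) {
            int colon = hp.lastIndexOf(':');
            String host = hp.substring(0, colon);
            int port = Integer.parseInt(hp.substring(colon + 1));
            // The constructor attempts DNS resolution; on failure the
            // address is created but marked unresolved.
            InetSocketAddress addr = new InetSocketAddress(host, port);
            if (addr.isUnresolved()) {
                throw new UnknownHostException("Cannot resolve JournalNode " + hp);
            }
            out.add(addr);
        }
        return out;
    }

    public static void main(String[] args) {
        try {
            resolveAll(Arrays.asList("localhost:8485", "no-such-host.invalid:8485"));
            System.out.println("all resolved");
        } catch (UnknownHostException e) {
            System.out.println("UHE: " + e.getMessage());
        }
    }
}
```

Note the tension with the issue title: failing fast turns a confusing NPE into a clear error, but a format that tolerates a DNS-less minority would instead skip (or defer) the unresolved entries as long as a quorum resolves.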
[jira] [Commented] (HDFS-10807) Doc about upgrading to a version of HDFS with snapshots may be confusing
[ https://issues.apache.org/jira/browse/HDFS-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15444814#comment-15444814 ] Akira Ajisaka commented on HDFS-10807: -- LGTM, +1. > Doc about upgrading to a version of HDFS with snapshots may be confusing > > > Key: HDFS-10807 > URL: https://issues.apache.org/jira/browse/HDFS-10807 > Project: Hadoop HDFS > Issue Type: Improvement > Components: documentation >Reporter: Mingliang Liu >Assignee: Mingliang Liu >Priority: Minor > Attachments: HDFS-10807-branch-2.7.000.patch, HDFS-10807.000.patch > > > {code} > diff --git > a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsSnapshots.md > b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsSnapshots.md > index 94a37cd..d856e8c 100644 > --- a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsSnapshots.md > +++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsSnapshots.md > @@ -113,7 +113,7 @@ Upgrading to a version of HDFS with snapshots > > The HDFS snapshot feature introduces a new reserved path name used to > interact with snapshots: `.snapshot`. When upgrading from an > -older version of HDFS, existing paths named `.snapshot` need > +older version of HDFS which does not support snapshots, existing paths named > `.snapshot` need > to first be renamed or deleted to avoid conflicting with the reserved path. > See the upgrade section in > [the HDFS user guide](HdfsUserGuide.html#Upgrade_and_Rollback) > {code}
[jira] [Comment Edited] (HDFS-9668) Optimize the locking in FsDatasetImpl
[ https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15444798#comment-15444798 ] Jingcheng Du edited comment on HDFS-9668 at 8/29/16 4:39 AM: - Upload a new patch V5 according to [~eddyxu]'s comments. Comments are appreciated, thanks a lot. was (Author: jingcheng...@intel.com): Upload a new patch V5 according to [~eddyxu]'s comments. Any comments are welcome, thanks a lot. > Optimize the locking in FsDatasetImpl > - > > Key: HDFS-9668 > URL: https://issues.apache.org/jira/browse/HDFS-9668 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Jingcheng Du >Assignee: Jingcheng Du > Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, > HDFS-9668-4.patch, HDFS-9668-5.patch, execution_time.png > > > During the HBase test on a tiered storage of HDFS (WAL is stored in > SSD/RAMDISK, and all other files are stored in HDD), we observe many > long-time BLOCKED threads on FsDatasetImpl in DataNode. 
The following is part > of the jstack result: > {noformat} > "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at > /192.168.50.16:48521 [Receiving block > BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread > t@93336 >java.lang.Thread.State: BLOCKED > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:) > - waiting to lock <18324c9> (a > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by > "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at > /192.168.50.16:48520 [Receiving block > BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335 > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) > at java.lang.Thread.run(Thread.java:745) >Locked ownable synchronizers: > - None > > "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at > /192.168.50.16:48520 [Receiving block > BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread > t@93335 >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createFileExclusively(Native Method) > at java.io.File.createNewFile(File.java:1012) > at > org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286) > at > 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140) > - locked <18324c9> (a > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) > at java.lang.Thread.run(Thread.java:745) >Locked ownable synchronizers: > - None > {noformat} > We measured the execution of some operations in FsDatasetImpl during the > test. The results are shown below. > !execution_time.png! > The operations of finalizeBlock, addBlock and createRbw on HDD under a heavy > load take a really long time. > It means one slow operation of finalizeBlock, addBlock or createRbw in a > slow storage can block all the other same operations in the same DataNode, > especially in HBase when many WAL/flusher/compactor threads are configured. > We need a finer-grained lock mechanism in a new FsDatasetImpl implementation, > and users can choose the implementation by configuring > "dfs.datanode.fsdataset.factory" in DataNode. > We can implement the lock at either the storage level or the block level.
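The storage-level option mentioned at the end of the description could look roughly like the sketch below: one lock per volume instead of one lock on the whole FsDatasetImpl, so a slow operation on an HDD volume no longer blocks the same operation on SSD/RAMDISK volumes. All names are invented for illustration; this is not the HDFS-9668 patch:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

public class PerStorageLocks {
    // One lock per storage (volume) rather than a single dataset-wide
    // monitor: createRbw/finalizeBlock/addBlock on different volumes can
    // then proceed concurrently, while operations on the same volume
    // still serialize.
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    <T> T withStorageLock(String storageId, Supplier<T> op) {
        ReentrantLock lock = locks.computeIfAbsent(storageId, id -> new ReentrantLock());
        lock.lock();
        try {
            return op.get();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        PerStorageLocks psl = new PerStorageLocks();
        // A long createRbw on "hdd-0" would hold only hdd-0's lock;
        // this call on "ssd-0" is not blocked by it.
        String r = psl.withStorageLock("ssd-0", () -> "created-rbw-on-ssd");
        System.out.println(r);  // created-rbw-on-ssd
    }
}
```

The block-level variant is the same pattern keyed by block ID; it admits more concurrency but needs care for operations that touch volume-wide state.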
[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl
[ https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jingcheng Du updated HDFS-9668: --- Attachment: HDFS-9668-5.patch Upload a new patch V5 according to [~eddyxu]'s comments. Any comments are welcome, thanks a lot. > Optimize the locking in FsDatasetImpl > - > > Key: HDFS-9668 > URL: https://issues.apache.org/jira/browse/HDFS-9668 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Jingcheng Du >Assignee: Jingcheng Du > Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, > HDFS-9668-4.patch, HDFS-9668-5.patch, execution_time.png > > > During the HBase test on a tiered storage of HDFS (WAL is stored in > SSD/RAMDISK, and all other files are stored in HDD), we observe many > long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part > of the jstack result: > {noformat} > "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at > /192.168.50.16:48521 [Receiving block > BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread > t@93336 >java.lang.Thread.State: BLOCKED > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:) > - waiting to lock <18324c9> (a > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by > "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at > /192.168.50.16:48520 [Receiving block > BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335 > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) > at > 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) > at java.lang.Thread.run(Thread.java:745) >Locked ownable synchronizers: > - None > > "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at > /192.168.50.16:48520 [Receiving block > BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread > t@93335 >java.lang.Thread.State: RUNNABLE > at java.io.UnixFileSystem.createFileExclusively(Native Method) > at java.io.File.createNewFile(File.java:1012) > at > org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140) > - locked <18324c9> (a > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) > at > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113) > at > org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235) > at java.lang.Thread.run(Thread.java:745) >Locked ownable synchronizers: > - None > {noformat} > We measured the execution of some operations in FsDatasetImpl during the > test. The results are shown below. > !execution_time.png! 
> The operations of finalizeBlock, addBlock and createRbw on HDD under a heavy > load take a really long time. > It means one slow operation of finalizeBlock, addBlock or createRbw in a > slow storage can block all the other same operations in the same DataNode, > especially in HBase when many WAL/flusher/compactor threads are configured. > We need a finer-grained lock mechanism in a new FsDatasetImpl implementation, > and users can choose the implementation by configuring > "dfs.datanode.fsdataset.factory" in DataNode. > We can implement the lock at either the storage level or the block level.
[jira] [Comment Edited] (HDFS-4210) NameNode Format should not fail for DNS resolution on minority of JournalNode
[ https://issues.apache.org/jira/browse/HDFS-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15444786#comment-15444786 ] John Zhuge edited comment on HDFS-4210 at 8/29/16 4:30 AM: --- Discovered that {{UnknownHostException}} is already thrown and caught in the following call path, earlier than the NPE call path listed in the JIRA description: {noformat} at org.apache.hadoop.net.NetUtils.createSocketAddrForHost(NetUtils.java:245) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:217) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.getLoggerAddresses(QuorumJournalManager.java:390) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:364) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:149) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.(QuorumJournalManager.java:116) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.(QuorumJournalManager.java:105) at org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults.testUnresolvableHostNameFailsGracefully(TestQJMWithFaults.java:201) {noformat} {code:title='createSocketAddrForHost'} } catch (UnknownHostException e) { addr = InetSocketAddress.createUnresolved(host, port); } {code} {{createSocketAddrForHost}} swallows the UHE and creates an unresolved {{InetSocketAddress}}. Callers are supposed to check with {{isUnresolved}}. {{getLoggerAddresses}} is the earliest opportunity to throw the UHE. 
was (Author: jzhuge): Discovered that {{UnknownHostException}} is already thrown and caught in an earlier call path than the NPE call path listed in the JIRA description: {noformat} at org.apache.hadoop.net.NetUtils.createSocketAddrForHost(NetUtils.java:245) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:217) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.getLoggerAddresses(QuorumJournalManager.java:390) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:364) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:149) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.(QuorumJournalManager.java:116) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.(QuorumJournalManager.java:105) at org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults.testUnresolvableHostNameFailsGracefully(TestQJMWithFaults.java:201) {noformat} {code:title='createSocketAddrForHost'} } catch (UnknownHostException e) { addr = InetSocketAddress.createUnresolved(host, port); } {code} {{createSocketAddrForHost}} swallows the UHE and creates an unresolved {{InetSocketAddress}}. Callers are supposed to check with {{isUnresolved}}. {{getLoggerAddresses}} is the earliest opportunity to throw the UHE. 
> NameNode Format should not fail for DNS resolution on minority of JournalNode > - > > Key: HDFS-4210 > URL: https://issues.apache.org/jira/browse/HDFS-4210 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, journal-node, namenode >Affects Versions: 2.6.0 >Reporter: Damien Hardy >Assignee: John Zhuge >Priority: Trivial > Labels: BB2015-05-TBR > Attachments: HDFS-4210.001.patch > > > Setting : > qjournal://cdh4master01:8485;cdh4master02:8485;cdh4worker03:8485/hdfscluster > cdh4master01 and cdh4master02 JournalNode up and running, > cdh4worker03 not yet provisioned (no DNS entry) > With : > `hadoop namenode -format` fails with : > 12/11/19 14:42:42 FATAL namenode.NameNode: Exception in namenode join > java.lang.IllegalArgumentException: Unable to construct journal, > qjournal://cdh4master01:8485;cdh4master02:8485;cdh4worker03:8485/hdfscluster > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1235) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:226) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournalsForWrite(FSEditLog.java:193) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:745) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1099) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204) > Caused by:
[jira] [Commented] (HDFS-4210) NameNode Format should not fail for DNS resolution on minority of JournalNode
[ https://issues.apache.org/jira/browse/HDFS-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15444786#comment-15444786 ] John Zhuge commented on HDFS-4210: -- Discovered that {{UnknownHostException}} is already thrown and caught in an earlier call path than the NPE call path listed in the JIRA description: {noformat} at org.apache.hadoop.net.NetUtils.createSocketAddrForHost(NetUtils.java:245) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:217) at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.getLoggerAddresses(QuorumJournalManager.java:390) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:364) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:149) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.(QuorumJournalManager.java:116) at org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.(QuorumJournalManager.java:105) at org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults.testUnresolvableHostNameFailsGracefully(TestQJMWithFaults.java:201) {noformat} {code:title='createSocketAddrForHost'} } catch (UnknownHostException e) { addr = InetSocketAddress.createUnresolved(host, port); } {code} {{createSocketAddrForHost}} swallows the UHE and creates an unresolved {{InetSocketAddress}}. Callers are supposed to check with {{isUnresolved}}. {{getLoggerAddresses}} is the earliest opportunity to throw the UHE. 
> NameNode Format should not fail for DNS resolution on minority of JournalNode > - > > Key: HDFS-4210 > URL: https://issues.apache.org/jira/browse/HDFS-4210 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, journal-node, namenode >Affects Versions: 2.6.0 >Reporter: Damien Hardy >Assignee: John Zhuge >Priority: Trivial > Labels: BB2015-05-TBR > Attachments: HDFS-4210.001.patch > > > Setting : > qjournal://cdh4master01:8485;cdh4master02:8485;cdh4worker03:8485/hdfscluster > cdh4master01 and cdh4master02 JournalNode up and running, > cdh4worker03 not yet provisioned (no DNS entry) > With : > `hadoop namenode -format` fails with : > 12/11/19 14:42:42 FATAL namenode.NameNode: Exception in namenode join > java.lang.IllegalArgumentException: Unable to construct journal, > qjournal://cdh4master01:8485;cdh4master02:8485;cdh4worker03:8485/hdfscluster > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1235) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:226) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournalsForWrite(FSEditLog.java:193) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:745) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1099) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204) > Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) > at java.lang.reflect.Constructor.newInstance(Constructor.java:513) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1233) > ... 
5 more > Caused by: java.lang.NullPointerException > at > org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.getName(IPCLoggerChannelMetrics.java:107) > at > org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.create(IPCLoggerChannelMetrics.java:91) > at > org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.(IPCLoggerChannel.java:161) > at > org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$1.createLogger(IPCLoggerChannel.java:141) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:353) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:135) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.(QuorumJournalManager.java:104) > at > org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.(QuorumJournalManager.java:93) > ... 10 more > I suggest that if a quorum is up, the format should not fail.
[jira] [Comment Edited] (HDFS-10794) [SPS]: Provide storage policy satisfy worker at DN for co-ordinating the block storage movement work
[ https://issues.apache.org/jira/browse/HDFS-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15444780#comment-15444780 ] Rakesh R edited comment on HDFS-10794 at 8/29/16 4:29 AM: -- Thanks [~umamaheswararao] for the clarification, I've uploaded the previous patch again by following the naming pattern. Could you please tell me how to add {{HDFS-10285}} to the target versions so that we can mark this jira? was (Author: rakeshr): Thanks [~umamaheswararao] for the clarification, I've uploaded the previous patch again by following the naming pattern. > [SPS]: Provide storage policy satisfy worker at DN for co-ordinating the > block storage movement work > > > Key: HDFS-10794 > URL: https://issues.apache.org/jira/browse/HDFS-10794 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Rakesh R >Assignee: Rakesh R > Attachments: HDFS-10794-00.patch, HDFS-10794-HDFS-10285.00.patch > > > The idea of this jira is to implement a mechanism to move the blocks to the > given target in order to satisfy the block storage policy. Datanode receives > {{blocktomove}} details via heartbeat response from the NN. More specifically, > it's a datanode-side extension to handle the block storage movement commands.
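The mechanism the description outlines — the DN queuing block-movement details received in heartbeat responses and a worker draining the queue — might be modeled like this. Every name here is hypothetical, invented for illustration; it is not the HDFS-10794 API:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class StoragePolicySatisfyWorker {
    // Hypothetical "block to move" command as the DN might receive it in a
    // heartbeat response from the NameNode.
    static class BlockMoveCommand {
        final long blockId;
        final String targetStorageType;
        BlockMoveCommand(long blockId, String targetStorageType) {
            this.blockId = blockId;
            this.targetStorageType = targetStorageType;
        }
    }

    private final BlockingQueue<BlockMoveCommand> queue = new LinkedBlockingQueue<>();

    // Heartbeat handling only enqueues; the worker does the slow work.
    void onHeartbeatResponse(BlockMoveCommand cmd) {
        queue.offer(cmd);
    }

    // Drain one command. Real code would copy the replica to the target
    // storage and report status back to the NN; here we just describe it.
    String processOne() {
        BlockMoveCommand cmd = queue.poll();
        if (cmd == null) {
            return "idle";
        }
        return "moved blk_" + cmd.blockId + " to " + cmd.targetStorageType;
    }

    public static void main(String[] args) {
        StoragePolicySatisfyWorker w = new StoragePolicySatisfyWorker();
        w.onHeartbeatResponse(new BlockMoveCommand(1073779272L, "ARCHIVE"));
        System.out.println(w.processOne());  // moved blk_1073779272 to ARCHIVE
    }
}
```

Decoupling receipt (heartbeat thread) from execution (worker thread) keeps heartbeat handling fast even when a block move is slow.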
[jira] [Updated] (HDFS-10794) [SPS]: Provide storage policy satisfy worker at DN for co-ordinating the block storage movement work
[ https://issues.apache.org/jira/browse/HDFS-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-10794: Status: Patch Available (was: Open) > [SPS]: Provide storage policy satisfy worker at DN for co-ordinating the > block storage movement work > > > Key: HDFS-10794 > URL: https://issues.apache.org/jira/browse/HDFS-10794 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Rakesh R >Assignee: Rakesh R > Attachments: HDFS-10794-00.patch, HDFS-10794-HDFS-10285.00.patch > > > The idea of this jira is to implement a mechanism to move the blocks to the > given target in order to satisfy the block storage policy. Datanode receives > {{blocktomove}} details via heartbeat response from the NN. More specifically, > it's a datanode-side extension to handle the block storage movement commands.
[jira] [Commented] (HDFS-10794) [SPS]: Provide storage policy satisfy worker at DN for co-ordinating the block storage movement work
[ https://issues.apache.org/jira/browse/HDFS-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15444780#comment-15444780 ] Rakesh R commented on HDFS-10794: - Thanks [~umamaheswararao] for the clarification, I've uploaded the previous patch again by following the naming pattern. > [SPS]: Provide storage policy satisfy worker at DN for co-ordinating the > block storage movement work > > > Key: HDFS-10794 > URL: https://issues.apache.org/jira/browse/HDFS-10794 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Rakesh R >Assignee: Rakesh R > Attachments: HDFS-10794-00.patch, HDFS-10794-HDFS-10285.00.patch > > > The idea of this jira is to implement a mechanism to move the blocks to the > given target in order to satisfy the block storage policy. Datanode receives > {{blocktomove}} details via heartbeat response from the NN. More specifically, > it's a datanode-side extension to handle the block storage movement commands.
[jira] [Updated] (HDFS-10794) [SPS]: Provide storage policy satisfy worker at DN for co-ordinating the block storage movement work
[ https://issues.apache.org/jira/browse/HDFS-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh R updated HDFS-10794: Attachment: HDFS-10794-HDFS-10285.00.patch > [SPS]: Provide storage policy satisfy worker at DN for co-ordinating the > block storage movement work > > > Key: HDFS-10794 > URL: https://issues.apache.org/jira/browse/HDFS-10794 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: datanode, namenode >Reporter: Rakesh R >Assignee: Rakesh R > Attachments: HDFS-10794-00.patch, HDFS-10794-HDFS-10285.00.patch > > > The idea of this jira is to implement a mechanism to move the blocks to the > given target in order to satisfy the block storage policy. Datanode receives > {{blocktomove}} details via heartbeat response from the NN. More specifically, > it's a datanode-side extension to handle the block storage movement commands.
[jira] [Comment Edited] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object
[ https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15444774#comment-15444774 ] Fenghua Hu edited comment on HDFS-10682 at 8/29/16 4:25 AM: In FsDatasetImpl#FsDatasetImpl() and FsDatasetImpl#addVolume(): volumeMap = new ReplicaMap(this); and ReplicaMap tempVolumeMap = new ReplicaMap(this); "this" is used as the synchronization object: ReplicaMap(Object mutex) { if (mutex == null) { throw new HadoopIllegalArgumentException( "Object to synchronize on cannot be null"); } this.mutex = mutex; } ReplicaMap uses synchronized(mutex) {...} for synchronization. Do we need to change it accordingly? [~vagarychen] [~arpitagarwal] was (Author: fenghua_hu): In FsDatasetImpl#FsDatasetImpl() and FsDatasetImpl#addVolume(): volumeMap = new ReplicaMap(this); and ReplicaMap tempVolumeMap = new ReplicaMap(this); "this" is used as synchronization object: 52 ReplicaMap(Object mutex) { 53 if (mutex == null) { 54 throw new HadoopIllegalArgumentException( 55 "Object to synchronize on cannot be null"); 56 } 57 this.mutex = mutex; ReplicaMap uses synchronized(mutex) {...} for synchronization. Do we need change it accordingly? 
[~vagarychen] [~arpitagarwal] > Replace FsDatasetImpl object lock with a separate lock object > - > > Key: HDFS-10682 > URL: https://issues.apache.org/jira/browse/HDFS-10682 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Chen Liang >Assignee: Chen Liang > Fix For: 2.8.0 > > Attachments: HDFS-10682-branch-2.001.patch, > HDFS-10682-branch-2.002.patch, HDFS-10682-branch-2.003.patch, > HDFS-10682-branch-2.004.patch, HDFS-10682-branch-2.005.patch, > HDFS-10682-branch-2.006.patch, HDFS-10682.001.patch, HDFS-10682.002.patch, > HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, > HDFS-10682.006.patch, HDFS-10682.007.patch, HDFS-10682.008.patch, > HDFS-10682.009.patch, HDFS-10682.010.patch > > > This Jira proposes to replace the FsDatasetImpl object lock with a separate > lock object. Doing so will make it easier to measure lock statistics like > lock held time and warn about potential lock contention due to slow disk > operations. > Right now we can use org.apache.hadoop.util.AutoCloseableLock. In the future > we can also consider replacing the lock with a read-write lock.
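The {{org.apache.hadoop.util.AutoCloseableLock}} mentioned in the description pairs lock acquisition with try-with-resources. The class below is a minimal self-contained stand-in modeled on that idea, not the Hadoop implementation; it shows the pattern and why such a wrapper is a convenient place to measure lock hold time:

```java
import java.util.concurrent.locks.ReentrantLock;

public class AutoCloseableLockDemo {
    // Minimal stand-in modeled on org.apache.hadoop.util.AutoCloseableLock:
    // acquire() takes the lock and returns this, so close() (invoked by
    // try-with-resources) releases it. The acquire/close pair is a natural
    // hook for timing how long the lock was held.
    static class AutoCloseableLock implements AutoCloseable {
        private final ReentrantLock lock = new ReentrantLock();
        private long acquiredAt;

        AutoCloseableLock acquire() {
            lock.lock();
            acquiredAt = System.nanoTime();
            return this;
        }

        long heldNanos() {
            return System.nanoTime() - acquiredAt;
        }

        boolean isLocked() {
            return lock.isLocked();
        }

        @Override
        public void close() {
            long held = heldNanos();  // hook: log or warn here if held too long
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        AutoCloseableLock datasetLock = new AutoCloseableLock();
        try (AutoCloseableLock l = datasetLock.acquire()) {
            System.out.println("locked=" + l.isLocked());        // locked=true
        }  // close() runs here, releasing the lock
        System.out.println("locked=" + datasetLock.isLocked());  // locked=false
    }
}
```

Compared with synchronized(this), the explicit lock object also makes a later switch to a read-write lock (as the description suggests) a local change.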
[jira] [Commented] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object
[ https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15444774#comment-15444774 ] Fenghua Hu commented on HDFS-10682: --- In FsDatasetImpl#FsDatasetImpl() and FsDatasetImpl#addVolume(): volumeMap = new ReplicaMap(this); and ReplicaMap tempVolumeMap = new ReplicaMap(this); "this" is used as the synchronization object:
{code}
ReplicaMap(Object mutex) {
  if (mutex == null) {
    throw new HadoopIllegalArgumentException(
        "Object to synchronize on cannot be null");
  }
  this.mutex = mutex;
}
{code}
ReplicaMap uses synchronized(mutex) {...} for synchronization. Do we need to change it accordingly? [~vagarychen] [~arpitagarwal] > Replace FsDatasetImpl object lock with a separate lock object > - > > Key: HDFS-10682 > URL: https://issues.apache.org/jira/browse/HDFS-10682 > Project: Hadoop HDFS > Issue Type: Improvement > Components: datanode >Reporter: Chen Liang >Assignee: Chen Liang > Fix For: 2.8.0 > > Attachments: HDFS-10682-branch-2.001.patch, > HDFS-10682-branch-2.002.patch, HDFS-10682-branch-2.003.patch, > HDFS-10682-branch-2.004.patch, HDFS-10682-branch-2.005.patch, > HDFS-10682-branch-2.006.patch, HDFS-10682.001.patch, HDFS-10682.002.patch, > HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, > HDFS-10682.006.patch, HDFS-10682.007.patch, HDFS-10682.008.patch, > HDFS-10682.009.patch, HDFS-10682.010.patch > > > This Jira proposes to replace the FsDatasetImpl object lock with a separate > lock object. Doing so will make it easier to measure lock statistics like > lock held time and warn about potential lock contention due to slow disk > operations. > Right now we can use org.apache.hadoop.util.AutoCloseableLock. In the future > we can also consider replacing the lock with a read-write lock. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
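The point raised in the comment can be made concrete with a minimal sketch (a plain-Java stand-in, not Hadoop's ReplicaMap): the map synchronizes on whatever mutex it is constructed with, so if FsDatasetImpl stops using "this" as its lock, the same dedicated lock object should be passed here to keep both classes serialized on one monitor.

```java
import java.util.HashMap;
import java.util.Map;

// ReplicaMap-style map that synchronizes on an externally supplied mutex
// rather than on "this". Whoever constructs it decides which monitor
// guards the replica state.
class ReplicaMapSketch {
  private final Object mutex;
  private final Map<Long, String> map = new HashMap<>();

  ReplicaMapSketch(Object mutex) {
    if (mutex == null) {
      throw new IllegalArgumentException("Object to synchronize on cannot be null");
    }
    this.mutex = mutex;
  }

  void add(long blockId, String replica) {
    synchronized (mutex) { map.put(blockId, replica); }
  }

  String get(long blockId) {
    synchronized (mutex) { return map.get(blockId); }
  }
}
```

Constructing it as `new ReplicaMapSketch(datasetLockObject)` instead of `new ReplicaMapSketch(this)` mirrors the change the comment asks about.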
[jira] [Commented] (HDFS-10799) NameNode should use loginUser(hdfs) to serve iNotify requests
[ https://issues.apache.org/jira/browse/HDFS-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1524#comment-1524 ] Allen Wittenauer commented on HDFS-10799: - If the client is expired, we shouldn't be giving it answers at all. > NameNode should use loginUser(hdfs) to serve iNotify requests > - > > Key: HDFS-10799 > URL: https://issues.apache.org/jira/browse/HDFS-10799 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.6.0 > Environment: Kerberized, HA cluster, iNotify client, CDH5.7.0 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang > Attachments: HDFS-10799.001.patch > > > When a NameNode serves iNotify requests from a client, it verifies the client > has superuser permission and then uses the client's Kerberos principal to > read edits from journal nodes. > However, if the client does not renew its tgt tickets, the connection from > NameNode to journal nodes may fail. In which case, the NameNode thinks the > edits are corrupt, and prints a scary error message: > "During automatic edit log failover, we noticed that all of the remaining > edit log streams are shorter than the current one! The best remaining edit > log ends at transaction 11577603, but we thought we could read up to > transaction 11577606. If you continue, metadata will be lost forever!" > However, the edits are actually good. NameNode _should not freak out when an > iNotify client's tgt ticket expires_. > I think that an easy solution to this bug, is that after NameNode verifies > client has superuser permission, call {{SecurityUtil.doAsLoginUser}} and then > read edits. This will make sure the operation does not fail due to an expired > client ticket. 
> Excerpt of related logs: > {noformat} > 2016-08-18 19:05:13,979 WARN org.apache.hadoop.security.UserGroupInformation: > PriviledgedActionException as:h...@example.com (auth:KERBEROS) > cause:java.io.IOException: We encountered an error reading > http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy, > > http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy. > During automatic edit log failover, we noticed that all of the remaining > edit log streams are shorter than the current one! The best remaining edit > log ends at transaction 11577603, but we thought we could read up to > transaction 11577606. If you continue, metadata will be lost forever! > 2016-08-18 19:05:13,979 INFO org.apache.hadoop.ipc.Server: IPC Server handler > 112 on 8020, call > org.apache.hadoop.hdfs.protocol.ClientProtocol.getEditsFromTxid from [client > IP:port] Call#73 Retry#0 > java.io.IOException: We encountered an error reading > http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy, > > http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy. > During automatic edit log failover, we noticed that all of the remaining > edit log streams are shorter than the current one! The best remaining edit > log ends at transaction 11577603, but we thought we could read up to > transaction 11577606. If you continue, metadata will be lost forever! 
> at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213) > at > org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1674) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1736) > at > org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1010) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1475) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080) > {noformat} -- This message was
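The fix proposed in the description can be sketched in outline. This is a hedged, plain-Java stand-in: SecurityUtil.doAsLoginUser and the edit-log read are real Hadoop APIs, but the stub below only illustrates the shape of the change, in which the superuser check is still done as the caller while the JournalNode fetch runs as the NameNode's own login user so an expired client TGT cannot fail it.

```java
// Sketch: wrap the edit read in a "run as login user" helper, analogous to
// SecurityUtil.doAsLoginUser(PrivilegedExceptionAction).
class INotifySketch {
  interface PrivilegedRead<T> { T run() throws Exception; }

  // stand-in for SecurityUtil.doAsLoginUser(action); the real implementation
  // switches to the login UGI (hdfs) before running the action
  static <T> T doAsLoginUser(PrivilegedRead<T> action) throws Exception {
    return action.run();
  }

  static String getEditsFromTxid(long txid) {
    // checkSuperuserPrivilege();  // unchanged: verified with caller's identity
    try {
      // read edits as the login user, not as the (possibly expired) client
      return doAsLoginUser(() -> "edits-since-" + txid);
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}
```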
[jira] [Updated] (HDFS-10652) Add a unit test for HDFS-4660
[ https://issues.apache.org/jira/browse/HDFS-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongjun Zhang updated HDFS-10652: - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.8.0 Status: Resolved (was: Patch Available) Committed to trunk, branch-2, branch-2.8. Thanks [~vinayrpet] and [~jojochuang]. > Add a unit test for HDFS-4660 > - > > Key: HDFS-10652 > URL: https://issues.apache.org/jira/browse/HDFS-10652 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs >Reporter: Yongjun Zhang >Assignee: Vinayakumar B > Fix For: 2.8.0 > > Attachments: HDFS-10652-002.patch, HDFS-10652.001.patch, > HDFS-10652.003.patch, HDFS-10652.004.patch, HDFS-10652.005.patch, > HDFS-10652.006.patch, HDFS-10652.007.patch, HDFS-10652.008.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10652) Add a unit test for HDFS-4660
[ https://issues.apache.org/jira/browse/HDFS-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15442881#comment-15442881 ] Hudson commented on HDFS-10652: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10363 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10363/]) HDFS-10652. Add a unit test for HDFS-4660. Contributed by Vinayakumar (yzhang: rev c25817159af17753b398956cfe6ff14984801b01) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeFaultInjector.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestClientProtocolForPipelineRecovery.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java > Add a unit test for HDFS-4660 > - > > Key: HDFS-10652 > URL: https://issues.apache.org/jira/browse/HDFS-10652 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, hdfs >Reporter: Yongjun Zhang >Assignee: Vinayakumar B > Attachments: HDFS-10652-002.patch, HDFS-10652.001.patch, > HDFS-10652.003.patch, HDFS-10652.004.patch, HDFS-10652.005.patch, > HDFS-10652.006.patch, HDFS-10652.007.patch, HDFS-10652.008.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-10626) VolumeScanner prints incorrect IOException in reportBadBlocks operation
[ https://issues.apache.org/jira/browse/HDFS-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yiqun Lin updated HDFS-10626: - Resolution: Duplicate Status: Resolved (was: Patch Available) > VolumeScanner prints incorrect IOException in reportBadBlocks operation > --- > > Key: HDFS-10626 > URL: https://issues.apache.org/jira/browse/HDFS-10626 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > Labels: supportability > Attachments: HDFS-10626.001.patch, HDFS-10626.002.patch, > HDFS-10626.003.patch, HDFS-10626.004.patch > > > VolumeScanner logs an incorrect IOException around {{datanode.reportBadBlocks}}. > The related code: > {code} > public void handle(ExtendedBlock block, IOException e) { > FsVolumeSpi volume = scanner.volume; > ... > try { > scanner.datanode.reportBadBlocks(block, volume); > } catch (IOException ie) { > // This is bad, but not bad enough to shut down the scanner. > LOG.warn("Cannot report bad " + block.getBlockId(), e); > } > } > {code} > The IOException printed in the log should be {{ie}} rather than {{e}}, > which was passed into method {{handle(ExtendedBlock block, IOException e)}}. > It is important info that can help us understand why datanode > reportBadBlocks failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
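The one-line fix discussed above can be illustrated with a self-contained stand-in (not the actual VolumeScanner class): the WARN log must report "ie", the exception actually caught from reportBadBlocks, rather than the handler parameter "e", which is the original scan error.

```java
import java.io.IOException;

// Sketch of the corrected catch block: log the exception caught here ("ie"),
// not the method parameter ("e").
class BadBlockHandlerSketch {
  static String handle(long blockId, IOException e) {
    try {
      reportBadBlocks(blockId);                       // simulated datanode call
    } catch (IOException ie) {
      // This is bad, but not bad enough to shut down the scanner.
      // Correct: pass "ie" so the log explains why the report failed.
      return "Cannot report bad " + blockId + ": " + ie.getMessage();
    }
    return "reported " + blockId;
  }

  static void reportBadBlocks(long blockId) throws IOException {
    throw new IOException("cannot reach NameNode");   // simulated failure
  }
}
```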
[jira] [Commented] (HDFS-4660) Block corruption can happen during pipeline recovery
[ https://issues.apache.org/jira/browse/HDFS-4660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15442880#comment-15442880 ] Hudson commented on HDFS-4660: -- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10363 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10363/]) HDFS-10652. Add a unit test for HDFS-4660. Contributed by Vinayakumar (yzhang: rev c25817159af17753b398956cfe6ff14984801b01) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeFaultInjector.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestClientProtocolForPipelineRecovery.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java > Block corruption can happen during pipeline recovery > > > Key: HDFS-4660 > URL: https://issues.apache.org/jira/browse/HDFS-4660 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode >Affects Versions: 2.0.3-alpha, 3.0.0-alpha1 >Reporter: Peng Zhang >Assignee: Kihwal Lee >Priority: Blocker > Fix For: 2.7.1, 2.6.4 > > Attachments: HDFS-4660.br26.patch, HDFS-4660.patch, HDFS-4660.patch, > HDFS-4660.v2.patch, periodic_hflush.patch > > > pipeline DN1 DN2 DN3 > stop DN2 > pipeline added node DN4 located at 2nd position > DN1 DN4 DN3 > recover RBW > DN4 after recover rbw > 2013-04-01 21:02:31,570 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover > RBW replica > BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1004 > 2013-04-01 21:02:31,570 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: > Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW > getNumBytes() = 
134144 > getBytesOnDisk() = 134144 > getVisibleLength()= 134144 > end at chunk (134144/512=262) > DN3 after recover rbw > 2013-04-01 21:02:31,575 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover > RBW replica > BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_10042013-04-01 > 21:02:31,575 INFO > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: > Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW > getNumBytes() = 134028 > getBytesOnDisk() = 134028 > getVisibleLength()= 134028 > client send packet after recover pipeline > offset=133632 len=1008 > DN4 after flush > 2013-04-01 21:02:31,779 DEBUG > org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file > offset:134640; meta offset:1063 > // meta end position should be ceil(134640/512)*4 + 7 == 1059, but now it is > 1063. > DN3 after flush > 2013-04-01 21:02:31,782 DEBUG > org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: > BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005, > type=LAST_IN_PIPELINE, downstreams=0:[]: enqueue Packet(seqno=219, > lastPacketInBlock=false, offsetInBlock=134640, > ackEnqueueNanoTime=8817026136871545) > 2013-04-01 21:02:31,782 DEBUG > org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Changing > meta file offset of block > BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005 from > 1055 to 1051 > 2013-04-01 21:02:31,782 DEBUG > org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file > offset:134640; meta offset:1059 > After checking meta on DN4, I found checksum of chunk 262 is duplicated, but > data not. > Later after block was finalized, DN4's scanner detected bad block, and then > reported it to NM. NM send a command to delete this block, and replicate this > block from other DN in pipeline to satisfy duplication num. 
> I think this is because in BlockReceiver it skips data bytes already written, > but does not skip checksum bytes already written. And the function > adjustCrcFilePosition is only used for the last non-completed chunk, > not for this situation. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
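The meta-file offset arithmetic quoted in the report can be checked with a small sketch. Assumptions, matching the numbers in the logs: the checksum file has a 7-byte header followed by one 4-byte CRC per 512-byte data chunk, and a trailing partial chunk also carries a CRC (hence ceiling division).

```java
// Expected end offset of a block's checksum (meta) file for a given data length.
class MetaOffsetSketch {
  static long metaOffset(long dataLen, int bytesPerChunk, int crcSize, int headerLen) {
    long chunks = (dataLen + bytesPerChunk - 1) / bytesPerChunk;  // ceil division
    return headerLen + chunks * crcSize;
  }
}
```

For the values in the logs, metaOffset(134640, 512, 4, 7) gives 1059, the expected position; DN4's observed 1063 is exactly one duplicated 4-byte CRC beyond it, consistent with the chunk-262 checksum being written twice.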
[jira] [Commented] (HDFS-10626) VolumeScanner prints incorrect IOException in reportBadBlocks operation
[ https://issues.apache.org/jira/browse/HDFS-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15442878#comment-15442878 ] Yiqun Lin commented on HDFS-10626: -- Close this lira, duplicate to HDFS-10625. > VolumeScanner prints incorrect IOException in reportBadBlocks operation > --- > > Key: HDFS-10626 > URL: https://issues.apache.org/jira/browse/HDFS-10626 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > Labels: supportability > Attachments: HDFS-10626.001.patch, HDFS-10626.002.patch, > HDFS-10626.003.patch, HDFS-10626.004.patch > > > VolumeScanner logs an incorrect IOException around {{datanode.reportBadBlocks}}. > The related code: > {code} > public void handle(ExtendedBlock block, IOException e) { > FsVolumeSpi volume = scanner.volume; > ... > try { > scanner.datanode.reportBadBlocks(block, volume); > } catch (IOException ie) { > // This is bad, but not bad enough to shut down the scanner. > LOG.warn("Cannot report bad " + block.getBlockId(), e); > } > } > {code} > The IOException printed in the log should be {{ie}} rather than {{e}}, > which was passed into method {{handle(ExtendedBlock block, IOException e)}}. > It is important info that can help us understand why datanode > reportBadBlocks failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-10626) VolumeScanner prints incorrect IOException in reportBadBlocks operation
[ https://issues.apache.org/jira/browse/HDFS-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15442878#comment-15442878 ] Yiqun Lin edited comment on HDFS-10626 at 8/28/16 6:20 AM: --- Close this jira, duplicate to HDFS-10625. was (Author: linyiqun): Close this lira, duplicate to HDFS-10625. > VolumeScanner prints incorrect IOException in reportBadBlocks operation > --- > > Key: HDFS-10626 > URL: https://issues.apache.org/jira/browse/HDFS-10626 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Yiqun Lin >Assignee: Yiqun Lin >Priority: Minor > Labels: supportability > Attachments: HDFS-10626.001.patch, HDFS-10626.002.patch, > HDFS-10626.003.patch, HDFS-10626.004.patch > > > VolumeScanner logs an incorrect IOException around {{datanode.reportBadBlocks}}. > The related code: > {code} > public void handle(ExtendedBlock block, IOException e) { > FsVolumeSpi volume = scanner.volume; > ... > try { > scanner.datanode.reportBadBlocks(block, volume); > } catch (IOException ie) { > // This is bad, but not bad enough to shut down the scanner. > LOG.warn("Cannot report bad " + block.getBlockId(), e); > } > } > {code} > The IOException printed in the log should be {{ie}} rather than {{e}}, > which was passed into method {{handle(ExtendedBlock block, IOException e)}}. > It is important info that can help us understand why datanode > reportBadBlocks failed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org