[jira] [Updated] (HDFS-8901) Use ByteBuffer in striping positional read

2016-08-28 Thread SammiChen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-8901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SammiChen updated HDFS-8901:

Attachment: HDFS-8901.v16.patch

Fixed one issue reported by the style checker. The other reported issue is a false alarm. 

> Use ByteBuffer in striping positional read
> --
>
> Key: HDFS-8901
> URL: https://issues.apache.org/jira/browse/HDFS-8901
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Kai Zheng
>Assignee: SammiChen
> Attachments: HDFS-8901-v10.patch, HDFS-8901-v2.patch, 
> HDFS-8901-v3.patch, HDFS-8901-v4.patch, HDFS-8901-v5.patch, 
> HDFS-8901-v6.patch, HDFS-8901-v7.patch, HDFS-8901-v8.patch, 
> HDFS-8901-v9.patch, HDFS-8901.v11.patch, HDFS-8901.v12.patch, 
> HDFS-8901.v13.patch, HDFS-8901.v14.patch, HDFS-8901.v15.patch, 
> HDFS-8901.v16.patch, initial-poc.patch
>
>
> The native erasure coder prefers direct ByteBuffers for performance 
> reasons. To prepare for it, this change uses ByteBuffer throughout the 
> code that implements striping positional read. It also avoids 
> unnecessary data copying between striping read chunk buffers and decode input 
> buffers.
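
For illustration, here is a minimal sketch of the buffer handling the description refers to, using only the standard java.nio API. The class name StripeBufferSketch, the helper sliceChunk and the tiny cell size are hypothetical and are not taken from the patch. The sketch allocates one direct buffer and hands out per-chunk views of it, so no bytes are copied between a read buffer and a decoder input.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; not code from the HDFS-8901 patch.
final class StripeBufferSketch {
  // Returns a view of [offset, offset + length) that shares memory with src.
  static ByteBuffer sliceChunk(ByteBuffer src, int offset, int length) {
    ByteBuffer view = src.duplicate(); // own position/limit, same underlying bytes
    view.position(offset);
    view.limit(offset + length);
    return view.slice();
  }

  public static void main(String[] args) {
    int cellSize = 4; // assumed tiny cell size, for the demo only
    ByteBuffer stripe = ByteBuffer.allocateDirect(3 * cellSize);
    for (int i = 0; i < stripe.capacity(); i++) {
      stripe.put(i, (byte) i); // absolute put, leaves position untouched
    }
    ByteBuffer second = sliceChunk(stripe, cellSize, cellSize);
    System.out.println("direct=" + second.isDirect()
        + " remaining=" + second.remaining()
        + " first=" + second.get(0));
  }
}
```

Because the slice shares memory with the stripe buffer, a write through either one is visible through the other, which is what makes the copy unnecessary.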



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-4210) NameNode Format should not fail for DNS resolution on minority of JournalNode

2016-08-28 Thread John Zhuge (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Zhuge updated HDFS-4210:
-
Attachment: HDFS-4210.002.patch

Patch 002:
* Throw {{UnknownHostException}} earlier in {{getLoggerAddresses}} when a JN 
hostname cannot be resolved
* Use JUnit rule ExpectedException
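
As a rough sketch of the first bullet, the fail-fast check could look like the code below. It uses only the standard java.net API; the class LoggerAddressSketch and the method requireResolved are hypothetical names and this is not the code in HDFS-4210.002.patch.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the code in HDFS-4210.002.patch.
final class LoggerAddressSketch {
  // Fails fast with UnknownHostException on the first unresolved address.
  static List<InetSocketAddress> requireResolved(List<InetSocketAddress> addrs)
      throws UnknownHostException {
    List<InetSocketAddress> out = new ArrayList<>();
    for (InetSocketAddress addr : addrs) {
      if (addr.isUnresolved()) {
        throw new UnknownHostException(addr.getHostString());
      }
      out.add(addr);
    }
    return out;
  }

  public static void main(String[] args) {
    List<InetSocketAddress> addrs = Arrays.asList(
        new InetSocketAddress(InetAddress.getLoopbackAddress(), 8485),
        InetSocketAddress.createUnresolved("cdh4worker03", 8485));
    try {
      requireResolved(addrs);
    } catch (UnknownHostException e) {
      System.out.println("unresolved host: " + e.getMessage());
    }
  }
}
```

Throwing at this point gives the caller a descriptive exception instead of a later NullPointerException.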

> NameNode Format should not fail for DNS resolution on minority of JournalNode
> -
>
> Key: HDFS-4210
> URL: https://issues.apache.org/jira/browse/HDFS-4210
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha, journal-node, namenode
>Affects Versions: 2.6.0
>Reporter: Damien Hardy
>Assignee: John Zhuge
>Priority: Trivial
>  Labels: BB2015-05-TBR
> Attachments: HDFS-4210.001.patch, HDFS-4210.002.patch
>
>
> Setting: 
>   qjournal://cdh4master01:8485;cdh4master02:8485;cdh4worker03:8485/hdfscluster
>   cdh4master01 and cdh4master02 JournalNode up and running, 
>   cdh4worker03 not yet provisioned (no DNS entry)
> With this setting, `hadoop namenode -format` fails with:
>   12/11/19 14:42:42 FATAL namenode.NameNode: Exception in namenode join
> java.lang.IllegalArgumentException: Unable to construct journal, 
> qjournal://cdh4master01:8485;cdh4master02:8485;cdh4worker03:8485/hdfscluster
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1235)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournals(FSEditLog.java:226)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.initJournalsForWrite(FSEditLog.java:193)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:745)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1099)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204)
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.createJournal(FSEditLog.java:1233)
>   ... 5 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.getName(IPCLoggerChannelMetrics.java:107)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannelMetrics.create(IPCLoggerChannelMetrics.java:91)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel.<init>(IPCLoggerChannel.java:161)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$1.createLogger(IPCLoggerChannel.java:141)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:353)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:135)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:104)
>   at 
> org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:93)
>   ... 10 more
> I suggest that if a quorum is up, format should not fail.
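
To make the reporter's suggestion concrete, here is a minimal sketch of a majority check over the configured JournalNode addresses. It only illustrates the idea in the description and is not what any attached patch does; QuorumCheckSketch and hasResolvableMajority are hypothetical names, and "up" is approximated here by "hostname resolves".

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the reporter's suggestion; not code from any patch.
final class QuorumCheckSketch {
  // A majority of n nodes is n / 2 + 1 (integer division).
  static boolean hasResolvableMajority(List<InetSocketAddress> addrs) {
    int resolved = 0;
    for (InetSocketAddress addr : addrs) {
      if (!addr.isUnresolved()) {
        resolved++;
      }
    }
    return resolved >= addrs.size() / 2 + 1;
  }

  public static void main(String[] args) {
    // Two resolvable nodes (loopback stands in for real hosts) and one
    // node without a DNS entry, mirroring the setting in the description.
    List<InetSocketAddress> addrs = Arrays.asList(
        new InetSocketAddress(InetAddress.getLoopbackAddress(), 8485),
        new InetSocketAddress(InetAddress.getLoopbackAddress(), 8486),
        InetSocketAddress.createUnresolved("cdh4worker03", 8485));
    System.out.println("majority resolvable: " + hasResolvableMajority(addrs));
  }
}
```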






[jira] [Commented] (HDFS-10807) Doc about upgrading to a version of HDFS with snapshots may be confusing

2016-08-28 Thread Akira Ajisaka (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15444814#comment-15444814
 ] 

Akira Ajisaka commented on HDFS-10807:
--

LGTM, +1.

> Doc about upgrading to a version of HDFS with snapshots may be confusing
> 
>
> Key: HDFS-10807
> URL: https://issues.apache.org/jira/browse/HDFS-10807
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Mingliang Liu
>Assignee: Mingliang Liu
>Priority: Minor
> Attachments: HDFS-10807-branch-2.7.000.patch, HDFS-10807.000.patch
>
>
> {code}
> diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsSnapshots.md b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsSnapshots.md
> index 94a37cd..d856e8c 100644
> --- a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsSnapshots.md
> +++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsSnapshots.md
> @@ -113,7 +113,7 @@ Upgrading to a version of HDFS with snapshots
>  
>  The HDFS snapshot feature introduces a new reserved path name used to
>  interact with snapshots: `.snapshot`. When upgrading from an
> -older version of HDFS, existing paths named `.snapshot` need
> +older version of HDFS which does not support snapshots, existing paths named `.snapshot` need
>  to first be renamed or deleted to avoid conflicting with the reserved path.
>  See the upgrade section in
>  [the HDFS user guide](HdfsUserGuide.html#Upgrade_and_Rollback)
> {code}






[jira] [Comment Edited] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-08-28 Thread Jingcheng Du (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15444798#comment-15444798
 ] 

Jingcheng Du edited comment on HDFS-9668 at 8/29/16 4:39 AM:
-

Uploaded a new patch V5 according to [~eddyxu]'s comments.
Any comments are appreciated, thanks a lot.


was (Author: jingcheng...@intel.com):
Uploaded a new patch V5 according to [~eddyxu]'s comments.
Any comments are welcome, thanks a lot.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, 
> HDFS-9668-4.patch, HDFS-9668-5.patch, execution_time.png
>
>
> During an HBase test on tiered HDFS storage (the WAL is stored on 
> SSD/RAMDISK, and all other files are stored on HDD), we observed many 
> threads BLOCKED for a long time on FsDatasetImpl in the DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution time of some operations in FsDatasetImpl during the 
> test. The results are shown below.
> !execution_time.png!
> The finalizeBlock, addBlock and createRbw operations on HDD under heavy 
> load take a really long time.
> This means that one slow finalizeBlock, addBlock or createRbw operation on 
> slow storage can block all other such operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer-grained lock mechanism in a new FsDatasetImpl implementation, 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in the DataNode.
> We can implement the lock at either the storage level or the block level.
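
A minimal sketch of what a storage-level lock might look like, assuming one lock per storage ID so that a slow operation on one storage does not block operations on another. The class PerStorageLockSketch and its methods are hypothetical, and this is not the implementation in the attached patches.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of storage-level locking; not code from the patches.
final class PerStorageLockSketch {
  private final ConcurrentMap<String, ReentrantLock> locks =
      new ConcurrentHashMap<>();

  // Returns the lock for one storage, creating it on first use.
  ReentrantLock lockFor(String storageId) {
    return locks.computeIfAbsent(storageId, id -> new ReentrantLock());
  }

  // Runs op while holding only the lock of the given storage.
  void withStorageLock(String storageId, Runnable op) {
    ReentrantLock lock = lockFor(storageId);
    lock.lock();
    try {
      op.run();
    } finally {
      lock.unlock();
    }
  }

  public static void main(String[] args) {
    PerStorageLockSketch dataset = new PerStorageLockSketch();
    // The storage IDs below are made up for the demo.
    dataset.withStorageLock("hdd-1", () -> System.out.println("createRbw on hdd-1"));
    dataset.withStorageLock("ssd-1", () -> System.out.println("createRbw on ssd-1"));
  }
}
```

With a single dataset-wide lock, both calls in main would contend for the same monitor. Here a thread holding the hdd-1 lock does not delay work on ssd-1.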




[jira] [Updated] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-08-28 Thread Jingcheng Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jingcheng Du updated HDFS-9668:
---
Attachment: HDFS-9668-5.patch

Uploaded a new patch V5 according to [~eddyxu]'s comments.
Any comments are welcome, thanks a lot.





[jira] [Comment Edited] (HDFS-4210) NameNode Format should not fail for DNS resolution on minority of JournalNode

2016-08-28 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15444786#comment-15444786
 ] 

John Zhuge edited comment on HDFS-4210 at 8/29/16 4:30 AM:
---

Discovered that {{UnknownHostException}} is already thrown and caught in 
the following call path, earlier than the NPE call path listed in the JIRA 
description:
{noformat}
  at 
org.apache.hadoop.net.NetUtils.createSocketAddrForHost(NetUtils.java:245)
  at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:217)
  at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164)
  at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.getLoggerAddresses(QuorumJournalManager.java:390)
  at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:364)
  at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:149)
  at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:116)
  at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:105)
  at 
org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults.testUnresolvableHostNameFailsGracefully(TestQJMWithFaults.java:201)
{noformat}

{code:title='createSocketAddrForHost'}
} catch (UnknownHostException e) {
  addr = InetSocketAddress.createUnresolved(host, port);
}
{code}
{{createSocketAddrForHost}} swallows the UHE and creates an unresolved 
{{InetSocketAddress}}. Callers are supposed to check with {{isUnresolved}}.

{{getLoggerAddresses}} is the earliest opportunity to throw UHE.


was (Author: jzhuge):
Discovered that {{UnknownHostException}} is already thrown and caught in 
an earlier call path than the NPE call path listed in the JIRA description:
{noformat}
  at 
org.apache.hadoop.net.NetUtils.createSocketAddrForHost(NetUtils.java:245)
  at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:217)
  at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164)
  at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.getLoggerAddresses(QuorumJournalManager.java:390)
  at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:364)
  at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:149)
  at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:116)
  at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:105)
  at 
org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults.testUnresolvableHostNameFailsGracefully(TestQJMWithFaults.java:201)
{noformat}

{code:title='createSocketAddrForHost'}
} catch (UnknownHostException e) {
  addr = InetSocketAddress.createUnresolved(host, port);
}
{code}
{{createSocketAddrForHost}} swallows the UHE and creates an unresolved 
{{InetSocketAddress}}. Callers are supposed to check with {{isUnresolved}}.

{{getLoggerAddresses}} is the earliest opportunity to throw UHE.


[jira] [Commented] (HDFS-4210) NameNode Format should not fail for DNS resolution on minority of JournalNode

2016-08-28 Thread John Zhuge (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15444786#comment-15444786
 ] 

John Zhuge commented on HDFS-4210:
--

Discovered that {{UnknownHostException}} is already thrown and caught in 
an earlier call path than the NPE call path listed in the JIRA description:
{noformat}
  at 
org.apache.hadoop.net.NetUtils.createSocketAddrForHost(NetUtils.java:245)
  at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:217)
  at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:164)
  at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.getLoggerAddresses(QuorumJournalManager.java:390)
  at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:364)
  at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.createLoggers(QuorumJournalManager.java:149)
  at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:116)
  at 
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.<init>(QuorumJournalManager.java:105)
  at 
org.apache.hadoop.hdfs.qjournal.client.TestQJMWithFaults.testUnresolvableHostNameFailsGracefully(TestQJMWithFaults.java:201)
{noformat}

{code:title='createSocketAddrForHost'}
} catch (UnknownHostException e) {
  addr = InetSocketAddress.createUnresolved(host, port);
}
{code}
{{createSocketAddrForHost}} swallows the UHE and creates an unresolved 
{{InetSocketAddress}}. Callers are supposed to check with {{isUnresolved}}.

{{getLoggerAddresses}} is the earliest opportunity to throw UHE.





[jira] [Comment Edited] (HDFS-10794) [SPS]: Provide storage policy satisfy worker at DN for co-ordinating the block storage movement work

2016-08-28 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15444780#comment-15444780
 ] 

Rakesh R edited comment on HDFS-10794 at 8/29/16 4:29 AM:
--

Thanks [~umamaheswararao] for the clarification, I've uploaded the previous 
patch again by following the naming pattern.

Could you please tell me how to add {{HDFS-10285}} to the target versions so 
that we can mark this jira?


was (Author: rakeshr):
Thanks [~umamaheswararao] for the clarification, I've uploaded the previous 
patch again by following the naming pattern.

> [SPS]: Provide storage policy satisfy worker at DN for co-ordinating the 
> block storage movement work
> 
>
> Key: HDFS-10794
> URL: https://issues.apache.org/jira/browse/HDFS-10794
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode, namenode
>Reporter: Rakesh R
>Assignee: Rakesh R
> Attachments: HDFS-10794-00.patch, HDFS-10794-HDFS-10285.00.patch
>
>
> The idea of this jira is to implement a mechanism to move the blocks to the 
> given target in order to satisfy the block storage policy. The Datanode receives 
> {{blocktomove}} details via the heartbeat response from the NN. More specifically, 
> it's a datanode-side extension to handle the block storage movement commands.






[jira] [Updated] (HDFS-10794) [SPS]: Provide storage policy satisfy worker at DN for co-ordinating the block storage movement work

2016-08-28 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-10794:

Status: Patch Available  (was: Open)







[jira] [Commented] (HDFS-10794) [SPS]: Provide storage policy satisfy worker at DN for co-ordinating the block storage movement work

2016-08-28 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15444780#comment-15444780
 ] 

Rakesh R commented on HDFS-10794:
-

Thanks [~umamaheswararao] for the clarification, I've uploaded the previous 
patch again by following the naming pattern.







[jira] [Updated] (HDFS-10794) [SPS]: Provide storage policy satisfy worker at DN for co-ordinating the block storage movement work

2016-08-28 Thread Rakesh R (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh R updated HDFS-10794:

Attachment: HDFS-10794-HDFS-10285.00.patch







[jira] [Comment Edited] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object

2016-08-28 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15444774#comment-15444774
 ] 

Fenghua Hu edited comment on HDFS-10682 at 8/29/16 4:25 AM:


In FsDatasetImpl#FsDatasetImpl() and FsDatasetImpl#addVolume():
volumeMap = new ReplicaMap(this);
and 
ReplicaMap tempVolumeMap = new ReplicaMap(this);

"this" is used as synchronization object:

  ReplicaMap(Object mutex) {
    if (mutex == null) {
      throw new HadoopIllegalArgumentException(
          "Object to synchronize on cannot be null");
    }
    this.mutex = mutex;
  }

ReplicaMap uses synchronized(mutex) {...} for synchronization. Do we need to 
change it accordingly?
[~vagarychen] [~arpitagarwal]



was (Author: fenghua_hu):
In FsDatasetImpl#FsDatasetImpl() and FsDatasetImpl#addVolume():
volumeMap = new ReplicaMap(this);
and 
ReplicaMap tempVolumeMap = new ReplicaMap(this);

"this" is used as synchronization object:

  ReplicaMap(Object mutex) {
    if (mutex == null) {
      throw new HadoopIllegalArgumentException(
          "Object to synchronize on cannot be null");
    }
    this.mutex = mutex;

ReplicaMap uses synchronized(mutex) {...} for synchronization. Do we need 
change it accordingly?
[~vagarychen] [~arpitagarwal]


> Replace FsDatasetImpl object lock with a separate lock object
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Fix For: 2.8.0
>
> Attachments: HDFS-10682-branch-2.001.patch, 
> HDFS-10682-branch-2.002.patch, HDFS-10682-branch-2.003.patch, 
> HDFS-10682-branch-2.004.patch, HDFS-10682-branch-2.005.patch, 
> HDFS-10682-branch-2.006.patch, HDFS-10682.001.patch, HDFS-10682.002.patch, 
> HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, 
> HDFS-10682.006.patch, HDFS-10682.007.patch, HDFS-10682.008.patch, 
> HDFS-10682.009.patch, HDFS-10682.010.patch
>
>
> This Jira proposes to replace the FsDatasetImpl object lock with a separate 
> lock object. Doing so will make it easier to measure lock statistics like 
> lock held time and warn about potential lock contention due to slow disk 
> operations.
> Right now we can use org.apache.hadoop.util.AutoCloseableLock. In the future 
> we can also consider replacing the lock with a read-write lock.
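
The question raised in the comment above (ReplicaMap synchronizing on the object it is handed) could be addressed by passing the same separate lock object to both classes. The following is only a minimal sketch under stated assumptions: {{SketchLock}} is a hypothetical stand-in for org.apache.hadoop.util.AutoCloseableLock, and {{ReplicaMapSketch}} is a simplified, hypothetical ReplicaMap that stores strings instead of replica objects. It is not the code from any attached patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for org.apache.hadoop.util.AutoCloseableLock. */
class SketchLock implements AutoCloseable {
    private final ReentrantLock lock = new ReentrantLock();

    /** Acquires the lock and returns this object for use in try-with-resources. */
    SketchLock acquire() {
        lock.lock();
        return this;
    }

    /** Releases the lock; called automatically at the end of a try-with-resources block. */
    @Override
    public void close() {
        lock.unlock();
    }

    boolean isHeldByCurrentThread() {
        return lock.isHeldByCurrentThread();
    }
}

/** Hypothetical, simplified ReplicaMap that takes a shared lock instead of a mutex Object. */
class ReplicaMapSketch {
    private final SketchLock lock;
    private final Map<Long, String> replicas = new HashMap<>();

    ReplicaMapSketch(SketchLock lock) {
        if (lock == null) {
            throw new IllegalArgumentException("Lock to synchronize on cannot be null");
        }
        this.lock = lock;
    }

    /** Adds a replica under the shared lock; returns the previous entry, if any. */
    String add(long blockId, String replica) {
        try (SketchLock l = lock.acquire()) {
            return replicas.put(blockId, replica);
        }
    }

    /** Looks up a replica under the shared lock. */
    String get(long blockId) {
        try (SketchLock l = lock.acquire()) {
            return replicas.get(blockId);
        }
    }
}
```

With this shape, the dataset would create one lock and hand it to every map it builds, e.g. {{new ReplicaMapSketch(datasetLock)}}, so that both classes keep guarding the same critical sections with one lock.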






[jira] [Commented] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object

2016-08-28 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15444774#comment-15444774
 ] 

Fenghua Hu commented on HDFS-10682:
---

In FsDatasetImpl#FsDatasetImpl() and FsDatasetImpl#addVolume():
volumeMap = new ReplicaMap(this);
and 
ReplicaMap tempVolumeMap = new ReplicaMap(this);

"this" is used as synchronization object:

  ReplicaMap(Object mutex) {
    if (mutex == null) {
      throw new HadoopIllegalArgumentException(
          "Object to synchronize on cannot be null");
    }
    this.mutex = mutex;
  }

ReplicaMap uses synchronized(mutex) {...} for synchronization. Do we need to 
change it accordingly?
[~vagarychen] [~arpitagarwal]


> Replace FsDatasetImpl object lock with a separate lock object
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Fix For: 2.8.0
>
> Attachments: HDFS-10682-branch-2.001.patch, 
> HDFS-10682-branch-2.002.patch, HDFS-10682-branch-2.003.patch, 
> HDFS-10682-branch-2.004.patch, HDFS-10682-branch-2.005.patch, 
> HDFS-10682-branch-2.006.patch, HDFS-10682.001.patch, HDFS-10682.002.patch, 
> HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, 
> HDFS-10682.006.patch, HDFS-10682.007.patch, HDFS-10682.008.patch, 
> HDFS-10682.009.patch, HDFS-10682.010.patch
>
>
> This Jira proposes to replace the FsDatasetImpl object lock with a separate 
> lock object. Doing so will make it easier to measure lock statistics like 
> lock held time and warn about potential lock contention due to slow disk 
> operations.
> Right now we can use org.apache.hadoop.util.AutoCloseableLock. In the future 
> we can also consider replacing the lock with a read-write lock.






[jira] [Commented] (HDFS-10799) NameNode should use loginUser(hdfs) to serve iNotify requests

2016-08-28 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1524#comment-1524
 ] 

Allen Wittenauer commented on HDFS-10799:
-

If the client is expired, we shouldn't be giving it answers at all.

> NameNode should use loginUser(hdfs) to serve iNotify requests
> -
>
> Key: HDFS-10799
> URL: https://issues.apache.org/jira/browse/HDFS-10799
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.6.0
> Environment: Kerberized, HA cluster, iNotify client, CDH5.7.0
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
> Attachments: HDFS-10799.001.patch
>
>
> When a NameNode serves iNotify requests from a client, it verifies the client 
> has superuser permission and then uses the client's Kerberos principal to 
> read edits from journal nodes.
> However, if the client does not renew its TGT, the connection from 
> NameNode to journal nodes may fail. In that case, the NameNode thinks the 
> edits are corrupt, and prints a scary error message:
> "During automatic edit log failover, we noticed that all of the remaining 
> edit log streams are shorter than the current one!  The best remaining edit 
> log ends at transaction 11577603, but we thought we could read up to 
> transaction 11577606.  If you continue, metadata will be lost forever!"
> However, the edits are actually good. NameNode _should not freak out when an 
> iNotify client's TGT expires_.
> I think an easy solution to this bug is for the NameNode, after it verifies 
> that the client has superuser permission, to call 
> {{SecurityUtil.doAsLoginUser}} and then read edits. This will make sure the 
> operation does not fail due to an expired client ticket.
> Excerpt of related logs:
> {noformat}
> 2016-08-18 19:05:13,979 WARN org.apache.hadoop.security.UserGroupInformation: 
> PriviledgedActionException as:h...@example.com (auth:KERBEROS) 
> cause:java.io.IOException: We encountered an error reading 
> http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy,
>  
> http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy.
>   During automatic edit log failover, we noticed that all of the remaining 
> edit log streams are shorter than the current one!  The best remaining edit 
> log ends at transaction 11577603, but we thought we could read up to 
> transaction 11577606.  If you continue, metadata will be lost forever!
> 2016-08-18 19:05:13,979 INFO org.apache.hadoop.ipc.Server: IPC Server handler 
> 112 on 8020, call 
> org.apache.hadoop.hdfs.protocol.ClientProtocol.getEditsFromTxid from [client 
> IP:port] Call#73 Retry#0
> java.io.IOException: We encountered an error reading 
> http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy,
>  
> http://jn1.example.com:8480/getJournal?jid=nameservice1=11577487=yyy.
>   During automatic edit log failover, we noticed that all of the remaining 
> edit log streams are shorter than the current one!  The best remaining edit 
> log ends at transaction 11577603, but we thought we could read up to 
> transaction 11577606.  If you continue, metadata will be lost forever!
> at 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:213)
> at 
> org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:85)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.readOp(NameNodeRpcServer.java:1674)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getEditsFromTxid(NameNodeRpcServer.java:1736)
> at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getEditsFromTxid(AuthorizationProviderProxyClientProtocol.java:1010)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolServerSideTranslatorPB.java:1475)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
> {noformat}
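
The ordering proposed in the description (check the caller's privilege first, then read edits as the login user) can be illustrated with a small self-contained model. Everything here is hypothetical: {{Identity}}, {{doAs}}, and {{readEditsFromJournal}} are stand-ins invented for this sketch, and none of them are Hadoop APIs. A real change would use {{SecurityUtil.doAsLoginUser}} as the description suggests.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

class LoginUserReadSketch {
    /** Hypothetical stand-in for a Kerberos identity whose ticket may have expired. */
    static final class Identity {
        final String name;
        final boolean superuser;
        final boolean ticketValid;

        Identity(String name, boolean superuser, boolean ticketValid) {
            this.name = name;
            this.superuser = superuser;
            this.ticketValid = ticketValid;
        }
    }

    // Identity that outgoing calls are made as (hypothetical; models the current-user context).
    static final ThreadLocal<Identity> CURRENT = new ThreadLocal<>();

    /** Runs the action as the given identity, then restores the previous one. */
    static <T> T doAs(Identity id, Supplier<T> action) {
        Identity previous = CURRENT.get();
        CURRENT.set(id);
        try {
            return action.get();
        } finally {
            CURRENT.set(previous);
        }
    }

    /** Models the journal read: it fails when the acting identity has no valid ticket. */
    static String readEditsFromJournal() {
        Identity id = CURRENT.get();
        if (id == null || !id.ticketValid) {
            throw new UncheckedIOException(
                new IOException("cannot authenticate to journal node"));
        }
        return "edits read as " + id.name;
    }

    /** Authorizes the client first, then reads as the login user, not as the client. */
    static String getEditsFromTxid(Identity client, Identity loginUser) {
        if (!client.superuser) {
            throw new SecurityException("superuser privilege required");
        }
        return doAs(loginUser, LoginUserReadSketch::readEditsFromJournal);
    }
}
```

In this model a superuser client with an expired ticket still gets its edits, because the journal read no longer depends on the client's ticket, while a non-superuser client is rejected before any read happens.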




[jira] [Updated] (HDFS-10652) Add a unit test for HDFS-4660

2016-08-28 Thread Yongjun Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated HDFS-10652:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.8.0
   Status: Resolved  (was: Patch Available)

Committed to trunk, branch-2, branch-2.8.

Thanks [~vinayrpet] and [~jojochuang].


> Add a unit test for HDFS-4660
> -
>
> Key: HDFS-10652
> URL: https://issues.apache.org/jira/browse/HDFS-10652
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Reporter: Yongjun Zhang
>Assignee: Vinayakumar B
> Fix For: 2.8.0
>
> Attachments: HDFS-10652-002.patch, HDFS-10652.001.patch, 
> HDFS-10652.003.patch, HDFS-10652.004.patch, HDFS-10652.005.patch, 
> HDFS-10652.006.patch, HDFS-10652.007.patch, HDFS-10652.008.patch
>
>







[jira] [Commented] (HDFS-10652) Add a unit test for HDFS-4660

2016-08-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15442881#comment-15442881
 ] 

Hudson commented on HDFS-10652:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10363 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10363/])
HDFS-10652. Add a unit test for HDFS-4660. Contributed by Vinayakumar (yzhang: 
rev c25817159af17753b398956cfe6ff14984801b01)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeFaultInjector.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestClientProtocolForPipelineRecovery.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java


> Add a unit test for HDFS-4660
> -
>
> Key: HDFS-10652
> URL: https://issues.apache.org/jira/browse/HDFS-10652
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, hdfs
>Reporter: Yongjun Zhang
>Assignee: Vinayakumar B
> Attachments: HDFS-10652-002.patch, HDFS-10652.001.patch, 
> HDFS-10652.003.patch, HDFS-10652.004.patch, HDFS-10652.005.patch, 
> HDFS-10652.006.patch, HDFS-10652.007.patch, HDFS-10652.008.patch
>
>







[jira] [Updated] (HDFS-10626) VolumeScanner prints incorrect IOException in reportBadBlocks operation

2016-08-28 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-10626:
-
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

> VolumeScanner prints incorrect IOException in reportBadBlocks operation
> ---
>
> Key: HDFS-10626
> URL: https://issues.apache.org/jira/browse/HDFS-10626
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
>  Labels: supportability
> Attachments: HDFS-10626.001.patch, HDFS-10626.002.patch, 
> HDFS-10626.003.patch, HDFS-10626.004.patch
>
>
> VolumeScanner logs the wrong IOException when {{datanode.reportBadBlocks}} fails. 
> The related code:
> {code}
> public void handle(ExtendedBlock block, IOException e) {
>   FsVolumeSpi volume = scanner.volume;
>   ...
>   try {
>     scanner.datanode.reportBadBlocks(block, volume);
>   } catch (IOException ie) {
>     // This is bad, but not bad enough to shut down the scanner.
>     LOG.warn("Cannot report bad " + block.getBlockId(), e);
>   }
> }
> {code}
> The IOException printed in the log should be {{ie}} rather than {{e}}, 
> which was passed into the method {{handle(ExtendedBlock block, IOException e)}}.
> This is important information that can help us understand why the datanode's 
> reportBadBlocks call failed.
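
A minimal sketch of the corrected logging described above, assuming nothing about the attached patches. {{Reporter}}, {{warn}}, and {{LOGGED}} are hypothetical stand-ins for the datanode call and the logger, so the example runs without Hadoop on the classpath; only the choice of {{ie}} over {{e}} in the catch block reflects the description.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class ReportBadBlockLogging {
    /** Hypothetical stand-in for scanner.datanode.reportBadBlocks(...). */
    interface Reporter {
        void reportBadBlocks(long blockId) throws IOException;
    }

    // Hypothetical stand-in for the logger: records each throwable passed to warn().
    static final List<Throwable> LOGGED = new ArrayList<>();

    static void warn(String message, Throwable cause) {
        LOGGED.add(cause);
        System.err.println(message + ": " + cause);
    }

    /** e is the scan error that triggered the report; ie is the failure of the report itself. */
    static void handle(long blockId, IOException e, Reporter reporter) {
        try {
            reporter.reportBadBlocks(blockId);
        } catch (IOException ie) {
            // Log ie, the reason the report failed, rather than the original scan error e.
            warn("Cannot report bad block " + blockId, ie);
        }
    }
}
```

Logging {{ie}} keeps the two failures distinct: the scan error explains why the block is bad, while {{ie}} explains why the bad-block report did not go through.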






[jira] [Commented] (HDFS-4660) Block corruption can happen during pipeline recovery

2016-08-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-4660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15442880#comment-15442880
 ] 

Hudson commented on HDFS-4660:
--

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10363 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/10363/])
HDFS-10652. Add a unit test for HDFS-4660. Contributed by Vinayakumar (yzhang: 
rev c25817159af17753b398956cfe6ff14984801b01)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNodeFaultInjector.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestClientProtocolForPipelineRecovery.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockReceiver.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java


> Block corruption can happen during pipeline recovery
> 
>
> Key: HDFS-4660
> URL: https://issues.apache.org/jira/browse/HDFS-4660
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 2.0.3-alpha, 3.0.0-alpha1
>Reporter: Peng Zhang
>Assignee: Kihwal Lee
>Priority: Blocker
> Fix For: 2.7.1, 2.6.4
>
> Attachments: HDFS-4660.br26.patch, HDFS-4660.patch, HDFS-4660.patch, 
> HDFS-4660.v2.patch, periodic_hflush.patch
>
>
> pipeline DN1  DN2  DN3
> stop DN2
> pipeline added node DN4 located at 2nd position
> DN1  DN4  DN3
> recover RBW
> DN4 after recover rbw
> 2013-04-01 21:02:31,570 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover 
> RBW replica 
> BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1004
> 2013-04-01 21:02:31,570 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: 
> Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW
>   getNumBytes() = 134144
>   getBytesOnDisk() = 134144
>   getVisibleLength()= 134144
> end at chunk (134144/512=262)
> DN3 after recover rbw
> 2013-04-01 21:02:31,575 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Recover 
> RBW replica 
> BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1004
> 2013-04-01 21:02:31,575 INFO 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: 
> Recovering ReplicaBeingWritten, blk_-9076133543772600337_1004, RBW
>   getNumBytes() = 134028 
>   getBytesOnDisk() = 134028
>   getVisibleLength()= 134028
> client send packet after recover pipeline
> offset=133632  len=1008
> DN4 after flush 
> 2013-04-01 21:02:31,779 DEBUG 
> org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file 
> offset:134640; meta offset:1063
> // meta end position should be ceil(134640/512)*4 + 7 == 1059, but now it is 
> 1063.
> DN3 after flush
> 2013-04-01 21:02:31,782 DEBUG 
> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: 
> BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005, 
> type=LAST_IN_PIPELINE, downstreams=0:[]: enqueue Packet(seqno=219, 
> lastPacketInBlock=false, offsetInBlock=134640, 
> ackEnqueueNanoTime=8817026136871545)
> 2013-04-01 21:02:31,782 DEBUG 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Changing 
> meta file offset of block 
> BP-325305253-10.2.201.14-1364820083462:blk_-9076133543772600337_1005 from 
> 1055 to 1051
> 2013-04-01 21:02:31,782 DEBUG 
> org.apache.hadoop.hdfs.server.datanode.DataNode: FlushOrsync, file 
> offset:134640; meta offset:1059
> After checking the meta file on DN4, I found that the checksum of chunk 262 
> is duplicated, but the data is not.
> Later, after the block was finalized, DN4's scanner detected the bad block and 
> reported it to the NN. The NN sent a command to delete this block and 
> replicate it from another DN in the pipeline to satisfy the replication factor.
> I think this is because BlockReceiver skips data bytes already written 
> but does not skip checksum bytes already written. The function 
> adjustCrcFilePosition is only used for the last incomplete chunk, 
> not for this situation.
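
The meta offsets quoted in the logs above can be checked with a short calculation. This is a sketch using only the numbers in the description (512-byte chunks, 4 bytes of checksum per chunk, and the 7-byte offset the reporter adds); the class and method names are hypothetical.

```java
class MetaOffsetSketch {
    static final int BYTES_PER_CHECKSUM = 512; // chunk size used in the description
    static final int CHECKSUM_SIZE = 4;        // bytes of checksum stored per chunk
    static final int HEADER_SIZE = 7;          // the "+ 7" in the description

    /** Expected end position of the meta file for a block with dataLength bytes on disk. */
    static long expectedMetaOffset(long dataLength) {
        // A partial last chunk still carries its own checksum, hence the ceiling division.
        long chunks = (dataLength + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        return HEADER_SIZE + chunks * CHECKSUM_SIZE;
    }
}
```

For DN4 before the packet (134144 bytes, exactly 262 chunks) this gives 1055, and after the packet (134640 bytes, 263 chunks) it gives 1059. The logged value of 1063 is one checksum (4 bytes) too many, which matches the duplicated checksum the reporter found.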






[jira] [Commented] (HDFS-10626) VolumeScanner prints incorrect IOException in reportBadBlocks operation

2016-08-28 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15442878#comment-15442878
 ] 

Yiqun Lin commented on HDFS-10626:
--

Close this jira as a duplicate of HDFS-10625.

> VolumeScanner prints incorrect IOException in reportBadBlocks operation
> ---
>
> Key: HDFS-10626
> URL: https://issues.apache.org/jira/browse/HDFS-10626
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
>  Labels: supportability
> Attachments: HDFS-10626.001.patch, HDFS-10626.002.patch, 
> HDFS-10626.003.patch, HDFS-10626.004.patch
>
>
> VolumeScanner logs the wrong IOException when {{datanode.reportBadBlocks}} fails. 
> The related code:
> {code}
> public void handle(ExtendedBlock block, IOException e) {
>   FsVolumeSpi volume = scanner.volume;
>   ...
>   try {
>     scanner.datanode.reportBadBlocks(block, volume);
>   } catch (IOException ie) {
>     // This is bad, but not bad enough to shut down the scanner.
>     LOG.warn("Cannot report bad " + block.getBlockId(), e);
>   }
> }
> {code}
> The IOException printed in the log should be {{ie}} rather than {{e}}, 
> which was passed into the method {{handle(ExtendedBlock block, IOException e)}}.
> This is important information that can help us understand why the datanode's 
> reportBadBlocks call failed.






[jira] [Comment Edited] (HDFS-10626) VolumeScanner prints incorrect IOException in reportBadBlocks operation

2016-08-28 Thread Yiqun Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15442878#comment-15442878
 ] 

Yiqun Lin edited comment on HDFS-10626 at 8/28/16 6:20 AM:
---

Close this jira as a duplicate of HDFS-10625.


was (Author: linyiqun):
Close this lira, duplicate to HDFS-10625.

> VolumeScanner prints incorrect IOException in reportBadBlocks operation
> ---
>
> Key: HDFS-10626
> URL: https://issues.apache.org/jira/browse/HDFS-10626
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>Priority: Minor
>  Labels: supportability
> Attachments: HDFS-10626.001.patch, HDFS-10626.002.patch, 
> HDFS-10626.003.patch, HDFS-10626.004.patch
>
>
> VolumeScanner logs the wrong IOException when {{datanode.reportBadBlocks}} fails. 
> The related code:
> {code}
> public void handle(ExtendedBlock block, IOException e) {
>   FsVolumeSpi volume = scanner.volume;
>   ...
>   try {
>     scanner.datanode.reportBadBlocks(block, volume);
>   } catch (IOException ie) {
>     // This is bad, but not bad enough to shut down the scanner.
>     LOG.warn("Cannot report bad " + block.getBlockId(), e);
>   }
> }
> {code}
> The IOException printed in the log should be {{ie}} rather than {{e}}, 
> which was passed into the method {{handle(ExtendedBlock block, IOException e)}}.
> This is important information that can help us understand why the datanode's 
> reportBadBlocks call failed.


