[jira] [Updated] (HDFS-11260) Slow writer threads are not stopped

2016-12-17 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-11260:
---
Description: 
When a DataNode receives a transferred block, it tries to stop the writer to the 
same block. However, this may not work, and we saw the following error message 
and stack trace.

Fundamentally, the assumption of {{ReplicaInPipeline#stopWriter}} is wrong. It 
assumes the writer thread must be a DataXceiver thread, which can be interrupted 
and then terminates. However, an IPC handler thread may also become the writer 
thread (by calling initReplicaRecovery), and such a thread ignores the interrupt 
and does not terminate. 
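
To make the mismatch concrete, here is a small self-contained sketch (not the 
actual Hadoop code; class and variable names are made up for illustration). A 
stopWriter-style method interrupts the writer thread and joins on it with a 
timeout: a DataXceiver-like thread exits when interrupted, while an 
IPC-handler-like thread swallows the interrupt inside its call-queue polling 
loop, so the join times out exactly as in the stack trace below.
{code:java}
import java.io.IOException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class StopWriterSketch {

  // Roughly what a stopWriter-style method does: interrupt the recorded writer
  // thread and join on it, giving up with an IOException after the timeout.
  static void stopWriter(Thread writer, long timeoutMillis) throws IOException {
    if (writer == null || !writer.isAlive()) {
      return;
    }
    writer.interrupt();
    try {
      writer.join(timeoutMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("Waiting for writer thread was interrupted.", e);
    }
    if (writer.isAlive()) {
      throw new IOException("Join on writer thread " + writer + " timed out");
    }
  }

  public static void main(String[] args) throws Exception {
    // A DataXceiver-like writer: the interrupt aborts its blocking wait,
    // so the thread terminates and stopWriter returns normally.
    Thread xceiverLike = new Thread(() -> {
      try {
        Thread.sleep(Long.MAX_VALUE);   // stands in for blocking block I/O
      } catch (InterruptedException e) {
        // interrupted -> clean up and exit
      }
    }, "DataXceiver-like writer");
    xceiverLike.start();
    stopWriter(xceiverLike, 1000);

    // An IPC-handler-like writer: it swallows the interrupt and goes back to
    // polling its call queue, so the join times out and stopWriter throws.
    LinkedBlockingQueue<Runnable> callQueue = new LinkedBlockingQueue<>();
    Thread handlerLike = new Thread(() -> {
      while (true) {
        try {
          callQueue.poll(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
          // ignored: the handler loops back and keeps serving calls
        }
      }
    }, "IPC Server handler-like writer");
    handlerLike.setDaemon(true);
    handlerLike.start();
    try {
      stopWriter(handlerLike, 1000);
    } catch (IOException expected) {
      System.out.println(expected.getMessage());
    }
  }
}
{code}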

{noformat}
2016-12-16 19:58:56,167 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Join on writer thread Thread[IPC Server handler 6 on 50020,5,main] timed out
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
org.apache.hadoop.ipc.CallQueueManager.take(CallQueueManager.java:135)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2052)

2016-12-16 19:58:56,167 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver constructor. Cause is
2016-12-16 19:58:56,168 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: sj1dra082.corp.adobe.com:50010:DataXceiver error processing WRITE_BLOCK operation  src: /10.10.0.80:44105 dst: /10.10.0.82:50010
java.io.IOException: Join on writer thread Thread[IPC Server handler 6 on 50020,5,main] timed out
at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:212)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1579)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:195)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:669)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
at java.lang.Thread.run(Thread.java:745)
{noformat}

There is also a logic error in FsDatasetImpl#createTemporary: if the code in the 
synchronized block takes more than 60 seconds to execute (in theory), it could 
throw an exception without ever trying to stop the existing slow writer.
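
The shape of the problem is sketched below (simplified, with hypothetical names; 
this is not the actual FsDatasetImpl code): the elapsed-time check sits between 
the synchronized section and the stopWriter() call, so if a single pass through 
the synchronized section already exceeds the timeout, the method bails out 
before stopWriter() has ever been attempted.
{code:java}
import java.io.IOException;

public class CreateTemporarySketch {

  /** Minimal stand-in for a replica owned by another writer thread. */
  interface Replica {
    void stopWriter(long timeoutMs) throws IOException;
  }

  private volatile Replica existingWriterReplica;  // hypothetical state

  Replica createTemporary(String block, long writerStopTimeoutMs)
      throws IOException {
    long startMs = System.currentTimeMillis();
    Replica lastFound = null;
    while (true) {
      synchronized (this) {
        Replica current = existingWriterReplica;
        if (current == null || current == lastFound) {
          // Normal path: no competing writer, or it has already been stopped.
          return newTemporaryReplica(block);
        }
        lastFound = current;
      }

      long elapsedMs = System.currentTimeMillis() - startMs;
      if (elapsedMs > writerStopTimeoutMs) {
        // Logic error: if the synchronized section above was slow, this throws
        // on the very first iteration, before stopWriter() below is ever run.
        throw new IOException("Unable to stop existing writer for block "
            + block + " after " + elapsedMs + " ms.");
      }

      lastFound.stopWriter(writerStopTimeoutMs);
    }
  }

  private Replica newTemporaryReplica(String block) {
    return timeoutMs -> { };  // placeholder replica with a no-op stopWriter
  }
}
{code}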

We saw an FsDatasetImpl#createTemporary call fail after nearly 10 minutes (the 
logged 568320 milliseconds is roughly 9.5 minutes), and it's unclear why yet. My 
understanding is that the code intends to stop slow writers after 1 minute by 
default. Some code rewrite is probably needed to get the logic right.
{noformat}
2016-12-16 23:12:24,636 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Unable to stop existing writer for block BP-1527842723-10.0.0.180-1367984731269:blk_4313782210_1103780331023 after 568320 miniseconds.
{noformat}
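
For reference, the one-minute default mentioned above is the DataNode's xceiver 
stop timeout. Assuming the standard property name (please double-check against 
your Hadoop version), it can be tuned in hdfs-site.xml:
{code:xml}
<property>
  <!-- How long the DataNode waits for a writer thread to stop (milliseconds).
       60000 ms (1 minute) is the default; shown here only for illustration. -->
  <name>dfs.datanode.xceiver.stop.timeout.millis</name>
  <value>60000</value>
</property>
{code}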

  was:
When a DataNode receives a transferred block, it tries to stop the writer to the 
same block. However, this may not work, and we saw the following error message 
and stack trace.

Fundamentally, the assumption of {{ReplicaInPipeline#stopWriter}} is wrong. It 
assumes the writer thread must be a DataXceiver thread, which can be interrupted 
and then terminates. However, an IPC handler thread may also become the writer 
thread (by calling initReplicaRecovery), and such a thread ignores the interrupt 
and does not terminate. 

{noformat}
2016-12-16 19:58:56,167 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Join on writer thread Thread[IPC Server handler 6 on 50020,5,main] timed out
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
org.apache.hadoop.ipc.CallQueueManager.take(CallQueueManager.java:135)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2052)

2016-12-16 19:58:56,167 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver constructor. Cause is
2016-12-16 19:58:56,168 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: sj1dra082.corp.adobe.com:50010:DataXceiver error processing WRITE_BLOCK operation  src: /10.10.0.80:44105 dst: /10.10.0.82:50010
java.io.IOException: Join on writer thread Thread[IPC Server handler 6 on 50020,5,main] timed out
at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:212)
at 

[jira] [Updated] (HDFS-11260) Slow writer threads are not stopped

2016-12-17 Thread Wei-Chiu Chuang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-11260:
---
Description: 
When a DataNode receives a transferred block, it tries to stop the writer to the 
same block. However, this may not work, and we saw the following error message 
and stack trace.

Fundamentally, the assumption of {{ReplicaInPipeline#stopWriter}} is wrong. It 
assumes the writer thread must be a DataXceiver thread, which can be interrupted 
and then terminates. However, an IPC handler thread may also become the writer 
thread (by calling initReplicaRecovery), and such a thread ignores the interrupt 
and does not terminate. 

{noformat}
2016-12-16 19:58:56,167 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Join on writer thread Thread[IPC Server handler 6 on 50020,5,main] timed out
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
org.apache.hadoop.ipc.CallQueueManager.take(CallQueueManager.java:135)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2052)

2016-12-16 19:58:56,167 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver constructor. Cause is
2016-12-16 19:58:56,168 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: sj1dra082.corp.adobe.com:50010:DataXceiver error processing WRITE_BLOCK operation  src: /10.10.0.80:44105 dst: /10.10.0.82:50010
java.io.IOException: Join on writer thread Thread[IPC Server handler 6 on 50020,5,main] timed out
at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:212)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1579)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:195)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:669)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
at java.lang.Thread.run(Thread.java:745)
{noformat}

There is also a logic error in FsDatasetImpl#createTemporary: if the code in the 
synchronized block takes more than 60 seconds to execute (in theory), it could 
throw an exception without ever trying to stop the existing slow writer.

We saw an FsDatasetImpl#createTemporary call fail after nearly 10 minutes, and 
it's unclear why yet. My understanding is that the code intends to stop slow 
writers after 1 minute by default.
{noformat}
2016-12-16 23:12:24,636 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Unable to stop existing writer for block BP-1527842723-10.0.0.180-1367984731269:blk_4313782210_1103780331023 after 568320 miniseconds.
{noformat}

  was:
When a DataNode receives a transferred block, it tries to stop the writer to the 
same block. However, this may not work, and we saw the following error message 
and stack trace.

Fundamentally, the assumption of {{ReplicaInPipeline#stopWriter}} is wrong. It 
assumes the writer thread must be a DataXceiver thread, which can be interrupted 
and then terminates. However, an IPC handler thread may also become the writer 
thread (by calling initReplicaRecovery), and such a thread ignores the interrupt 
and does not terminate. 

{noformat}
2016-12-16 19:58:56,167 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Join on writer thread Thread[IPC Server handler 6 on 50020,5,main] timed out
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
org.apache.hadoop.ipc.CallQueueManager.take(CallQueueManager.java:135)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2052)

2016-12-16 19:58:56,167 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver constructor. Cause is
2016-12-16 19:58:56,168 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: sj1dra082.corp.adobe.com:50010:DataXceiver error processing WRITE_BLOCK operation  src: /10.10.0.80:44105 dst: /10.10.0.82:50010
java.io.IOException: Join on writer thread Thread[IPC Server handler 6 on 50020,5,main] timed out
at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:212)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1579)
at 

[jira] [Created] (HDFS-11260) Slow writer threads are not stopped

2016-12-17 Thread Wei-Chiu Chuang (JIRA)
Wei-Chiu Chuang created HDFS-11260:
--

 Summary: Slow writer threads are not stopped
 Key: HDFS-11260
 URL: https://issues.apache.org/jira/browse/HDFS-11260
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.7.0
Reporter: Wei-Chiu Chuang


When a DataNode receives a transferred block, it tries to stop the writer to the 
same block. However, this may not work, and we saw the following error message 
and stack trace.

Fundamentally, the assumption of {{ReplicaInPipeline#stopWriter}} is wrong. It 
assumes the writer thread must be a DataXceiver thread, which can be interrupted 
and then terminates. However, an IPC handler thread may also become the writer 
thread (by calling initReplicaRecovery), and such a thread ignores the interrupt 
and does not terminate. 

{noformat}
2016-12-16 19:58:56,167 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Join on writer thread Thread[IPC Server handler 6 on 50020,5,main] timed out
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
org.apache.hadoop.ipc.CallQueueManager.take(CallQueueManager.java:135)
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2052)

2016-12-16 19:58:56,167 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in BlockReceiver constructor. Cause is
2016-12-16 19:58:56,168 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: sj1dra082.corp.adobe.com:50010:DataXceiver error processing WRITE_BLOCK operation  src: /10.10.0.80:44105 dst: /10.10.0.82:50010
java.io.IOException: Join on writer thread Thread[IPC Server handler 6 on 50020,5,main] timed out
at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.stopWriter(ReplicaInPipeline.java:212)
at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:1579)
at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:195)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:669)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
at java.lang.Thread.run(Thread.java:745)
{noformat}

There is also a logic error in FsDatasetImpl#createTemporary: if the code in the 
synchronized block takes more than 60 seconds to execute (in theory), it could 
throw an exception without ever trying to stop the existing slow writer.

We saw an FsDatasetImpl#createTemporary call fail after nearly 10 minutes, and 
it's unclear why yet.
{noformat}
2016-12-16 23:12:24,636 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Unable to stop existing writer for block BP-1527842723-10.0.0.180-1367984731269:blk_4313782210_1103780331023 after 568320 miniseconds.
{noformat}





[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.

2016-12-17 Thread He Xiaoqiao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756632#comment-15756632
 ] 

He Xiaoqiao commented on HDFS-10453:


[~yangyishan0901m]
The patch is ready and has worked well for a long time in our production 
environment, as expected. You can apply it and test it yourself.

> ReplicationMonitor thread could stuck for long time due to the race between 
> replication and delete of same file in a large cluster.
> ---
>
> Key: HDFS-10453
> URL: https://issues.apache.org/jira/browse/HDFS-10453
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4
>Reporter: He Xiaoqiao
> Attachments: HDFS-10453-branch-2.001.patch, 
> HDFS-10453-branch-2.003.patch, HDFS-10453.001.patch
>
>
> The ReplicationMonitor thread could get stuck for a long time and, with small 
> probability, lose data. Consider the typical scenario:
> (1) create and close a file with the default replication (3);
> (2) increase the replication of the file to 10;
> (3) delete the file while the ReplicationMonitor is scheduling blocks 
> belonging to that file for replication.
> When the ReplicationMonitor gets stuck, the NameNode prints logs like:
> {code:xml}
> 2016-04-19 10:20:48,083 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> ..
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) For more information, please enable DEBUG log level on 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough 
> replicas: expected size is 7 but only 0 storage types can be selected 
> (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, 
> DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]})
> 2016-04-19 10:21:17,184 WARN 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to 
> place enough replicas, still in need of 7 to reach 10 
> (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, 
> newBlock=false) All required storage types are unavailable:  
> unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, 
> storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}
> {code}
> This happens because two threads (#NameNodeRpcServer and #ReplicationMonitor) 
> process the same block at the same moment:
> (1) ReplicationMonitor#computeReplicationWorkForBlocks gets blocks to 
> replicate and leaves the global lock.
> (2) FSNamesystem#delete is invoked to delete the blocks and clears the 
> references in the blocks map, neededReplications, etc. The block's NumBytes 
> is set to NO_ACK (Long.MAX_VALUE), which indicates that the block deletion 
> does not need an explicit ACK from the node.
> (3) ReplicationMonitor#computeReplicationWorkForBlocks continues to choose 
> targets for the same blocks, and no node is selected even after traversing 
> the whole cluster, because no node can satisfy the goodness criteria (its 
> remaining space would have to reach the required size Long.MAX_VALUE).
> During stage (3) the ReplicationMonitor is stuck for a long time, especially 
> in a large cluster. invalidateBlocks & neededReplications keep growing and 
> are never consumed; at worst this can lose data.
> This can mostly be avoided by skipping chooseTarget for blocks marked 
> BlockCommand.NO_ACK and removing them from neededReplications.
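
The proposed guard could look roughly like the following self-contained toy 
sketch (Block, neededReplications and NO_ACK here are simplified stand-ins, not 
the actual BlockManager data structures or the submitted patch):
{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

public class SkipNoAckSketch {

  // Mirrors the idea of BlockCommand.NO_ACK: a deleted block's size is set to
  // Long.MAX_VALUE, a size no DataNode can ever satisfy.
  static final long NO_ACK = Long.MAX_VALUE;

  static final class Block {
    final String id;
    final long numBytes;
    Block(String id, long numBytes) { this.id = id; this.numBytes = numBytes; }
  }

  public static void main(String[] args) {
    Queue<Block> neededReplications = new ArrayDeque<>();
    neededReplications.add(new Block("blk_1", 128L * 1024 * 1024)); // live block
    neededReplications.add(new Block("blk_2", NO_ACK));             // deleted block

    while (!neededReplications.isEmpty()) {
      Block b = neededReplications.poll();
      if (b.numBytes == NO_ACK) {
        // Block was deleted after being scheduled: skip target selection
        // entirely instead of scanning the whole cluster for an impossible size.
        System.out.println("skipping deleted block " + b.id);
        continue;
      }
      System.out.println("choosing targets for " + b.id); // chooseTarget stand-in
    }
  }
}
{code}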


