[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17624847#comment-17624847 ] Xiaoqiao He commented on HDFS-10453: [~yuyanlei] It works find in my internal cluster. It has been checkin trunk and other active branches. Thanks. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.3 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, > HDFS-10453-branch-2.7.009.patch, HDFS-10453-branch-2.8.001.patch, > HDFS-10453-branch-2.8.002.patch, HDFS-10453-branch-2.9.001.patch, > HDFS-10453-branch-2.9.002.patch, HDFS-10453-branch-3.0.001.patch, > HDFS-10453-branch-3.0.002.patch, HDFS-10453-trunk.001.patch, > HDFS-10453-trunk.002.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17624837#comment-17624837 ] Yanlei Yu commented on HDFS-10453: -- Hi [~hexiaoqiao]. After reading your previous discussion, I would like to confirm whether HDFS-10453-branch-2.7.009 patch has solved this problem > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: Xiaoqiao He >Assignee: Xiaoqiao He >Priority: Major > Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.3 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, > HDFS-10453-branch-2.7.009.patch, HDFS-10453-branch-2.8.001.patch, > HDFS-10453-branch-2.8.002.patch, HDFS-10453-branch-2.9.001.patch, > HDFS-10453-branch-2.9.002.patch, HDFS-10453-branch-3.0.001.patch, > HDFS-10453-branch-3.0.002.patch, HDFS-10453-trunk.001.patch, > HDFS-10453-trunk.002.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924846#comment-16924846 ] niu commented on HDFS-10453: Hi [~hexiaoqiao]. In this case, shouldn't be zero in the log `still in need of 0 to reach 3`. The demand should be at least 1,right? I didn't see any abnormal log in namenode. Don't hung, it still works normally. Is it possible that some datanode connections are not stable but still alive causes this issue? > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.3 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, > HDFS-10453-branch-2.7.009.patch, HDFS-10453-branch-2.8.001.patch, > HDFS-10453-branch-2.8.002.patch, HDFS-10453-branch-2.9.001.patch, > HDFS-10453-branch-2.9.002.patch, HDFS-10453-branch-3.0.001.patch, > HDFS-10453-branch-3.0.002.patch, HDFS-10453-trunk.001.patch, > HDFS-10453-trunk.002.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924753#comment-16924753 ] He Xiaoqiao commented on HDFS-10453: [~hustnn] I think this is explained case, when some datanodes are shutdown at the same time, NameNode will trigger to replicate under-replication blocks, then some live datanode will be high load especially for small cluster. I just wonder if namenode process meet some exception or hung long times when log above information? > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.3 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, > HDFS-10453-branch-2.7.009.patch, HDFS-10453-branch-2.8.001.patch, > HDFS-10453-branch-2.8.002.patch, HDFS-10453-branch-2.9.001.patch, > HDFS-10453-branch-2.9.002.patch, HDFS-10453-branch-3.0.001.patch, > HDFS-10453-branch-3.0.002.patch, HDFS-10453-trunk.001.patch, > HDFS-10453-trunk.002.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923915#comment-16923915 ] niu commented on HDFS-10453: [~hexiaoqiao] Thanks for your quick reply. Yes, it is very strange. Currently, I am not sure the correct way to reprod it and it occurs after some datanodes are down. I missed another log which is before this one ```blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseRandom(827)) - Not enough replicas was chosen. Reason:{TOO_MANY_NODES_ON_RACK=xxx, NODE_TOO_BUSY=xx}``` It shouldn't happen, right? > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.3 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, > HDFS-10453-branch-2.7.009.patch, HDFS-10453-branch-2.8.001.patch, > HDFS-10453-branch-2.8.002.patch, HDFS-10453-branch-2.9.001.patch, > HDFS-10453-branch-2.9.002.patch, HDFS-10453-branch-3.0.001.patch, > HDFS-10453-branch-3.0.002.patch, HDFS-10453-trunk.001.patch, > HDFS-10453-trunk.002.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923913#comment-16923913 ] He Xiaoqiao commented on HDFS-10453: Thanks [~hustnn] for your pings. I am confused about the log `Failed to place enough replicas, still in need of 0 to reach 3`. it seems that there are enough replicas. Would like to offer more related information or how to reprod it? > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.3 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, > HDFS-10453-branch-2.7.009.patch, HDFS-10453-branch-2.8.001.patch, > HDFS-10453-branch-2.8.002.patch, HDFS-10453-branch-2.9.001.patch, > HDFS-10453-branch-2.9.002.patch, HDFS-10453-branch-3.0.001.patch, > HDFS-10453-branch-3.0.002.patch, HDFS-10453-trunk.001.patch, > HDFS-10453-trunk.002.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923885#comment-16923885 ] niu commented on HDFS-10453: Hi [~hexiaoqiao] We are using 2.9.2 and has this WARN again. In which case, will the `in need of 0 to reach 3` happen? ```2019-09-05 16:10:47,821 WARN blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseTarget(431)) - Failed to place enough replicas, still in need of 0 to reach 3 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=false) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and org.apache.hadoop.net.NetworkTopology``` > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 3.1.0, 2.10.0, 2.9.1, 2.8.4, 2.7.6, 3.0.3 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, > HDFS-10453-branch-2.7.009.patch, HDFS-10453-branch-2.8.001.patch, > HDFS-10453-branch-2.8.002.patch, HDFS-10453-branch-2.9.001.patch, > HDFS-10453-branch-2.9.002.patch, HDFS-10453-branch-3.0.001.patch, > HDFS-10453-branch-3.0.002.patch, HDFS-10453-trunk.001.patch, > HDFS-10453-trunk.002.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3)
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584189#comment-16584189 ] Henrique Barros commented on HDFS-10453: I have the same problem with Hadoop 2.6.0-cdh5.15.0 With Cloudera CDH-5.15.0-1.cdh5.15.0.p0.21 In my case we are getting this error very randomly and with only one Datanode (for now). Here is the Log. {code:java} Choosing random from 1 available nodes on node /default, scope=/default, excludedScope=null, excludeNodes=[] 2:38:20.527 PM DEBUG NetworkTopology Choosing random from 0 available nodes on node /default, scope=/default, excludedScope=null, excludeNodes=[192.168.220.53:50010] 2:38:20.527 PM DEBUG NetworkTopology chooseRandom returning null 2:38:20.527 PM DEBUG BlockPlacementPolicy [ Node /default/192.168.220.53:50010 [ Datanode 192.168.220.53:50010 is not chosen since the node is too busy (load: 8 > 0.0). 2:38:20.527 PM DEBUG NetworkTopology chooseRandom returning 192.168.220.53:50010 2:38:20.527 PM INFOBlockPlacementPolicy Not enough replicas was chosen. Reason:{NODE_TOO_BUSY=1} 2:38:20.527 PM DEBUG StateChange closeFile: /mobi.me/development/apps/flink/checkpoints/a5a6806866c1640660924ea1453cbe34/chk-2118/eef8bff6-75a9-43c1-ae93-4b1a9ca31ad9 with 1 blocks is persisted to the file system 2:38:20.527 PM DEBUG StateChange *BLOCK* NameNode.addBlock: file /mobi.me/development/apps/flink/checkpoints/a5a6806866c1640660924ea1453cbe34/chk-2118/1cfe900d-6f45-4b55-baaa-73c02ace2660 fileId=129628869 for DFSClient_NONMAPREDUCE_467616914_65 2:38:20.527 PM DEBUG BlockPlacementPolicy Failed to choose from local rack (location = /default); the second replica is not found, retry choosing ramdomly org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy$NotEnoughReplicasException: at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:784) at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseRandom(BlockPlacementPolicyDefault.java:694) at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalRack(BlockPlacementPolicyDefault.java:601) at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseLocalStorage(BlockPlacementPolicyDefault.java:561) at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTargetInOrder(BlockPlacementPolicyDefault.java:464) at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:395) at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:270) at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:142) at org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault.chooseTarget(BlockPlacementPolicyDefault.java:158) at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1715) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3505) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:694) at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:219) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:507) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2275) {code} This part makes no sense at all: {code:java} load: 8 > 0.0{code} > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL:
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16362473#comment-16362473 ] Andras Bokor commented on HDFS-10453: - Regarding branch-2.7: After patch 008 an additional patch, 009 was uploaded and it seems that one was [committed|https://github.com/apache/hadoop/commit/02f6030b35999f2f741a8c4b9363ee59f36f7e28]. The difference between the two patches is that the later one does not include the unit test. I do not see any comment about 009. Was committing 009 intended? Now the branch misses the UT. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4, 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, > HDFS-10453-branch-2.7.009.patch, HDFS-10453-branch-2.8.001.patch, > HDFS-10453-branch-2.8.002.patch, HDFS-10453-branch-2.9.001.patch, > HDFS-10453-branch-2.9.002.patch, HDFS-10453-branch-3.0.001.patch, > HDFS-10453-branch-3.0.002.patch, HDFS-10453-trunk.001.patch, > HDFS-10453-trunk.002.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360934#comment-16360934 ] Hudson commented on HDFS-10453: --- SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #13645 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/13645/]) HDFS-10453. ReplicationMonitor thread could stuck for long time due to (arp: rev 96bb6a51ec4a470e9b287c94e377444a9f97c410) * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/ErasureCodingWork.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/ReplicationWork.java * (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockReconstructionWork.java > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 3.1.0, 2.10.0, 2.9.1, 3.0.1, 2.8.4, 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, > HDFS-10453-branch-2.7.009.patch, HDFS-10453-branch-2.8.001.patch, > HDFS-10453-branch-2.8.002.patch, HDFS-10453-branch-2.9.001.patch, > HDFS-10453-branch-2.9.002.patch, HDFS-10453-branch-3.0.001.patch, > HDFS-10453-branch-3.0.002.patch, HDFS-10453-trunk.001.patch, > HDFS-10453-trunk.002.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3)
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16360400#comment-16360400 ] He Xiaoqiao commented on HDFS-10453: I checked the failed UTs and tested locally, it seems to work fine and might not relate to this patch, please double check at your convenience. [~arpitagarwal] > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, > HDFS-10453-branch-2.7.009.patch, HDFS-10453-branch-2.8.001.patch, > HDFS-10453-branch-2.8.002.patch, HDFS-10453-branch-2.9.001.patch, > HDFS-10453-branch-2.9.002.patch, HDFS-10453-branch-3.0.001.patch, > HDFS-10453-branch-3.0.002.patch, HDFS-10453-trunk.001.patch, > HDFS-10453-trunk.002.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359285#comment-16359285 ] genericqa commented on HDFS-10453: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 25s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 37s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 56s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 45s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 26s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}128m 7s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 23s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}176m 34s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-10453 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12910023/HDFS-10453-trunk.002.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux f10611600a15 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / c97d5bc | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/23019/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/23019/testReport/ | | Max. process+thread count | 3577 (vs. ulimit of 5500) | | modules |
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359237#comment-16359237 ] He Xiaoqiao commented on HDFS-10453: [~arpitagarwal] upload new patches without unit test. {quote}However the new test doesn't verify this fix. I don't see a way to unit test the race condition without refactoring, so let's just remove the new unit test.{quote} Thanks for your careful review.The added unit test doesn't work well as your mentioned using this simple fix type as well. And I do not find an elegant way to verify this fix since we could not manipulate thread {{ReplicationMonitor}} and {{PendingReplicationMonitor}} progress in MiniDFSCluster without refactoring. Please share your idea if there are good suggestions. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, > HDFS-10453-branch-2.7.009.patch, HDFS-10453-branch-2.8.001.patch, > HDFS-10453-branch-2.8.002.patch, HDFS-10453-branch-2.9.001.patch, > HDFS-10453-branch-2.9.002.patch, HDFS-10453-branch-3.0.001.patch, > HDFS-10453-branch-3.0.002.patch, HDFS-10453-trunk.001.patch, > HDFS-10453-trunk.002.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359229#comment-16359229 ] Arpit Agarwal commented on HDFS-10453: -- +1 for HDFS-10453-trunk.002.patch, pending Jenkins. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, > HDFS-10453-branch-2.7.009.patch, HDFS-10453-branch-2.8.001.patch, > HDFS-10453-branch-2.8.002.patch, HDFS-10453-branch-2.9.001.patch, > HDFS-10453-branch-2.9.002.patch, HDFS-10453-branch-3.0.001.patch, > HDFS-10453-branch-3.0.002.patch, HDFS-10453-trunk.001.patch, > HDFS-10453-trunk.002.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358772#comment-16358772 ] Arpit Agarwal commented on HDFS-10453: -- [~hexiaoqiao], +1 from me also. However the new test doesn't verify this fix. I don't see a way to unit test the race condition without refactoring, so let's just remove the new unit test. +1 with that removed. Also you can delay attaching patches for the branches other than trunk until there is a +1, to save yourself work. :) > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, > HDFS-10453-branch-2.8.001.patch, HDFS-10453-branch-2.9.001.patch, > HDFS-10453-branch-3.0.001.patch, HDFS-10453-trunk.001.patch, > HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358739#comment-16358739 ] Ajay Kumar commented on HDFS-10453: --- [~xkrogen] i am also good with change. Question was about some edge cases where we don't find any target. Even in those cases we will remove it in next iteration so that should be fine as well. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, > HDFS-10453-branch-2.8.001.patch, HDFS-10453-branch-2.9.001.patch, > HDFS-10453-branch-3.0.001.patch, HDFS-10453-trunk.001.patch, > HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358657#comment-16358657 ] Erik Krogen commented on HDFS-10453: I don't think that move is necessary anymore. Given that we use the block's old, non-deleted size, the expectation is that targets will _not_ be empty. Thus the {{bc == null}} check will end up being triggered regardless. branch-2.7 v008 & trunk v001 patches LGTM. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, > HDFS-10453-branch-2.8.001.patch, HDFS-10453-branch-2.9.001.patch, > HDFS-10453-branch-3.0.001.patch, HDFS-10453-trunk.001.patch, > HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358323#comment-16358323 ] He Xiaoqiao commented on HDFS-10453: [~ajayydv] {quote} As Erik Krogen mentioned earlier "A deleted block should always be removed from the needingReplications list regardless of whether or not any targets were found for it, so it makes sense to perform this check before the check for an empty targets list. ". In current scenario it will be removed in next iteration of computeReplicationWorkForBlocks. {quote} I think we need discuss about if moving check {{if (bc == null || (bc.isUnderConstruction() && block.equals(bc.getLastBlock(}} before {{if(targets == null || targets.length == 0)}}. Since there is cost that grabbing the lock on neededReplications to get {{block}} for all scenario. [~xkrogen],[~arpitagarwal], do you mind having a look? > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, > HDFS-10453-branch-2.8.001.patch, HDFS-10453-branch-2.9.001.patch, > HDFS-10453-branch-3.0.001.patch, HDFS-10453-trunk.001.patch, > HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16358033#comment-16358033 ] genericqa commented on HDFS-10453: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 23s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} trunk Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 15m 26s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 35s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 45s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 44s{color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 53s{color} | {color:green} trunk passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 20s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 48s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}143m 18s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}191m 8s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure | | | hadoop.hdfs.server.namenode.ha.TestRetryCacheWithHA | | | hadoop.hdfs.qjournal.server.TestJournalNodeSync | | | hadoop.hdfs.web.TestWebHdfsTimeouts | | | hadoop.hdfs.TestDFSStripedOutputStreamWithFailure | | | hadoop.hdfs.TestBlocksScheduledCounter | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 | | JIRA Issue | HDFS-10453 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12909885/HDFS-10453-trunk.001.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit shadedclient findbugs checkstyle | | uname | Linux 351fb731547f 4.4.0-64-generic #85-Ubuntu SMP Mon Feb 20 11:50:30 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 1bc03dd | | maven | version: Apache Maven 3.3.9 | | Default Java | 1.8.0_151 | | findbugs | v3.1.0-RC1 | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/23000/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/23000/testReport/ | | Max. process+thread count |
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357925#comment-16357925 ] Ajay Kumar commented on HDFS-10453: --- [~hexiaoqiao], I was referring to moving the check {{if (bc == null || (bc.isUnderConstruction() && block.equals(bc.getLastBlock(}} before {{if(targets == null || targets.length == 0)}}. As [~xkrogen] mentioned earlier "A deleted block should always be removed from the needingReplications list regardless of whether or not any targets were found for it, so it makes sense to perform this check before the check for an empty targets list. ". In current scenario it will be removed in next iteration of {{computeReplicationWorkForBlocks}}. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, > HDFS-10453-branch-2.8.001.patch, HDFS-10453-branch-2.9.001.patch, > HDFS-10453-branch-3.0.001.patch, HDFS-10453-trunk.001.patch, > HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357897#comment-16357897 ] He Xiaoqiao commented on HDFS-10453: [~xkrogen],[~ajayydv] Sorry for so late comments. I just update v008 [#HDFS-10453-branch-2.7.008.patch] patch for branch-2.7, and patchs for other branchs also be ready, which correct the properly invoke to get number bytes of block. {{getNumBytes()}}. {quote}He Xiaoqiao, Patch v8 doesn't have changes from patch v7 in BlockManager#computeReplicationWorkForBlocks. Is that intentional?{quote} [~ajayydv] Thanks for your comments firstly. since we have saved blocksize within the constructor for ReplicationWork rather than calling block.getNumBytes() within chooseTargets() in new patch [#HDFS-10453-branch-2.7.008.patch], so it is impossible to choose target for a block whose length is {{Long.MAX_VALUE}}. FYI. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, > HDFS-10453-branch-2.8.001.patch, HDFS-10453-branch-2.9.001.patch, > HDFS-10453-branch-3.0.001.patch, HDFS-10453-trunk.001.patch, > HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357343#comment-16357343 ] Ajay Kumar commented on HDFS-10453: --- [~hexiaoqiao], Patch v8 doesn't have changes from patch v7 in {{BlockManager#computeReplicationWorkForBlocks}}. Is that intentional? > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, > HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail:
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355650#comment-16355650 ] Erik Krogen commented on HDFS-10453: Re: v008 patch, looks like you are using {{getPreferredBlockSize()}} instead of {{getNumBytes()}} , that does not seem right, was it an unintentional change? > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, > HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail:
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16355019#comment-16355019 ] genericqa commented on HDFS-10453: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-2.7 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 3s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 6s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 3s{color} | {color:green} branch-2.7 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 15s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 12s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 60 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 37s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 57s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}109m 7s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 1m 36s{color} | {color:red} The patch generated 304 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}136m 41s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Unreaped Processes | hadoop-hdfs:26 | | Failed junit tests | hadoop.hdfs.crypto.TestHdfsCryptoStreams | | | hadoop.hdfs.TestParallelShortCircuitLegacyRead | | | hadoop.hdfs.TestFetchImage | | | hadoop.hdfs.TestDFSRollback | | | hadoop.hdfs.TestSetrepIncreasing | | | hadoop.hdfs.TestDataTransferProtocol | | | hadoop.hdfs.TestRollingUpgrade | | | hadoop.hdfs.TestAbandonBlock | | | hadoop.hdfs.TestHDFSTrash | | | hadoop.hdfs.TestParallelUnixDomainRead | | | hadoop.hdfs.TestFileCreationClient | | Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 | | | org.apache.hadoop.hdfs.TestDatanodeRegistration | | | org.apache.hadoop.hdfs.TestDFSClientFailover | | | org.apache.hadoop.hdfs.TestDatanodeDeath | | | org.apache.hadoop.hdfs.TestDFSClientRetries | | | org.apache.hadoop.hdfs.web.TestWebHdfsTokens | | | org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream | | | org.apache.hadoop.hdfs.TestFileAppendRestart | | | org.apache.hadoop.hdfs.security.TestDelegationToken | | | org.apache.hadoop.hdfs.TestSeekBug | | | org.apache.hadoop.hdfs.TestDFSMkdirs | | | org.apache.hadoop.hdfs.TestDatanodeReport | | | org.apache.hadoop.hdfs.web.TestWebHDFS | | | org.apache.hadoop.hdfs.web.TestWebHDFSXAttr | | | org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes | | | org.apache.hadoop.hdfs.TestMiniDFSCluster | | | org.apache.hadoop.hdfs.TestDistributedFileSystem | | | org.apache.hadoop.hdfs.web.TestWebHDFSForHA | | | org.apache.hadoop.hdfs.TestBalancerBandwidth | | | org.apache.hadoop.hdfs.TestSetTimes | | |
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354927#comment-16354927 ] He Xiaoqiao commented on HDFS-10453: {quote}However, going through this makes me realize, a more simple fix may be to just fetch and save blocksize within the constructor for ReplicationWork rather than calling block.getNumBytes() within chooseTargets(). {quote} +1. It makes sense for me. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453-branch-2.7.008.patch, > HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354717#comment-16354717 ] Arpit Agarwal commented on HDFS-10453: -- bq. However, going through this makes me realize, a more simple fix may be to just fetch and save blocksize within the constructor for ReplicationWork rather than calling block.getNumBytes() within chooseTargets(). +1 for this suggestion. That is a more deterministic solution than depending on a value sampled outside of the lock. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354711#comment-16354711 ] Erik Krogen commented on HDFS-10453: Hey [~arpitagarwal], IIUC it doesn't need to see the most recent value to fix this issue. The problem comes when: 1. Block is added to ReplicationWork under lock 2. Block is deleted 3. Block length is read by ReplicationWork as NO_ACK 4. chooseTargets attempts to place a block of size Long.MAX_VALUE; this causes issue because there is no valid placement, so it takes a long time for the chooseTargets loop to terminate If the original block length is read rather than the most recent value, the issue discussed here does not occur: 1. Block is added to ReplicationWork under lock 2. Block is deleted 3. Block length is read by ReplicationWork as a normal length 4. chooseTargets successfully finds some new locations; then the {{bc == null}} check properly removes the block from {{neededReplications}} However, going through this makes me realize, a more simple fix may be to just fetch and save {{blocksize}} within the constructor for ReplicationWork rather than calling {{block.getNumBytes()}} within {{chooseTargets()}}. This ensures consistency, so there should no longer be any ReplicationWorks which consider a block size of NO_ACK. It avoids the "hacky" nature of checking size equality with NO_ACK as discussed by Chen. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354677#comment-16354677 ] Arpit Agarwal commented on HDFS-10453: -- Hi [~hexiaoqiao], I assume this is the key part of the fix in the v7 patch: {code} // Skip choose targets for block where one of the following conditions: // a. additional number of replicas wanted is zero // b. the datanode number of cluster is zero // c. block has been deleted which is indicated by BlockCommand.NO_ACK. if (numOfReplicas == 0 || clusterMap.getNumOfLeaves()==0 || blocksize == BlockCommand.NO_ACK) { return DatanodeStorageInfo.EMPTY_ARRAY; } {code} i.e. chooseTargets avoids looking for replication targets if the blockSize was changed to NO_ACK. This won't work. The block.length field was read by the caller ReplicationWork#chooseTargets after releasing the lock, so there's no guarantee it sees the most recent value. I don't think we should attempt to fix chooseTarget at all since it runs outside the lock. Your v4 patch was on the right track. I will review it more closely. Also we'll need a trunk patch (and patches for branch-2.9 and branch-2.8, if we want to commit this to branch-2.7). > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does >
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354571#comment-16354571 ] Ajay Kumar commented on HDFS-10453: --- LGTM. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16354286#comment-16354286 ] Chen Liang commented on HDFS-10453: --- Thanks for the follow up [~hexiaoqiao]! bq. if check if(targets == null || targets.length == 0) and continue loop, it will waste some CPU resource Moving this check into synchronized block also has the potential CPU waste issue of grabbing the lock on neededReplications, but then do nothing but just immediately releasing the lock and continue (when targets is empty or null). I'm not sure whether this matters though, I'm okay with either way. Ping [~arpitagarwal], do you mind having a look? > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353567#comment-16353567 ] genericqa commented on HDFS-10453: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 16m 26s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-2.7 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 21s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 2s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s{color} | {color:green} branch-2.7 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 0s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 25s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 237 unchanged - 1 fixed = 238 total (was 238) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 3s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 60 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 79m 12s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 1m 29s{color} | {color:red} The patch generated 321 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}121m 13s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Unreaped Processes | hadoop-hdfs:20 | | Failed junit tests | hadoop.hdfs.TestClientBlockVerification | | | hadoop.hdfs.web.TestWebHdfsFileSystemContract | | Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 | | | org.apache.hadoop.hdfs.TestDatanodeRegistration | | | org.apache.hadoop.hdfs.TestDFSClientFailover | | | org.apache.hadoop.hdfs.TestDFSClientRetries | | | org.apache.hadoop.security.TestPermission | | | org.apache.hadoop.hdfs.web.TestWebHdfsTokens | | | org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream | | | org.apache.hadoop.hdfs.TestFileAppendRestart | | | org.apache.hadoop.hdfs.TestSeekBug | | | org.apache.hadoop.hdfs.TestDatanodeReport | | | org.apache.hadoop.hdfs.web.TestWebHDFS | | | org.apache.hadoop.hdfs.web.TestWebHDFSXAttr | | | org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes | | | org.apache.hadoop.hdfs.TestMiniDFSCluster | | | org.apache.hadoop.hdfs.web.TestFSMainOperationsWebHdfs | | | org.apache.hadoop.hdfs.TestDistributedFileSystem | | | org.apache.hadoop.hdfs.web.TestWebHDFSForHA | | | org.apache.hadoop.hdfs.TestSetTimes | | | org.apache.hadoop.hdfs.TestDFSShell | | | org.apache.hadoop.hdfs.web.TestWebHDFSAcl | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ea57d10 | | JIRA Issue | HDFS-10453 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12909377/HDFS-10453-branch-2.7.007.patch | | Optional Tests |
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353480#comment-16353480 ] He Xiaoqiao commented on HDFS-10453: [~xkrogen] [~vagarychen], Thanks for you comments, I attach #HDFS-10453-branch-2.7.007.patch following your suggestions. {quote}Given the change in {{BlockPlacementPolicyDefault#chooseTarget}} that, if the size is {{NO_ACK}}, it immediately returns an empty array, do we still really need the change in {{BlockManager#computeReplicationWorkForBlocks}}? Because with the change in {{chooseTarget}}, I think {{rw.chooseTargets(...);}} would set {{rw.targets}} to empty array, then in {{computeReplicationWorkForBlocks}}, {{if(targets == null || targets.length == 0)}} will be true and the \{{rw}}gets skipped. {quote} I think it is necessary to check if Block is null in BlockManager#computeReplicationWorkForBlocks, even if return empty in rw#chooseTargets when block has been deleted, because there are many cases lead to return of rw#chooseTarget is empty/null, if check {{if(targets == null || targets.length == 0)}} and continue loop, it will waste some CPU resource, especially any case as this Jira describe. on another hand, I am not sure if there are some other case cause block to be null, if check \{{bc == null}} firstly, it won't make the situation worse at least. {quote}I think this flag {{NO_ACK}} only specifically means "an indicator of no need for DN to ack", but we are using it here as "an indicator that the block does not need placement". {quote} as you mentioned, this flag used here is not original intention indeed, but I think it can describe `deleted block` as the annotation in branch-2.7: {quote}/** * This constant is used to indicate that the block deletion does not need * explicit ACK from the datanode. When a block is put into the list of blocks * to be deleted, it's size is set to this constant. We assume that no block * would actually have this size. Otherwise, we would miss ACKs for blocks * with such size. Positive number is used for compatibility reasons. */ public static final long NO_ACK = Long.MAX_VALUE; {quote} [~xkrogen] [~vagarychen] thanks again and please let me know if there are something wrong. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453-branch-2.7.007.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) >
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353143#comment-16353143 ] Chen Liang commented on HDFS-10453: --- This is a very tricky case, thanks [~hexiaoqiao] for working on this, really appreciate! I've only looked through v6 patch on branch-2.7, and I've got a question. Given the change in {{BlockPlacementPolicyDefault#chooseTarget}} that, if the size is {{NO_ACK}}, it immediately returns an empty array, do we still really need the change in {{BlockManager#computeReplicationWorkForBlocks}}? Because with the change in {{chooseTarget}}, I think {{rw.chooseTargets(...);}} would set {{rw.targets}} to empty array, then in {{computeReplicationWorkForBlocks}}, {{if(targets == null || targets.length == 0)}} will be true and the {{rw}} gets skipped, this invalidate rw will be eventually be removed from {{neededReplications}} anyway. In addition, the check {{blocksize == BlockCommand.NO_ACK}} in {{BlockPlacementPolicyDefault}} seems a bit hacky. Because I think this flag {{NO_ACK}} only specifically means "an indicator of no need for DN to ack", but we are using it here as "an indicator that the block does not need placement". Can't think of a better easy alternative though, ideally, we may need another flag to indicate blocks being removed. But for now at least we can do something to make this easier to track in the future, such as: 1. add some explanation comments on what this check is about, i.e. why NO_ACK is against blockSize. 2. maybe move this check to merge in the check in L196 {{if (numOfReplicas == 0 || clusterMap.getNumOfLeaves()==0)}} > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16353107#comment-16353107 ] Ajay Kumar commented on HDFS-10453: --- [~hexiaoqiao] {code} if (targets == null || targets.length == 0) { rw.targets = null; continue; } {code} In latest patch we are not removing block from {{neededReplications}}. Is this intentional as you mentioned " ReplicationMonitor#run will continue to chooseTargets for the same block". Since we are returning empty targets from {{BlockPlacementPolicyDefault#chooseTarget}} for blocks with {{blocksize == BlockCommand.NO_ACK}} we should remove them from {{neededReplications}} as was the case in previous patch. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16352702#comment-16352702 ] Erik Krogen commented on HDFS-10453: Hi [~hexiaoqiao], good catch on discovering and diagnosing this issue. We have seen it as well. I agree with your last comment that we can use the check for the null block collection; when the size is set to {{NO_ACK}} in {{BlockManager#removeBlock()}}, the block collection is also set to null. A deleted block should always be removed from the {{needingReplications}} list regardless of whether or not any targets were found for it, so it makes sense to perform this check before the check for an empty targets list. This change does mean that, if no targets are returned, we have to acquire the lock on {{neededReplications}} whereas as we did not previously, but I think this situation is infrequent enough that it is not an issue. My one comment on the v006 patch main code: why do you choose to add the {{blocksize == BlockCommand.NO_ACK}} check within {{BlockPlacementPolicyDefault#chooseTarget(String, int, Node, List, boolean, Set, long, BlockStoragePolicy)}}, currently just a delegation method, rather than {{BlockPlacementPolicyDefault#chooseTarget(int, Node, List, boolean, Set, long, BlockStoragePolicy)}}, where the implementation lives? It seems it would be better to keep this logic centralized in the implementation method. Few comments on the test as well: * We should set {{DFSConfigKeys.DFS_NAMENODE_REPLICATION_INTERVAL_KEY}} to be a lower value (i.e. 1) to avoid the test taking longer than necessary. * The line creating the MiniDFSCluster is too long. Not sure why checkstyle is not complaining. * Can we use {{GenericTestUtils#waitFor()}} instead of a {{sleep()}} to be more robust about waiting for the ReplicationMonitor to remove the deleted block from its list? > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: >
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351434#comment-16351434 ] He Xiaoqiao commented on HDFS-10453: hi [~ajayydv] {quote}This check {{if(bc == null || (bc.isUnderConstruction() && block.equals(bc.getLastBlock(}} in {{BlockManager#computeReplicationWorkForBlocks}} exists already. So, if it was handling this case we will not face it. I think this additional check for {{block.getNumBytes() == BlockCommand.NO_ACK}} is required. {quote} Since BlockManager#computeReplicationWorkForBlocks checks if \{{(bc == null || (bc.isUnderConstruction() && block.equals(bc.getLastBlock()))}} after check if \{{target == null}} which is always true, then the block will not be removed from {{neededReplications}}, ReplicationMonitor#run will continue to chooseTargets for the same block and no node will be selected even if traverse all nodes of cluster in next loop and so on. This case will avoid if check \{{block == null}} before \{{target == null}}. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350826#comment-16350826 ] Ajay Kumar commented on HDFS-10453: --- [~hexiaoqiao], {quote} I think it can cover block.getNumBytes() == BlockCommand.NO_ACK since bc which take out from blocksmap is NULL now when it has been deleted. Thus it is not necessary to check if block's size is BlockCommand.NO_ACK.{quote} This check {{if(bc == null || (bc.isUnderConstruction() && block.equals(bc.getLastBlock(}} in {{BlockManager#computeReplicationWorkForBlocks}} exists already. So, if it was handling this case we will not face it. I think this additional check for {{block.getNumBytes() == BlockCommand.NO_ACK}} is required. Lets confirm with few experts in community. [~arpitagarwal],[~ajisakaa] Mind having a look? > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350705#comment-16350705 ] He Xiaoqiao commented on HDFS-10453: ping [~ajayydv], please share your feedback and give some suggestions at your convenience. thanks again. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349730#comment-16349730 ] He Xiaoqiao commented on HDFS-10453: [~ajayydv] Thanks for your reviews. In the lastest patch, I check if {{BlockCollection}} of {{block}} is null firstly, I think it can cover {{block.getNumBytes() == BlockCommand.NO_ACK}} since {{bc}} which take out from {{blocksmap}} is NULL now when it has been deleted. Thus it is not necessary to check if block's size is BlockCommand.NO_ACK. FYI. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349077#comment-16349077 ] Steve Loughran commented on HDFS-10453: --- Not my area of expertise at all; I'm not safe to have opinions on it. Sorry > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348942#comment-16348942 ] Ajay Kumar commented on HDFS-10453: --- [~hexiaoqiao], thanks for updating the patch. New patch doesn't contains your initial check: {code} if (rw.block.getNumBytes() == BlockCommand.NO_ACK) { // remove from neededReplications while block has deleted. neededReplications.remove(rw.block, rw.priority); neededReplications.decrementReplicationIndex(rw.priority); } {code} I was suggesting something like this: {code} // abandoned block or block reopened for append or deleted block if(bc == null || (bc.isUnderConstruction() && block.equals(bc .getLastBlock())) || (block.getNumBytes() == BlockCommand.NO_ACK)) { neededReplications.remove(block, priority); // remove from neededReplications rw.targets = null; neededReplications.decrementReplicationIndex(priority); continue; } {code} [~ste...@apache.org],[~xyao],[~andrew.wang] mind to have a look and share you feedback? > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Fix For: 2.7.6 > > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to >
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348172#comment-16348172 ] genericqa commented on HDFS-10453: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 15m 11s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-2.7 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 57s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 5s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 5s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s{color} | {color:green} branch-2.7 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange} 0m 25s{color} | {color:orange} hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 237 unchanged - 1 fixed = 238 total (was 238) {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 60 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 93m 1s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 1m 20s{color} | {color:red} The patch generated 204 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}135m 9s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Unreaped Processes | hadoop-hdfs:22 | | Failed junit tests | hadoop.hdfs.TestBlockReaderLocal | | | hadoop.hdfs.TestDatanodeDeath | | | hadoop.hdfs.TestSetrepIncreasing | | | hadoop.hdfs.TestDataTransferProtocol | | | hadoop.hdfs.TestDFSFinalize | | Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 | | | org.apache.hadoop.hdfs.TestDatanodeRegistration | | | org.apache.hadoop.hdfs.TestDFSClientFailover | | | org.apache.hadoop.hdfs.TestDFSClientRetries | | | org.apache.hadoop.hdfs.web.TestWebHdfsTokens | | | org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream | | | org.apache.hadoop.hdfs.TestFileAppendRestart | | | org.apache.hadoop.hdfs.TestSeekBug | | | org.apache.hadoop.hdfs.TestDFSMkdirs | | | org.apache.hadoop.hdfs.TestDatanodeReport | | | org.apache.hadoop.hdfs.web.TestWebHDFS | | | org.apache.hadoop.hdfs.web.TestWebHDFSXAttr | | | org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes | | | org.apache.hadoop.hdfs.TestDFSRollback | | | org.apache.hadoop.hdfs.TestMiniDFSCluster | | | org.apache.hadoop.hdfs.web.TestFSMainOperationsWebHdfs | | | org.apache.hadoop.hdfs.TestDistributedFileSystem | | | org.apache.hadoop.hdfs.web.TestWebHDFSForHA | | | org.apache.hadoop.hdfs.TestSetTimes | | | org.apache.hadoop.hdfs.TestDFSShell | | | org.apache.hadoop.hdfs.web.TestWebHDFSAcl | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:ea57d10 | | JIRA Issue |
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16348056#comment-16348056 ] He Xiaoqiao commented on HDFS-10453: [~ajayydv] Thank you for your suggestion, I just attach new patch [#HDFS-10453-branch-2.7.006.patch] for branch-2.7 and first check if {{block}} is abandoned or reopen for append, thus it can avoid choose target fail for deleted blocks endless loop. FYI. please correct me if i am wrong, Thanks again. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453-branch-2.7.006.patch, > HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail:
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347753#comment-16347753 ] Ajay Kumar commented on HDFS-10453: --- Hi [~hexiaoqiao], Thanks for working on this. Patch looks good to me. One minor suggestion, I think we can simplify the patch a bit my merging the new check \{{if (rw.block.getNumBytes() == BlockCommand.NO_ACK)}}{{ with {{}}if(bc == null || (bc.isUnderConstruction() && block.equals(bc.getLastBlock({{inside \{{computeReplicationWorkForBlocks}} L1501. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao >Assignee: He Xiaoqiao >Priority: Major > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail:
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16288578#comment-16288578 ] Xiang Li commented on HDFS-10453: - Agree. Thanks xiaoqiao! > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453-branch-2.7.005.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16287441#comment-16287441 ] genericqa commented on HDFS-10453: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 18m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-2.7 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 54s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 24s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 0s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 54s{color} | {color:green} branch-2.7 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 47s{color} | {color:green} branch-2.7 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s{color} | {color:red} The patch has 60 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}106m 9s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 1m 15s{color} | {color:red} The patch generated 1 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}148m 22s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Unreaped Processes | hadoop-hdfs:25 | | Failed junit tests | hadoop.hdfs.TestBlocksScheduledCounter | | | hadoop.hdfs.TestListFilesInDFS | | Timed out junit tests | org.apache.hadoop.hdfs.TestLeaseRecovery2 | | | org.apache.hadoop.hdfs.TestDatanodeRegistration | | | org.apache.hadoop.hdfs.TestDFSClientFailover | | | org.apache.hadoop.hdfs.TestRead | | | org.apache.hadoop.security.TestPermission | | | org.apache.hadoop.hdfs.web.TestWebHdfsTokens | | | org.apache.hadoop.hdfs.TestDFSInotifyEventInputStream | | | org.apache.hadoop.hdfs.TestFileAppendRestart | | | org.apache.hadoop.hdfs.TestReadWhileWriting | | | org.apache.hadoop.hdfs.security.TestDelegationToken | | | org.apache.hadoop.hdfs.TestDFSOutputStream | | | org.apache.hadoop.hdfs.TestDFSMkdirs | | | org.apache.hadoop.security.TestRefreshUserMappings | | | org.apache.hadoop.hdfs.TestDatanodeReport | | | org.apache.hadoop.hdfs.web.TestWebHDFS | | | org.apache.hadoop.hdfs.web.TestWebHDFSXAttr | | | org.apache.hadoop.hdfs.web.TestWebHdfsWithMultipleNameNodes | | | org.apache.hadoop.hdfs.TestMiniDFSCluster | | | org.apache.hadoop.hdfs.web.TestFSMainOperationsWebHdfs | | | org.apache.hadoop.hdfs.TestDistributedFileSystem | | | org.apache.hadoop.hdfs.web.TestWebHDFSForHA | | | org.apache.hadoop.hdfs.TestDFSShell | | | org.apache.hadoop.hdfs.web.TestWebHDFSAcl | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:67e87c9 | | JIRA Issue | HDFS-10453 | | JIRA Patch URL |
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16287059#comment-16287059 ] Xiang Li commented on HDFS-10453: - [~hexiaoqiao], thanks for the patch and quick update! Shall we need to call {{neededReplications.decrementReplicationIndex(priority)}} after {{neededReplications.remove(rw.block, rw.priority)}} to make it like {code} if (rw.block.getNumBytes() == BlockCommand.NO_ACK) { // remove from neededReplications while block has deleted. neededReplications.remove(rw.block, rw.priority); neededReplications.remove(rw.priority) // <-- here } {code} I am not quite familiar with those code, please advise > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281696#comment-16281696 ] He Xiaoqiao commented on HDFS-10453: [~Octivian] The new patch is ready and update based on you mentioned above, FYI. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453-branch-2.7.004.patch, > HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281571#comment-16281571 ] He Xiaoqiao commented on HDFS-10453: [~Octivian] [~genericqa] Thanks for your comments and tests. Actually you are right, it needs add lock to {{neededReplications}} exactly. In our production env, this patch has update with synchronized of {{neededReplications}}. I will update this patch for a moment. Thanks again. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281548#comment-16281548 ] genericqa commented on HDFS-10453: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 19m 0s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 51s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 49s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 31s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 9s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s{color} | {color:green} branch-2 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 23s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red}107m 16s{color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 1m 16s{color} | {color:red} The patch generated 207 ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}151m 56s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Unreaped Processes | hadoop-hdfs:24 | | Failed junit tests | hadoop.hdfs.server.datanode.TestDataNodeUUID | | | hadoop.hdfs.server.datanode.checker.TestThrottledAsyncChecker | | | hadoop.hdfs.TestEncryptedTransfer | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshotMetrics | | | hadoop.hdfs.server.namenode.snapshot.TestSnapshottableDirListing | | | hadoop.hdfs.TestDFSPermission | | | hadoop.hdfs.server.datanode.TestBatchIbr | | | hadoop.hdfs.server.namenode.TestNestedEncryptionZones | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistPolicy | | | hadoop.hdfs.server.datanode.TestBpServiceActorScheduler | | | hadoop.hdfs.server.federation.router.TestRouter | | | hadoop.hdfs.server.datanode.metrics.TestDataNodeOutlierDetectionViaMetrics | | | hadoop.hdfs.server.federation.metrics.TestFederationMetrics | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestInterDatanodeProtocol | | | hadoop.hdfs.server.namenode.snapshot.TestNestedSnapshots | | | hadoop.hdfs.server.blockmanagement.TestNameNodePrunesMissingStorages | | | hadoop.hdfs.TestSetTimes | | | hadoop.hdfs.server.blockmanagement.TestHeartbeatHandling | | | hadoop.hdfs.server.blockmanagement.TestPendingInvalidateBlock | | | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithDFS | | | hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistLockedMemory | | | hadoop.hdfs.TestDatanodeDeath | | | hadoop.hdfs.server.blockmanagement.TestReplicationPolicyConsiderLoad | | | hadoop.hdfs.server.datanode.TestDataNodeTransferSocketSize | | | hadoop.hdfs.server.datanode.TestBlockCountersInPendingIBR | | | hadoop.hdfs.server.datanode.TestDataNodeVolumeMetrics
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16281396#comment-16281396 ] Xiang Li commented on HDFS-10453: - Hi [~hexiaoqiao], we met the same issue and thanks for the patch! I read your patch and it seems no lock is added when trying to remove rw from neededReplications. Could it be something like? {code} synchronized (neededReplications) { if (rw.getBlock().getNumBytes() == BlockCommand.NO_ACK) { //remove from neededReconstruction while block has deleted. neededReplications.remove(rw.getBlock(), rw.getPriority()); } } {code} Could you help to explain a little more why you do not place a lock of neededReplications here? > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15756632#comment-15756632 ] He Xiaoqiao commented on HDFS-10453: [~yangyishan0901m] patch is ready and it works well for long times in our production env as expected. you can patch and test it for yourself. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755580#comment-15755580 ] Yishan Yang commented on HDFS-10453: Any update? Whether community wanna accept this fix? Thanks! > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710653#comment-15710653 ] Taklon Stephen Wu commented on HDFS-10453: -- +1, could someone please review this patch again? > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710645#comment-15710645 ] He Xiaoqiao commented on HDFS-10453: [~wutak...@amazon.com] Thanks for your comments, The patch is ready, and i think the failure tests are not related to this patch. Actually, This bugfix has run on our production environment over half a year and the exception does not appear. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710641#comment-15710641 ] He Xiaoqiao commented on HDFS-10453: [~wutak...@amazon.com] Thanks for your comments, The patch is ready, and i think the failure tests are not related to this patch. Actually, This bugfix has run on our production environment over half a year and the exception does not appear. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710643#comment-15710643 ] He Xiaoqiao commented on HDFS-10453: [~wutak...@amazon.com] Thanks for your comments, The patch is ready, and i think the failure tests are not related to this patch. Actually, This bugfix has run on our production environment over half a year and the exception does not appear. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15709995#comment-15709995 ] Hadoop QA commented on HDFS-10453: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 25s{color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 38s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} branch-2 passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} branch-2 passed with JDK v1.7.0_121 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 32s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 56s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 17s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 13s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s{color} | {color:green} branch-2 passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 52s{color} | {color:green} branch-2 passed with JDK v1.7.0_121 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 45s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 18s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s{color} | {color:green} the patch passed with JDK v1.8.0_111 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s{color} | {color:green} the patch passed with JDK v1.7.0_121 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 53m 38s{color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_121. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 31s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}154m 55s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_111 Failed junit tests | hadoop.hdfs.server.namenode.TestNameNodeMetricsLogger | | | hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation | | | hadoop.hdfs.TestBlockStoragePolicy | | JDK v1.7.0_121 Failed junit tests | hadoop.hdfs.server.datanode.TestFsDatasetCache | | | hadoop.metrics2.sink.TestRollingFileSystemSinkWithSecureHdfs | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:b59b8b7 | | JIRA Issue | HDFS-10453 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12806883/HDFS-10453-branch-2.003.patch | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 6d0e25f9ff3c 3.13.0-92-generic #139-Ubuntu SMP Tue Jun 28 20:42:26 UTC 2016
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15709564#comment-15709564 ] Taklon Stephen Wu commented on HDFS-10453: -- is this issue still ongoing? > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.5.2, 2.7.1, 2.6.4 >Reporter: He Xiaoqiao > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15306312#comment-15306312 ] He Xiaoqiao commented on HDFS-10453: hi [~vinayrpet], thanks for your comments. Actually this case occurs at Step 2 in {{BlockManager#computeReconstructionWorkForBlocks()}} which is not with global-lock held as you metioned. bq. 2. For blocks in work list, targets will be chosen. {{rw.chooseTargets}} will stuck for long time in the following scenarios: 1. *Blocks* are chosen to be reconstructed at {{neededReconstruction#chooseLowRedundancyBlocks(blocksToProcess)}} with global-lock held; 2. BlockReconstructionWork list {{reconWork}} will be created with the *blocks* which are exactly present at that time in {{scheduleReplication()}} also with global-lock. After creating list, lock is released. 3. Deletion happens at {{BlockManager#removeBlock(BlockInfo block)}} with global lock held. it is a critical parts of this case, {code} public void removeBlock(BlockInfo block) { assert namesystem.hasWriteLock(); // No need to ACK blocks that are being removed entirely // from the namespace, since the removal of the associated // file already removes them from the block map below. block.setNumBytes(BlockCommand.NO_ACK); addToInvalidates(block); removeBlockFromMap(block); // Remove the block from pendingReconstruction and neededReconstruction pendingReconstruction.remove(block); neededReconstruction.remove(block, LowRedundancyBlocks.LEVEL); if (postponedMisreplicatedBlocks.remove(block)) { postponedMisreplicatedBlocksCount.decrementAndGet(); } } {code} After {{removeBlock(BlockInfo block)}} *numbytes* of the block is set to BlockCommand.NO_ACK (=Long.MAX_VALUE), block is delete from {{neededReconstruction}},{{pendingReconstruction}} and {{blocksMap}}, but it is still referenced by {{reconWork}} which is local variable of {{BlockManager #computeReconstructionWorkForBlocks}}. 4. Choose target for each block in {{reconWork}}, but no Node could be selected after *traverse whole cluster* at this moment since numbytes of this block is Long.MAX_VALUE. if there are multiple blocks as depict above in a large cluster, Step 2 as flow below of {{BlockManager#computeReconstructionWorkForBlocks()}} will cost long time. it could be above *10 min* in our online cluster. {code} // Step 2: choose target nodes for each reconstruction task final Set excludedNodes = new HashSet<>(); for(BlockReconstructionWork rw : reconWork){ // Exclude all of the containing nodes from being targets. // This list includes decommissioning or corrupt nodes. excludedNodes.clear(); for (DatanodeDescriptor dn : rw.getContainingNodes()) { excludedNodes.add(dn); } // choose replication targets: NOT HOLDING THE GLOBAL LOCK // It is costly to extract the filename for which chooseTargets is called, // so for now we pass in the block collection itself. final BlockPlacementPolicy placementPolicy = placementPolicies.getPolicy(rw.getBlock().isStriped()); rw.chooseTargets(placementPolicy, storagePolicySuite, excludedNodes); } {code} 5. The rest processing as usually. FYI. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.7.1 >Reporter: He Xiaoqiao > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to >
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15306248#comment-15306248 ] Vinayakumar B commented on HDFS-10453: -- In Branch-2 code, {{BlockManager#computeReconstructionWorkForBlocks()}}, flow goes like below before reaching the part of the code where you have provided the patch, 1. With global-lock held, {{work}} list will be created with the blocks which are exactly present at that time in {{scheduleReplication()}}, lock will be released after creating the list. {code}if (block.isDeleted() || !block.isCompleteOrCommitted()) { // remove from neededReplications neededReplications.remove(block, priority); return null; }{code} 2. For blocks in {{work}} list, targets will be chosen. 3. Global lock will be acquired again, recheck happens for all {{ReplicationWork}} blocks which have got targets, whether block deleted or not during step #2. Now only blocks which are not deleted during this time will be proceeded for replication. {code}if (block.isDeleted() || !block.isCompleteOrCommitted()) { neededReplications.remove(block, priority); rw.resetTargets(); return false; }{code} Therefore the problem you mentioned in the description does not occur IMO. Correct me, If I am wrong. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.4.1, 2.7.1 >Reporter: He Xiaoqiao > Attachments: HDFS-10453-branch-2.001.patch, > HDFS-10453-branch-2.003.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305917#comment-15305917 ] Hadoop QA commented on HDFS-10453: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 13m 53s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 29s {color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s {color} | {color:green} branch-2 passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 41s {color} | {color:green} branch-2 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 29s {color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 50s {color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 14s {color} | {color:green} branch-2 passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 54s {color} | {color:red} hadoop-hdfs-project/hadoop-hdfs in branch-2 has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s {color} | {color:green} branch-2 passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 45s {color} | {color:green} branch-2 passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 45s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} the patch passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s {color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 5s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s {color} | {color:green} the patch passed with JDK v1.8.0_91 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 41s {color} | {color:green} the patch passed with JDK v1.7.0_101 {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 60m 31s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.8.0_91. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 58m 17s {color} | {color:red} hadoop-hdfs in the patch failed with JDK v1.7.0_101. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 157m 14s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | JDK v1.8.0_91 Failed junit tests | hadoop.hdfs.TestDistributedFileSystem | | | hadoop.hdfs.server.blockmanagement.TestRBWBlockInvalidation | | | hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs | | JDK v1.7.0_101 Failed junit tests | hadoop.hdfs.TestDistributedFileSystem | | | hadoop.metrics2.sink.TestRollingFileSystemSinkWithHdfs | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:babe025 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12806883/HDFS-10453-branch-2.003.patch | | JIRA Issue | HDFS-10453 | | Optional Tests
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305222#comment-15305222 ] He Xiaoqiao commented on HDFS-10453: hi [~shahrs87], thanks a lot for your watching this issue. It would be helpful with ur reviews and more suggestions. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: He Xiaoqiao > Attachments: HDFS-10453-branch-2.001.patch, HDFS-10453.001.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip chooseTarget for BlockCommand.NO_ACK block > and remove it from neededReplications. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299642#comment-15299642 ] Hadoop QA commented on HDFS-10453: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 30s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 28s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 59s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 51s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s {color} | {color:green} trunk passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 53s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 57s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 9s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 82m 34s {color} | {color:red} hadoop-hdfs in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 103m 42s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength | | | hadoop.hdfs.TestAsyncDFSRename | \\ \\ || Subsystem || Report/Notes || | Docker | Image:yetus/hadoop:2c91fd8 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12806074/HDFS-10453.001.patch | | JIRA Issue | HDFS-10453 | | Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle | | uname | Linux 17b64327d605 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh | | git revision | trunk / 28bd63e | | Default Java | 1.8.0_91 | | findbugs | v3.0.0 | | whitespace | https://builds.apache.org/job/PreCommit-HDFS-Build/15556/artifact/patchprocess/whitespace-eol.txt | | unit | https://builds.apache.org/job/PreCommit-HDFS-Build/15556/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | unit test logs | https://builds.apache.org/job/PreCommit-HDFS-Build/15556/artifact/patchprocess/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt | | Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/15556/testReport/ | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15556/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > ReplicationMonitor thread could stuck for long time due to the race
[jira] [Commented] (HDFS-10453) ReplicationMonitor thread could stuck for long time due to the race between replication and delete of same file in a large cluster.
[ https://issues.apache.org/jira/browse/HDFS-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297832#comment-15297832 ] Hadoop QA commented on HDFS-10453: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} | {color:red} HDFS-10453 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12805826/HDFS-10453.patch | | JIRA Issue | HDFS-10453 | | Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/15539/console | | Powered by | Apache Yetus 0.2.0 http://yetus.apache.org | This message was automatically generated. > ReplicationMonitor thread could stuck for long time due to the race between > replication and delete of same file in a large cluster. > --- > > Key: HDFS-10453 > URL: https://issues.apache.org/jira/browse/HDFS-10453 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Affects Versions: 2.7.1 >Reporter: He Xiaoqiao > Fix For: 2.7.1 > > Attachments: HDFS-10453.patch > > > ReplicationMonitor thread could stuck for long time and loss data with little > probability. Consider the typical scenario: > (1) create and close a file with the default replicas(3); > (2) increase replication (to 10) of the file. > (3) delete the file while ReplicationMonitor is scheduling blocks belong to > that file for replications. > if ReplicationMonitor stuck reappeared, NameNode will print log as: > {code:xml} > 2016-04-19 10:20:48,083 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > .. > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) For more information, please enable DEBUG log level on > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.protocol.BlockStoragePolicy: Failed to place enough > replicas: expected size is 7 but only 0 storage types can be selected > (replication=10, selected=[], unavailable=[DISK, ARCHIVE], removed=[DISK, > DISK, DISK, DISK, DISK, DISK, DISK], policy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}) > 2016-04-19 10:21:17,184 WARN > org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy: Failed to > place enough replicas, still in need of 7 to reach 10 > (unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, > newBlock=false) All required storage types are unavailable: > unavailableStorages=[DISK, ARCHIVE], storagePolicy=BlockStoragePolicy{HOT:7, > storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]} > {code} > This is because 2 threads (#NameNodeRpcServer and #ReplicationMonitor) > process same block at the same moment. > (1) ReplicationMonitor#computeReplicationWorkForBlocks get blocks to > replicate and leave the global lock. > (2) FSNamesystem#delete invoked to delete blocks then clear the reference in > blocksmap, needReplications, etc. the block's NumBytes will set > NO_ACK(Long.MAX_VALUE) which is used to indicate that the block deletion does > not need explicit ACK from the node. > (3) ReplicationMonitor#computeReplicationWorkForBlocks continue to > chooseTargets for the same blocks and no node will be selected after traverse > whole cluster because no node choice satisfy the goodness criteria > (remaining spaces achieve required size Long.MAX_VALUE). > During of stage#3 ReplicationMonitor stuck for long time, especial in a large > cluster. invalidateBlocks & neededReplications continues to grow and no > consumes. it will loss data at the worst. > This can mostly be avoided by skip