[ https://issues.apache.org/jira/browse/HDFS-16806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ruiliang resolved HDFS-16806.
-----------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

> ec data balancer block blk_id The index error ,Data cannot be moved
> -------------------------------------------------------------------
>
>                 Key: HDFS-16806
>                 URL: https://issues.apache.org/jira/browse/HDFS-16806
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.1.0
>            Reporter: ruiliang
>            Priority: Critical
>         Attachments: image-2022-10-20-11-32-35-833.png
>
>
> When balancing EC data, the balancer requests the wrong blk_id (index error), so the data cannot be moved.
> DataNode 10.12.15.149 is at 100% disk usage.
>
> {code:java}
> echo 10.12.15.149>sorucehost
> balancer -fs hdfs://xxcluster06 -threshold 10 -source -f sorucehost 2>>~/balancer.log &
> {code}
>
> DataNode logs (the following error is logged many times):
> {code:java}
> datanode logs
> ...
> 2022-10-19 14:43:02,031 ERROR datanode.DataNode (DataXceiver.java:run(321)) - fs-hiido-dn-12-15-149.xx.com:1019:DataXceiver error processing COPY_BLOCK operation src: /10.12.65.216:58214 dst: /10.12.15.149:1019
> org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-1822992414-10.12.65.48-1660893388633:blk_-9223372036799576592_4218617
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:492)
>         at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:256)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.copyBlock(DataXceiver.java:1089)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opCopyBlock(Receiver.java:291)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:113)
>         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:290)
>         at java.lang.Thread.run(Thread.java:748)
> ...
>
> hdfs fsck -fs hdfs://xxcluster06 -blockId blk_-9223372036799576592
> Connecting to namenode via http://fs-hiido-xxcluster06-yynn2.xx.com:50070/fsck?ugi=hdfs&blockId=blk_-9223372036799576592+&path=%2F
> FSCK started by hdfs (auth:KERBEROS_SSL) from /10.12.19.4 at Wed Oct 19 14:47:15 CST 2022
> Block Id: blk_-9223372036799576592
> Block belongs to: /hive_warehouse/warehouse_old_snapshots/yy_mbsdkevent_original/dt=20210505/post_202105052129_33.log.gz
> No. of Expected Replica: 5
> No. of live Replica: 5
> No. of excess Replica: 0
> No. of stale Replica: 5
> No. of decommissioned Replica: 0
> No. of decommissioning Replica: 0
> No. of corrupted Replica: 0
> Block replica on datanode/rack: fs-hiido-dn-12-66-4.xx.com/4F08-01-09 is HEALTHY
> Block replica on datanode/rack: fs-hiido-dn-12-65-244.xx.com/4F08-01-08 is HEALTHY
> Block replica on datanode/rack: fs-hiido-dn-12-15-149.xx.com/4F08-05-13 is HEALTHY
> Block replica on datanode/rack: fs-hiido-dn-12-65-218.xx.com/4F08-12-04 is HEALTHY
> Block replica on datanode/rack: fs-hiido-dn-12-17-35.xx.com/4F08-03-03 is HEALTHY
>
> hdfs fsck -fs hdfs://xxcluster06 /hive_warehouse/warehouse_old_snapshots/yy_mbsdkevent_original/dt=20210505/post_202105052129_33.log.gz -files -blocks -locations
> Connecting to namenode via http://xx.com:50070/fsck?ugi=hdfs&files=1&blocks=1&locations=1&path=%2Fhive_warehouse%2Fwarehouse_old_snapshots%2Fyy_mbsdkevent_original%2Fdt%3D20210505%2Fpost_202105052129_33.log.gz
> FSCK started by hdfs (auth:KERBEROS_SSL) from /10.12.19.4 for path /hive_warehouse/warehouse_old_snapshots/yy_mbsdkevent_original/dt=20210505/post_202105052129_33.log.gz at Wed Oct 19 14:48:42 CST 2022
> /hive_warehouse/warehouse_old_snapshots/yy_mbsdkevent_original/dt=20210505/post_202105052129_33.log.gz 500582412 bytes, erasure-coded: policy=RS-3-2-1024k, 1 block(s): OK
> 0. BP-1822992414-10.12.65.48-1660893388633:blk_-9223372036799576592_4218617 len=500582412 Live_repl=5
>    [blk_-9223372036799576592:DatanodeInfoWithStorage[10.12.17.35:1019,DS-3ccebf8d-5f05-45b5-ac7f-96d1cfb48608,DISK],
>     blk_-9223372036799576591:DatanodeInfoWithStorage[10.12.65.218:1019,DS-4f8e3114-7566-4cf1-ad5a-e454c8ea8805,DISK],
>     blk_-9223372036799576590:DatanodeInfoWithStorage[10.12.15.149:1019,DS-1dd55c27-8f47-46a6-935b-1d9024ca9188,DISK],
>     blk_-9223372036799576589:DatanodeInfoWithStorage[10.12.65.244:1019,DS-a9ffd747-c427-4aaa-8559-04cded7d9d5f,DISK],
>     blk_-9223372036799576588:DatanodeInfoWithStorage[10.12.66.4:1019,DS-d88f94db-6db1-4753-a652-780d7cd7f081,DISK]]
>
> Status: HEALTHY
>  Number of data-nodes: 62
>  Number of racks: 19
>  Total dirs: 0
>  Total symlinks: 0
>
> Replicated Blocks:
>  Total size: 0 B
>  Total files: 0
>  Total blocks (validated): 0
>  Minimally replicated blocks: 0
>  Over-replicated blocks: 0
>  Under-replicated blocks: 0
>  Mis-replicated blocks: 0
>  Default replication factor: 3
>  Average block replication: 0.0
>  Missing blocks: 0
>  Corrupt blocks: 0
>  Missing replicas: 0
>
> Erasure Coded Block Groups:
>  Total size: 500582412 B
>  Total files: 1
>  Total block groups (validated): 1 (avg. block group size 500582412 B)
>  Minimally erasure-coded block groups: 1 (100.0 %)
>  Over-erasure-coded block groups: 0 (0.0 %)
>  Under-erasure-coded block groups: 0 (0.0 %)
>  Unsatisfactory placement block groups: 0 (0.0 %)
>  Average block group size: 5.0
>  Missing block groups: 0
>  Corrupt block groups: 0
>  Missing internal blocks: 0 (0.0 %)
> FSCK ended at Wed Oct 19 14:48:42 CST 2022 in 1 milliseconds
>
> The filesystem under path '/hive_warehouse/warehouse_old_snapshots/yy_mbsdkevent_original/dt=20210505/post_202105052129_33.log.gz' is HEALTHY
> {code}
> {code:java}
> ssh 10.12.15.149
> ls -alh /data*/hadoop/dfs/data/current/*/current/finalized/subdir*/subdir*/blk_-9223372036799576592
> ls: cannot access '/data*/hadoop/dfs/data/current/*/current/finalized/subdir*/subdir*/blk_-9223372036799576592': No such file or directory
> ls -alh /data*/hadoop/dfs/data/current/*/current/finalized/subdir*/subdir*/blk_-9223372036799576590
> -rw-r--r-- 1 hdfs hdfs 159M Oct 18 20:02 /data7/hadoop/dfs/data/current/BP-1822992414-10.12.65.48-1660893388633/current/finalized/subdir10/subdir5/blk_-9223372036799576590
> {code}
> The balancer should move blk_-9223372036799576590, the internal block that is actually stored on 10.12.15.149, rather than request blk_-9223372036799576592, which does not exist on that DataNode.
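
The mismatch reported here can be illustrated with a small sketch. In the fsck output, the five internal blocks of the block group carry consecutive IDs starting at the block group ID (…592, …591, …590, …589, …588), one per listed DataNode. Assuming that pattern (internal block ID = block group ID + position in the location list), the block stored on a given DataNode can be derived as below. The function names are hypothetical and are not part of the Hadoop API; the IDs and addresses are copied from the report.

```python
# Illustrative only: helper names are hypothetical, not Hadoop APIs.
# Assumption: internal block ID = block group ID + index in the fsck
# location list, which matches the IDs shown in the fsck output.

BLOCK_GROUP_ID = -9223372036799576592

# DataNodes in the order fsck listed them for this block group.
LOCATIONS = [
    "10.12.17.35",
    "10.12.65.218",
    "10.12.15.149",
    "10.12.65.244",
    "10.12.66.4",
]


def internal_block_id(block_group_id: int, index: int) -> int:
    """Return the ID of the internal block at position `index` in the group."""
    return block_group_id + index


def internal_block_on(block_group_id: int, locations: list, datanode: str) -> int:
    """Return the internal block ID stored on `datanode`.

    Raises ValueError if the DataNode does not hold a block of this group.
    """
    index = locations.index(datanode)
    return internal_block_id(block_group_id, index)


if __name__ == "__main__":
    blk = internal_block_on(BLOCK_GROUP_ID, LOCATIONS, "10.12.15.149")
    print(f"blk_{blk}")  # prints blk_-9223372036799576590
```

Under that assumption, a COPY_BLOCK request sent to 10.12.15.149 for the block group ID itself (index 0) would name a block that lives on 10.12.17.35, which is consistent with the ReplicaNotFoundException in the DataNode log.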