It's not that big of a deal: the source data were available on our other cluster in another DC, and the rest could be recomputed from that. I just wanted to know.

Thanks for the reply, good to know.

The reason I ran the tool was that we recently added more datadirs (disks) to each datanode, and the new datadirs were empty while the others were almost full. It's a shame that native HDFS tools (such as the balancer) aren't able to do intra-node volume rebalancing, i.e. moving blocks between the disks of a single datanode.
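For what it's worth, the imbalance between datadirs on one node can be checked with a short script. This is only a sketch: the function names and the paths in the usage comment are made up for illustration, and it only reads local filesystem usage through Python's `shutil.disk_usage`; it does not talk to HDFS at all.

```python
import shutil


def utilization(used_bytes, total_bytes):
    """Return the used fraction of one volume as a float in [0, 1]."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def volume_spread(usage):
    """Given {datadir: (used_bytes, total_bytes)}, return a pair:
    the per-datadir utilization dict and the max-minus-min spread."""
    if not usage:
        raise ValueError("no datadirs given")
    fractions = {d: utilization(u, t) for d, (u, t) in usage.items()}
    spread = max(fractions.values()) - min(fractions.values())
    return fractions, spread


def scan(datadirs):
    """Read (used, total) bytes for each datadir from the local filesystem."""
    usage = {}
    for d in datadirs:
        du = shutil.disk_usage(d)
        usage[d] = (du.used, du.total)
    return usage


# Hypothetical usage on a datanode (paths are placeholders):
#   fractions, spread = volume_spread(scan(["/disk1/dfs/dn", "/disk2/dfs/dn"]))
#   for d, f in sorted(fractions.items()):
#       print(f"{d}: {f:.1%} used")
#   print(f"spread: {spread:.1%}")
```

A large spread right after adding disks is expected, since new blocks only gradually land on the empty volumes.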

We tried adding the new disks as ARCHIVE storage so we could mark some old data as COLD, but when I did that and ran the mover, we hit these bugs:
https://issues.apache.org/jira/browse/HDFS-8770
https://issues.apache.org/jira/browse/HDFS-9661
which made me decide to add new datadirs as regular DISK storage instead for the time being...

In the meantime, Cloudera has got the fix for the NN crash, and I already know I'm able to patch the DN deadlock, so we might try that again soon.

Thanks,

David Watzke

On 5.3.2016 at 23:13, Anu Engineer wrote:
I am so sorry to hear this, but I don’t think we have any tool at this point of time that can fix that layout issue and I don’t know enough about the volume-balancer tool to comment on other options.

If you are okay with losing some of your blocks (since other nodes are in a bad state too), you can decommission the node, re-add it, and wait for the cluster to heal itself. We have been working on a tool to address the disk balancing issue; if you are interested, you can follow the progress of that tool in HDFS-1312.

—Anu

PS. Just out of curiosity, can I ask what prompted you to run this tool? Did you replace a disk, or were you running out of space on one disk on that node?

From: David Watzke <[email protected]>
Date: Saturday, March 5, 2016 at 6:47 AM
To: [email protected]
Subject: datanode directory structure mess-up

Hi list,

I ran into trouble because I accidentally used this tool https://github.com/killerwhile/volume-balancer with Hadoop 2.6.0 (just like that page warns you not to -- I had used it successfully before and didn't think to check that page before using it again), and it messed up my datadirs because, as I understand it, that software now makes invalid assumptions about which directory moves it can do. Now the datanode logs are filled with these:

WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: I/O error while finding block BP-680964103-A.B.C.D-1375882473930:blk_5822441067008155275_0 on volume /xyz/dfs/dn

What can I do to fix this? I don't know what files/dirs were moved and from where but is there a reasonable way out of this? Such as editing VERSION file to a previous version when DN is down so that it fixes the layout by itself - would that work?
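One way to find out which files were moved is to compare where each block file currently sits with where the datanode would look for it. The sketch below assumes the block-ID-based layout that, to my knowledge, came in with Hadoop 2.6 (HDFS-6482), where a finalized block lives under `subdir<d1>/subdir<d2>` and both numbers are taken from bits of the block ID. The 0xFF mask is my assumption for 2.6.0, and later releases changed it, so it should be checked against `DatanodeUtil.idToBlockDir` in the exact version in use. All helper names are hypothetical, and the script only reads the directory tree.

```python
import os
import re

# Matches the block ID in a log line such as "...:blk_123_0 on volume ..."
BLOCK_RE = re.compile(r"blk_(-?\d+)")
# Matches a block file "blk_<id>" or its metadata file "blk_<id>_<genstamp>.meta"
BLOCK_FILE_RE = re.compile(r"^blk_(-?\d+)(?:_\d+\.meta)?$")


def block_id_from_log(line):
    """Extract the numeric block ID from a datanode log line, or None."""
    m = BLOCK_RE.search(line)
    return int(m.group(1)) if m else None


def expected_subdir(block_id, mask=0xFF):
    """Relative directory the block is expected in (ASSUMED 2.6-style layout)."""
    d1 = (block_id >> 16) & mask
    d2 = (block_id >> 8) & mask
    return f"subdir{d1}/subdir{d2}"


def misplaced_blocks(finalized_root, mask=0xFF):
    """Walk a 'finalized' directory and return (current_path, expected_subdir)
    for every block or meta file that is not where the layout expects it."""
    out = []
    for dirpath, _dirs, files in os.walk(finalized_root):
        rel = os.path.relpath(dirpath, finalized_root).replace(os.sep, "/")
        for name in files:
            m = BLOCK_FILE_RE.match(name)
            if not m:
                continue
            want = expected_subdir(int(m.group(1)), mask)
            if rel != want:
                out.append((os.path.join(dirpath, name), want))
    return out
```

Running `misplaced_blocks` over each volume's `finalized` directory while the DN is down would give a list of candidate files to move back; I would copy rather than move anything until the result has been verified on a single block.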

Please note that I've lost the other replica due to a filesystem error so I can't just ignore it. This is literally my only option to recover some missing blocks.

Thanks,

--
David Watzke
