Re: datanode directory structure mess-up

David Watzke Sat, 05 Mar 2016 14:36:45 -0800

It's not that big of a deal: the source data were available on our other cluster in another DC and the rest could be recomputed from that, but I just wanted to know.


Thanks for the reply, good to know.

The reason I ran the tool was that we recently added more datadirs (disks) to each datanode, and the new datadirs were empty while the others were almost full. It's a shame that native HDFS tools (such as the balancer) aren't able to rebalance volumes within a node (intra-node).
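
The imbalance described above can be measured before deciding whether to act on it. A minimal sketch, assuming each datadir sits on its own filesystem; the function names and any paths passed in are illustrative and are not part of HDFS:

```python
import shutil

def volume_usage(datadirs):
    """Return {path: used fraction of the filesystem holding that path}."""
    usage = {}
    for path in datadirs:
        total, used, _free = shutil.disk_usage(path)
        usage[path] = used / total if total else 0.0
    return usage

def usage_spread(usage):
    """Difference between the fullest and the emptiest volume (0.0 to 1.0)."""
    if not usage:
        return 0.0
    return max(usage.values()) - min(usage.values())
```

This only reads filesystem statistics; it does not move or touch any block files.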

We tried to add the new disks as ARCHIVE storage so we could mark some old data as COLD, but when I did that and ran the mover we hit these bugs:

https://issues.apache.org/jira/browse/HDFS-8770
https://issues.apache.org/jira/browse/HDFS-9661

which made me decide to add the new datadirs as regular DISK storage instead for the time being...

In the meantime Cloudera's got the fix for the NN crash and I already know I'm able to patch the DN deadlock, so we might try that again soon.


Thanks,

David Watzke

On 5.3.2016 at 23:13, Anu Engineer wrote:

I am so sorry to hear this, but I don’t think we have any tool at this point in time that can fix that layout issue, and I don’t know enough about the volume-balancer tool to comment on other options.
If you are okay with losing some of your blocks (since other nodes are in a bad state too), you can decommission the node, re-add it, and wait for the cluster to heal itself.
We have been working on a tool to address the disk balancing issue; if you are interested, you can follow its progress in HDFS-1312.
—Anu
P.S. Just out of curiosity, can I ask what prompted you to run this tool? Did you replace a disk, or were you running out of space on one disk on that node?
From: David Watzke <[email protected]>
Date: Saturday, March 5, 2016 at 6:47 AM
To: "[email protected]" <[email protected]>
Subject: datanode directory structure mess-up

Hi list,
I ran into trouble because I accidentally used this tool https://github.com/killerwhile/volume-balancer with Hadoop 2.6.0 (just like that page warns you not to; I had used it successfully before and didn't think to check the page before using it again). It messed up my datadirs because, as I understand it, the software now makes invalid assumptions about which directory moves it can do. Now the datanode logs are filled with these:
WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: I/O error while finding block BP-680964103-A.B.C.D-1375882473930:blk_5822441067008155275_0 on volume /xyz/dfs/dn
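
To get a list of the affected blocks per volume, warnings like the one above can be tallied from the datanode log. A minimal sketch, assuming the message reads "I/O error while finding block <pool>:blk_<id>_<genstamp> on volume <path>" as in the quoted line; other Hadoop versions may word it differently, and the function name is illustrative:

```python
import re
from collections import defaultdict

# Pattern modelled on the VolumeScanner warning quoted in this thread
# (an assumption: the wording may differ between Hadoop versions).
WARN_RE = re.compile(
    r"I/O error while finding block "
    r"(?P<bpid>BP-[^:\s]+):blk_(?P<block_id>-?\d+)_(?P<genstamp>\d+)"
    r" on volume (?P<volume>\S+)"
)

def affected_blocks(lines):
    """Map each volume to the set of (block pool ID, block ID) pairs reported."""
    by_volume = defaultdict(set)
    for line in lines:
        m = WARN_RE.search(line)
        if m:
            by_volume[m.group("volume")].add(
                (m.group("bpid"), int(m.group("block_id")))
            )
    return dict(by_volume)
```

Calling `affected_blocks(open(path))` on a datanode log (the path is whatever your installation uses) gives the distinct blocks per volume, since the scanner repeats the same warning.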
What can I do to fix this? I don't know which files/dirs were moved or from where, but is there a reasonable way out of this? For example, would editing the VERSION file to a previous version while the DN is down make it fix the layout by itself?
Please note that I've lost the other replica due to a filesystem error, so I can't just ignore it. This is literally my only option to recover some missing blocks.
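
One way to find out which files ended up in the wrong place is to compare each block file's directory with the one derived from its block ID. The sketch below is read-only and rests on assumptions not confirmed in this thread: that Hadoop 2.6.0 places a finalized block under `subdir<(id >> 16) & 0xff>/subdir<(id >> 8) & 0xff>`, and that `finalized_root` points at something like `<datadir>/current/<block pool ID>/current/finalized`. Check both against the DatanodeUtil source of your exact version before relying on the result; the function names are illustrative.

```python
import os
import re

# Block data files only; metadata files look like blk_<id>_<genstamp>.meta.
BLOCK_FILE_RE = re.compile(r"^blk_(-?\d+)$")

def expected_block_dir(finalized_root, block_id):
    """Directory a block is assumed to live in under the ID-based layout."""
    # Assumed 256 x 256 mapping for Hadoop 2.6.0; later releases may differ.
    d1 = (block_id >> 16) & 0xFF
    d2 = (block_id >> 8) & 0xFF
    return os.path.join(finalized_root, f"subdir{d1}", f"subdir{d2}")

def misplaced_blocks(finalized_root):
    """Yield (current_path, expected_dir) for block files in the wrong dir.

    Nothing is moved or modified; this only reports.
    """
    for dirpath, _dirnames, filenames in os.walk(finalized_root):
        for name in filenames:
            m = BLOCK_FILE_RE.match(name)
            if not m:
                continue
            want = expected_block_dir(finalized_root, int(m.group(1)))
            if os.path.normpath(dirpath) != os.path.normpath(want):
                yield os.path.join(dirpath, name), want
```

Any actual move would also have to take the matching `.meta` file along and be done with the datanode stopped; that part is deliberately left out here.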
Thanks,

--
David Watzke
