It's not that big of a deal: the source data were available on our other
cluster in another DC, and the rest could be recomputed from that. I
just wanted to know.
Thanks for the reply, good to know.
The reason I ran the tool was that we recently added more datadirs
(disks) to each datanode, and the new datadirs were empty while the others
were almost full. It's a shame that native HDFS tools (such as the balancer)
can't do intra-node volume rebalancing.
We tried adding the new disks as ARCHIVE storage so we could mark
some old data as COLD, but when I did that and ran the mover we hit these bugs:
https://issues.apache.org/jira/browse/HDFS-8770
https://issues.apache.org/jira/browse/HDFS-9661
which made me decide to add the new datadirs as regular DISK storage instead,
for the time being...
In the meantime, Cloudera has the fix for the NN crash, and I already
know I can patch the DN deadlock, so we might try that again soon.
Thanks,
David Watzke
On 5.3.2016 at 23:13, Anu Engineer wrote:
I am so sorry to hear this, but I don’t think we have any tool at this
point in time that can fix that layout issue, and I don’t know enough
about the volume-balancer tool to comment on other options.
If you are okay with losing some of your blocks (since other nodes
are in a bad state too), you can decommission the node, re-add it,
and wait for the cluster to heal itself.
We have been working on a tool to address the disk balancing issue; if you
are interested, you can follow its progress in HDFS-1312.
—Anu
P.S. Just out of curiosity, can I ask what prompted you to run this
tool? Did you replace a disk, or were you running out of space on one
disk on that node?
From: David Watzke <[email protected]>
Date: Saturday, March 5, 2016 at 6:47 AM
To: [email protected]
Subject: datanode directory structure mess-up
Hi list,
I ran into trouble because I accidentally used this tool
https://github.com/killerwhile/volume-balancer with Hadoop 2.6.0
(exactly what that page warns you not to do; I had used it successfully
before and didn't think to check the page before using it again), and it
messed up my datadirs because, as I understand it, the software now makes
invalid assumptions about which directory moves it can do. Now the
datanode logs are filled with these:
WARN org.apache.hadoop.hdfs.server.datanode.VolumeScanner: I/O error while finding block BP-680964103-A.B.C.D-1375882473930:blk_5822441067008155275_0 on volume /xyz/dfs/dn
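For reference, a minimal sketch of where the block in that warning would be expected on disk, assuming the block-ID-based layout of Hadoop 2.6-era datanodes, in which the two subdir levels under finalized/ come from bits 16-23 and 8-15 of the block ID. Newer releases use a smaller mask, so compare with DatanodeUtil.idToBlockDir in your own version. The class name and the regex are illustrative, not part of Hadoop. If a tool moves a subdirectory somewhere else, the blocks inside it no longer sit where this mapping says they should, which may be what the scanner is tripping over.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: expected location of a block under the assumed ID-based layout. */
public class BlockDirSketch {
    // Matches "<poolId>:blk_<id>_<genstamp>" as printed in the VolumeScanner warning.
    private static final Pattern BLOCK =
            Pattern.compile("(BP-[^:\\s]+):blk_(-?\\d+)_(\\d+)");

    /** Relative dir under finalized/; assumes bits 16-23 and 8-15 select the two levels. */
    static Path expectedSubdir(long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xFF);
        int d2 = (int) ((blockId >> 8) & 0xFF);
        return Paths.get("subdir" + d1, "subdir" + d2);
    }

    /** Full expected directory for a volume root such as /xyz/dfs/dn. */
    static Path expectedDir(Path volume, String poolId, long blockId) {
        return volume.resolve(Paths.get("current", poolId, "current", "finalized"))
                     .resolve(expectedSubdir(blockId));
    }

    public static void main(String[] args) {
        String log = "BP-680964103-A.B.C.D-1375882473930:blk_5822441067008155275_0";
        Matcher m = BLOCK.matcher(log);
        if (m.find()) {
            String pool = m.group(1);
            long id = Long.parseLong(m.group(2));
            // Print where the block data file would be expected on this volume.
            System.out.println(expectedDir(Paths.get("/xyz/dfs/dn"), pool, id).resolve("blk_" + id));
        }
    }
}
```

On a Unix filesystem this prints /xyz/dfs/dn/current/BP-680964103-A.B.C.D-1375882473930/current/finalized/subdir170/subdir222/blk_5822441067008155275. The arithmetic shift followed by the mask also works for the negative block IDs that older clusters can contain.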
What can I do to fix this? I don't know which files/dirs were moved or
from where, but is there a reasonable way out of this? For example,
would it work to edit the VERSION file back to a previous version
while the DN is down, so that the DN fixes the layout by itself?
Please note that I've lost the other replica due to a filesystem error,
so I can't just ignore it. This is literally my only option to recover
some missing blocks.
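One possible way to see how much was moved, sketched under an assumption: the Hadoop 2.6-era block-ID-based layout, where the two subdir levels under finalized/ come from bits 16-23 and 8-15 of the block ID. The sketch walks a volume's finalized/ tree, compares each block and .meta file's directory with the one its ID implies, and prints the moves that would put it back. It is a dry run that only prints mv commands. The class and method names are hypothetical, and the DN should be stopped and the output reviewed (for example, for destination files that already exist) before anything is actually moved.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

/** Hypothetical dry-run checker for misplaced block files on one volume. */
public class MisplacedBlockFinder {
    // blk_<id> data files and blk_<id>_<genstamp>.meta checksum files.
    private static final Pattern BLOCK_FILE = Pattern.compile("blk_(-?\\d+)(_\\d+\\.meta)?");

    // Assumed layout: subdir<(id >> 16) & 0xFF>/subdir<(id >> 8) & 0xFF> under finalized/.
    static Path expectedDir(Path finalized, long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xFF);
        int d2 = (int) ((blockId >> 8) & 0xFF);
        return finalized.resolve("subdir" + d1).resolve("subdir" + d2);
    }

    /** Returns "mv <from> <to>" lines for every block or meta file outside its expected dir. */
    static List<String> findMisplaced(Path finalized) throws IOException {
        List<String> moves = new ArrayList<>();
        try (Stream<Path> files = Files.walk(finalized)) {
            files.filter(Files::isRegularFile).forEach(f -> {
                Matcher m = BLOCK_FILE.matcher(f.getFileName().toString());
                if (!m.matches()) {
                    return; // not a block or meta file, leave it alone
                }
                Path want = expectedDir(finalized, Long.parseLong(m.group(1)));
                if (!f.getParent().equals(want)) {
                    moves.add("mv " + f + " " + want.resolve(f.getFileName()));
                }
            });
        }
        return moves;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java MisplacedBlockFinder <volume>/current/<BP-id>/current/finalized
        Path finalized = Paths.get(args[0]);
        for (String mv : findMisplaced(finalized)) {
            System.out.println(mv); // print only; review before running anything
        }
    }
}
```

Whether the DN picks the moved replicas up cleanly after a restart is something to verify on a copy of one volume first.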
Thanks,
--
David Watzke