I think there are some fsck queries you can run that will show the full path of each file marked (MISSING); you can find them with Google pretty easily.
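For example, something along these lines should list the affected paths (a rough sketch; exact flags can differ slightly between Hadoop versions):

    # files that currently have corrupt/missing blocks, with their HDFS paths
    hdfs fsck / -list-corruptfileblocks

    # or walk the whole namespace and grep for the MISSING marker
    hdfs fsck / -files -blocks -locations | grep -i missing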
Think of it… the namenode has to keep track of where all the blocks are, something like hostname/path; that's the job of the NN. If you can (I would), let all the under-replicated blocks re-replicate, and during that time try to identify the missing file paths. Once you find those, introduce one drive at a time and see (there is also a quick per-drive sanity-check sketched at the very bottom of this mail).

On Mon, Sep 13, 2021 at 7:37 PM, Andrew Chi <chi.and...@gmail.com> wrote:

> Thanks a bunch. Most of the data was replicated on other datanodes, but there
> were some blocks that at the time of failure were only on the single datanode
> with the failed drives.
>
> I did look at the namenode.log, but it seems that for each block, the central
> log only provides the IP address of the datanode(s) on which the block is
> replicated. That suggests to me that the datanode's local filesystem path
> information is contained only on the datanode itself, but I can't figure out
> where. Perhaps the directory doesn't matter as long as the storageID is
> correct (in the current/VERSION file), but I'd like to verify this before
> starting up the datanode and potentially corrupting the HDFS filesystem.
>
> Is there datanode state stored anywhere other than the */current/ and the
> */lost+found/ directories?
>
> On Mon, Sep 13, 2021, 7:24 PM Bob Metelsky <bob.metel...@pm.me> wrote:
>
>> Just throwing some ideas out here...
>>
>> If all the failed drives were on one server, it's likely the blocks are
>> replicated on other nodes. So you can run
>>
>> hdfs dfsadmin -report | head -13
>>
>> and look for under-replicated blocks. You can put that in a loop and watch
>> the count go down; eventually you will be left with the actual missing blocks:
>>
>> while true
>> do
>>   hdfs dfsadmin -report | head -13
>>   sleep 600
>> done
>>
>> You can also run some queries:
>> https://knpcode.com/hadoop/hdfs/how-to-fix-corrupt-blocks-and-under-replicated-blocks-in-hdfs/
>>
>> It's very likely most of the data is replicated on other disks/nodes.
>> You may also get some insight into the actual path names by tailing the
>> namenode log.
>>
>> Just ideas off the top of my head.
>>
>> Good luck
>>
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>> On Monday, September 13th, 2021 at 7:03 PM, Bob Metelsky
>> <bob.metel...@pm.me> wrote:
>>
>>> Hi, before doing that, I would run ls -ltR > filename.txt on each disk and
>>> see if there are hints/references to the original file system. That may
>>> help provide a more meaningful path to put in hdfs-site.xml. Generally it
>>> sounds pretty close.
>>>
>>> Let us know how it goes.
>>>
>>> On Mon, Sep 13, 2021 at 5:59 PM, Andrew Chi <chi.and...@gmail.com> wrote:
>>>
>>>> I've had a recent drive failure that resulted in the removal of several
>>>> drives from an HDFS datanode machine (Hadoop version 3.3.0). This caused
>>>> Linux to rename half of the drives in /dev/*, with the result that when we
>>>> mount the drives, the original directory mapping no longer exists. The
>>>> data on those drives still exists, so this is equivalent to a renaming of
>>>> the local filesystem directories.
>>>>
>>>> Originally, we had:
>>>> /hadoop/data/path/a
>>>> /hadoop/data/path/b
>>>> /hadoop/data/path/c
>>>>
>>>> Now we have:
>>>> /hadoop/data/path/x
>>>> /hadoop/data/path/y
>>>> /hadoop/data/path/z
>>>>
>>>> where it's not clear how {a,b,c} map onto {x,y,z}. The blocks have been
>>>> preserved within the directories, but the directories have essentially
>>>> been randomly permuted.
>>>>
>>>> Can I simply go to hdfs-site.xml and change dfs.datanode.data.dir to the
>>>> new list of comma-separated directories /hadoop/data/path/{x,y,z}?
>>>> Will the datanode still work correctly when I start it back up?
>>>>
>>>> Thanks!
>>>> Andrew
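PS - on the question at the very bottom about pointing dfs.datanode.data.dir at /hadoop/data/path/{x,y,z}: before starting the datanode, I'd peek at each mount's VERSION file and block pool directory. A rough sketch (the paths are just the placeholders from your mail, adjust to your real mounts):

    for d in /hadoop/data/path/x /hadoop/data/path/y /hadoop/data/path/z
    do
      echo "== $d =="
      cat "$d/current/VERSION"    # storageID / datanodeUuid / clusterID recorded here
      ls "$d/current"             # should contain the block pool dir, BP-<id>-...
    done

As far as I know, if the VERSION files are intact and the clusterID/datanodeUuid match across the mounts, the datanode identifies each storage directory by what is inside it rather than by its name, so the order in dfs.datanode.data.dir shouldn't matter; but please verify against your version's docs before starting it back up.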