Hi Tom

It is my understanding that in addition to block verification on client reads, 
each data node runs a DataBlockScanner in a background thread that periodically 
verifies all the blocks stored on the data node. The 
dfs.datanode.scan.period.hours property controls how often this verification 
occurs.
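
For reference, a minimal hdfs-site.xml snippet would look something like the 
following (504 hours, i.e. three weeks, is the default value I remember, but 
do check the hdfs-default.xml shipped with your version):

    <property>
      <!-- How often the background scanner re-verifies each stored block;
           504 hours (3 weeks) is, as far as I recall, the default. -->
      <name>dfs.datanode.scan.period.hours</name>
      <value>504</value>
    </property>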

I think the reports are available via the data node's /blockScannerReport HTTP 
endpoint, although I'm not sure I've ever actually looked at one. (Add 
?listblocks to get the verification status of each block.)
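
Something like the following should pull the report; the hostname is just a 
placeholder, and the datanode HTTP port is 50075 by default on Hadoop 2.x or 
9864 on Hadoop 3.x:

    # summary of block verification activity on one datanode
    curl http://dn01.example.com:9864/blockScannerReport

    # same report with per-block verification status appended (can be long)
    curl "http://dn01.example.com:9864/blockScannerReport?listblocks"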

More info here: 
https://blog.cloudera.com/hdfs-datanode-scanners-and-disk-checker-explained/

Thanks

Austin

> On 21 Oct 2020, at 16:47, TomK <tomk...@mdevsys.com> wrote:
> 
> Hey Sanjeev,
> 
> Alright.  Thank you once more.  This is clear. 
> 
> However, this poses an issue then.  If during the two years, disk drives 
> develop bad blocks but do not necessarily fail to the point that they cannot 
> be mounted, that checksum would have changed since those filesystem blocks 
> can no longer be read.  However, from an HDFS perspective, since no checks 
> are done regularly, that is not known.   So HDFS still reports that the file 
> is fine, in other words, no missing blocks.  For example, if a disk is going 
> bad, but those files are not read for two years, the system won't know that 
> there is a problem.  Even when removing a data node temporarily and re-adding 
> the datanode, HDFS isn't checking because that HDFS file isn't read.
> 
> So let's assume this scenario.  Data nodes dn01 to dn10  exist. Each data 
> node has 10 x 10TB drives.
> And let's assume that there is one large file on those drives and it's 
> replicated with a factor of 3.  
> 
> If during the two years the file isn't read, and 10 of those drives develop 
> bad blocks or other underlying hardware issues, then it is possible that HDFS 
> will still report everything fine, even with a replication factor of 3.  
> Because with 10 disks failing, it's possible a block or sector has failed 
> under each of the 3 copies of the data.  But HDFS would NOT know since 
> nothing triggered a read of that HDFS file.  Based on everything below, 
> corruption is very much possible even with a replication factor of 3.  At 
> this point the file is unreadable but HDFS still reports no missing blocks.  
> 
> Similarly, if, once I take a data node out, I adjust one of the files on the 
> data disks, HDFS will not know and will still report everything fine.  That 
> is, until someone reads the file.
> 
> Sounds like this is a very real possibility. 
> 
> Thx,
> TK
> 
> 
> On 10/21/2020 10:26 AM, संजीव (Sanjeev Tripurari) wrote:
>> Hi Tom
>> 
>> Therefore, if I write a file to HDFS but access it two years later, then the 
>> checksum will be computed only twice, at the beginning of the two years and 
>> again at the end when a client connects?  Correct?  As long as no process 
>> ever accesses the file between now and two years from now, the checksum is 
>> never redone and compared to the two year old checksum in the fsimage?
>> 
>> Yes, exactly, unless data is read the checksum is not verified (it is 
>> checked only when data is written and when the data is read). 
>> If the checksum is mismatched, there is no way to correct it; you will have 
>> to re-write that file.
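>> 
>> For example (just a sketch, with a placeholder path), reading the file is 
>> enough to make the datanodes verify the stored checksums, and fsck shows 
>> what the namenode currently believes about the blocks:
>> 
>>     # force a full read so the checksums actually get verified
>>     hadoop fs -cat /path/to/file > /dev/null
>> 
>>     # the namenode's view of the file's blocks (metadata only, no data read)
>>     hdfs fsck /path/to/file -files -blocks -locations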
>> 
>> When  datanode is added back in, there is no real read operation on the 
>> files themselves.  The datanode just reports the blocks but doesn't really 
>> read the blocks that are there to re-verify the files and ensure consistency?
>> 
>> Yes, exactly, the datanode maintains a list of files and their blocks, which 
>> it reports, along with total disk size and used size.
>> The namenode only has the list of blocks; unless the datanodes are connected, 
>> it won't know where the blocks are stored.
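>> 
>> You can see that reported view with something like this (sketch):
>> 
>>     # per-datanode capacity and used space, as reported to the namenode
>>     hdfs dfsadmin -report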
>> 
>> Regards
>> -Sanjeev
>> 
>> 
>> On Wed, 21 Oct 2020 at 18:31, TomK <tomk...@mdevsys.com 
>> <mailto:tomk...@mdevsys.com>> wrote:
>> Hey Sanjeev,
>> 
>> Thank you very much again.  This confirms my suspicion.
>> 
>> Therefore, if I write a file to HDFS but access it two years later, then the 
>> checksum will be computed only twice, at the beginning of the two years and 
>> again at the end when a client connects?  Correct?  As long as no process 
>> ever accesses the file between now and two years from now, the checksum is 
>> never redone and compared to the two year old checksum in the fsimage?
>> 
>> When  datanode is added back in, there is no real read operation on the 
>> files themselves.  The datanode just reports the blocks but doesn't really 
>> read the blocks that are there to re-verify the files and ensure consistency?
>> 
>> Thx,
>> TK
>> 
>> 
>> 
>> On 10/21/2020 12:38 AM, संजीव (Sanjeev Tripurari) wrote:
>>> Hi Tom,
>>> 
>>> Every datanode sends a heartbeat to the namenode with the list of blocks it 
>>> has.
>>> 
>>> A datanode which has been disconnected for a while will, after reconnecting, 
>>> send a heartbeat to the namenode with the list of blocks it has (until then 
>>> the namenode will show under-replicated blocks).
>>> As soon as the datanode is connected to the namenode, the under-replicated 
>>> blocks will clear.
>>> 
>>> When a client connects to read or write a file, it will run a checksum to 
>>> validate the file.
>>> 
>>> There is no independent process running to do checksums, as it would be a 
>>> heavy process on each node.
>>> 
>>> Regards
>>> -Sanjeev
>>> 
>>> On Wed, 21 Oct 2020 at 00:18, Tom <t...@mdevsys.com 
>>> <mailto:t...@mdevsys.com>> wrote:
>>> Thank you.  That part I understand and am Ok with it.  
>>> 
>>> What I would like to know next is when the CRC32C checksum is run again and 
>>> checked against the fsimage to confirm that the block file has not changed 
>>> or become corrupted?  
>>> 
>>> For example, if I take a datanode out and, within 15 minutes, plug it back 
>>> in, does HDFS rerun the CRC32C on all data disks on that node to make sure 
>>> blocks are ok?
>>> 
>>> Cheers,
>>> TK
>>> 
>>> Sent from my iPhone
>>> 
>>>> On Oct 20, 2020, at 1:39 PM, संजीव (Sanjeev Tripurari) 
>>>> <sanjeevtripur...@gmail.com <mailto:sanjeevtripur...@gmail.com>> wrote:
>>>> 
>>>> It's done as soon as a file is stored on disk.
>>>> 
>>>> Sanjeev 
>>>> 
>>>> On Tuesday, 20 October 2020, TomK <tomk...@mdevsys.com 
>>>> <mailto:tomk...@mdevsys.com>> wrote:
>>>> Thanks again.
>>>> 
>>>> At what points is the checksum validated (checked) after that?  For 
>>>> example, is it done on a daily basis or is it done only when the file is 
>>>> accessed?
>>>> 
>>>> Thx,
>>>> TK
>>>> 
>>>> On 10/20/2020 10:18 AM, संजीव (Sanjeev Tripurari) wrote:
>>>>> As soon as the file is written the first time, the checksum is calculated 
>>>>> and updated in the fsimage (first in the edit logs), and the same is 
>>>>> replicated to the other replicas.
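>>>>> 
>>>>> You can see the checksum HDFS keeps for a file with something like this 
>>>>> (sketch, placeholder path):
>>>>> 
>>>>>     # prints the composite checksum HDFS maintains for the file
>>>>>     # (MD5-of-MD5-of-CRC32C by default)
>>>>>     hadoop fs -checksum /path/to/file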
>>>>> 
>>>>> 
>>>>> 
>>>>> On Tue, 20 Oct 2020 at 19:15, TomK <tomk...@mdevsys.com 
>>>>> <mailto:tomk...@mdevsys.com>> wrote:
>>>>> Hi Sanjeev,
>>>>> 
>>>>> Thank you.  It does help. 
>>>>> 
>>>>> At what points is the checksum calculated?  
>>>>> 
>>>>> Thx,
>>>>> TK
>>>>> 
>>>>> On 10/20/2020 3:03 AM, संजीव (Sanjeev Tripurari) wrote:
>>>>>> For missing blocks and corrupted blocks, do check that all the datanode 
>>>>>> services are up, that none of the disks where HDFS data is stored is 
>>>>>> inaccessible or has issues, and that the hosts are reachable from the 
>>>>>> namenode.
>>>>>> 
>>>>>> If you are able to re-generate the data and write it again, that's great; 
>>>>>> otherwise Hadoop cannot correct it by itself.
>>>>> Could you please elaborate on this?  Does it mean I have to continuously 
>>>>> access a file for HDFS to be able to detect corrupt blocks and correct 
>>>>> itself?
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> "Does HDFS check that the data node is up, data disk is mounted, path to
>>>>>> the file exists and file can be read?"
>>>>>> -- yes, only after it fails will it report missing blocks.
>>>>>> 
>>>>>> Or does it also do a filesystem check on that data disk as well as
>>>>>> perhaps a checksum to ensure block integrity?
>>>>>> -- yes, every file's checksum is maintained and cross-checked; if it 
>>>>>> fails it will report corrupted blocks.
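>>>>>> 
>>>>>> If you ever want to cross-check a single block replica by hand on a 
>>>>>> datanode, there is a debug command for it (just a sketch, the paths are 
>>>>>> placeholders):
>>>>>> 
>>>>>>     # verify that the checksums in the .meta file match the block file
>>>>>>     hdfs debug verifyMeta -meta /data/1/dfs/dn/current/.../blk_1073741825_1001.meta \
>>>>>>         -block /data/1/dfs/dn/current/.../blk_1073741825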
>>>>>> 
>>>>>> hope this helps.
>>>>>> 
>>>>>> -Sanjeev
>>>>>> 
>>>>>> 
>>>>>> On Tue, 20 Oct 2020 at 09:52, TomK <tomk...@mdevsys.com 
>>>>>> <mailto:tomk...@mdevsys.com>> wrote:
>>>>>> Hello,
>>>>>> 
>>>>>> HDFS Missing Blocks / Corrupt Blocks Logic:  What are the specific 
>>>>>> checks done to determine a block is bad and needs to be replicated?
>>>>>> 
>>>>>> Does HDFS check that the data node is up, data disk is mounted, path to 
>>>>>> the file exists and file can be read?
>>>>>> 
>>>>>> Or does it also do a filesystem check on that data disk as well as 
>>>>>> perhaps a checksum to ensure block integrity?
>>>>>> 
>>>>>> I've googled on this quite a bit.  I don't see the exact answer I'm 
>>>>>> looking for.  I would like to know exactly what happens during file 
>>>>>> integrity verification that then constitutes missing blocks or corrupt 
>>>>>> blocks in the reports.
>>>>>> 
>>>>>> -- 
>>>>>> Thank  You,
>>>>>> TK.
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> -- 
>>>> Thx,
>>>> TK.
>> 
>> -- 
>> Thx,
>> TK.
> 
> -- 
> Thx,
> TK.
