Hi Tom,

Can you start your datanode service and share the datanode logs, so we can
check whether it started properly or not?
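
A couple of quick checks may help here (a rough sketch, assuming you have
shell access to the datanode host and an HDFS client configured; adjust for
your environment):

    # On the datanode host: is a DataNode JVM actually running?
    jps | grep DataNode

    # From any HDFS client: which datanodes does the namenode see as live?
    hdfs dfsadmin -report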

Regards
-Sanjeev

On Thu, 22 Oct 2020 at 20:33, Austin Hackett <hacketta...@me.com> wrote:

> Hi Tom
>
> It might be worth restarting the DataNode process? I didn't think you
> could disable the DataNode Web UI as such, but I could be wrong on this
> point. Out of interest, what does hdfs-site.xml say with regards
> to dfs.datanode.http.address/dfs.datanode.https.address?
>
> Regarding the logs, a quick look on GitHub suggests there may be a couple
> of useful log messages:
>
> https://github.com/apache/hadoop/blob/88a9f42f320e7c16cf0b0b424283f8e4486ef286/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockScanner.java
>
> For example, LOG.warn("Periodic block scanner is not running") or
> LOG.info("Initialized block scanner with targetBytesPerSec {}").
>
> Of course, you'd need to make sure those LOG statements are present in the
> Hadoop version included with CDH 6.3. Git "blame" suggests the LOG
> statements were added 6 years ago, so chances are you have them...
>
> Thanks
>
> Austin
>
> On 22 Oct 2020, at 14:44, TomK <tomk...@mdevsys.com> wrote:
>
> Thanks Austin. However, none of these are open on a standard Cloudera 6.3
> build.
>
> # netstat -pnltu|grep -Ei "9866|1004|9864|9865|1006|9867"
> #
>
> Would there be anything in the logs to indicate whether or not the block /
> volume scanner is running?
>
> Thx,
> TK
>
> On 10/22/2020 3:09 AM, Austin Hackett wrote:
>
> Hi Tom
>
> I'm not too familiar with the CDH distribution, but this page has the
> default ports used by the DataNode:
>
> https://docs.cloudera.com/documentation/enterprise/latest/topics/cdh_ports.html
>
> I believe it's the settings for
> dfs.datanode.http.address/dfs.datanode.https.address
> that you're interested in (9864/9865).
>
> Since the data block scanner related config parameters are not set, the
> defaults of 3 weeks and 1MB should be applied.
>
> Thanks
>
> Austin
>
> On 22 Oct 2020, at 06:35, TomK <tomk...@mdevsys.com> wrote:
>
> Hey Austin, Sanjeev,
>
> Thanks once more! Took some time to review the pages. That was certainly
> very helpful. Appreciated!
> However, I tried to access https://dn01/blockScannerReport on a test
> Cloudera 6.3 cluster. Didn't work. Tried the following as well:
>
> http://dn01:50075/blockscannerreport?listblocks
> https://dn01:50075/blockscannerreport
> https://dn01:10006/blockscannerreport
>
> Checked that port 50075 is up ( netstat -pnltu ). There's no service on
> that port on the workers. Checked the pages:
>
> https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_ig_ports_cdh5.html
>
> It is defined on the pages. Checked if the following is set:
>
> The following 2 configurations in *hdfs-site.xml* are the most used for
> block scanners:
>
> - *dfs.block.scanner.volume.bytes.per.second* to throttle the scan
> bandwidth to configurable bytes per second. *Default value is 1M*.
> Setting this to 0 will disable the block scanner.
> - *dfs.datanode.scan.period.hours* to configure the scan period, which
> defines how often a whole scan is performed. This should be set to a long
> enough interval to really take effect, for the reasons explained above.
> *Default value is 3 weeks (504 hours)*. Setting this to 0 will use the
> default value. Setting this to a negative value will disable the block
> scanner.
>
> These are NOT explicitly set. Checked hdfs-site.xml. Nothing defined
> there. Checked the Configuration tab in the cluster. It's not defined
> either.
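>
> For what it's worth, I can also query what the client-side configuration
> resolves these to (a quick sketch; I'm assuming the hdfs client on the
> worker picks up the same configuration as the datanode role, which may not
> strictly hold under Cloudera Manager):
>
> hdfs getconf -confKey dfs.datanode.scan.period.hours
> hdfs getconf -confKey dfs.block.scanner.volume.bytes.per.second
> hdfs getconf -confKey dfs.datanode.http.address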
>
> Does this mean that the defaults are applied OR does it mean that the
> block / volume scanner is disabled? I see the pages detail what the values
> for these settings mean, but I didn't see any notes pertaining to the
> situation where both values are not explicitly set.
>
> Thx,
> TK
>
> On 10/21/2020 1:34 PM, संजीव (Sanjeev Tripurari) wrote:
>
> Yes Austin,
>
> you are right, every datanode will do its block verification, which is
> sent as a health check report to the namenode.
>
> Regards
> -Sanjeev
>
> On Wed, 21 Oct 2020 at 21:53, Austin Hackett <hacketta...@me.com> wrote:
>
>> Hi Tom
>>
>> It is my understanding that in addition to block verification on client
>> reads, each data node runs a DataBlockScanner in a background thread that
>> periodically verifies all the blocks stored on the data node. The
>> dfs.datanode.scan.period.hours property controls how often this
>> verification occurs.
>>
>> I think the reports are available via the data node /blockScannerReport
>> HTTP endpoint, although I'm not sure I ever actually looked at one. (Add
>> ?listblocks to get the verification status of each block.)
>>
>> More info here:
>> https://blog.cloudera.com/hdfs-datanode-scanners-and-disk-checker-explained/
>>
>> Thanks
>>
>> Austin
>>
>> On 21 Oct 2020, at 16:47, TomK <tomk...@mdevsys.com> wrote:
>>
>> Hey Sanjeev,
>>
>> All right. Thank you once more. This is clear.
>>
>> However, this poses an issue then. If during the two years, disk drives
>> develop bad blocks but do not necessarily fail to the point that they
>> cannot be mounted, that checksum would have changed, since those
>> filesystem blocks can no longer be read. However, from an HDFS
>> perspective, since no checks are done regularly, that is not known. So
>> HDFS still reports that the file is fine, in other words, no missing
>> blocks. For example, if a disk is going bad, but those files are not read
>> for two years, the system won't know that there is a problem. Even when
>> removing a data node temporarily and re-adding the datanode, HDFS isn't
>> checking, because that HDFS file isn't read.
>>
>> So let's assume this scenario. Data nodes *dn01* to *dn10* exist. Each
>> data node has 10 x 10TB drives.
>>
>> And let's assume that there is one large file on those drives and it's
>> replicated to a factor of 3.
>> If during the two years the file isn't read, and 10 of those drives
>> develop bad blocks or other underlying hardware issues, then it is
>> possible that HDFS will still report everything fine, even with a
>> replication factor of 3. Because with 10 disks failing, it's possible a
>> block or sector has failed under each of the 3 copies of the data. But
>> HDFS would NOT know, since nothing triggered a read of that HDFS file.
>> Based on everything below, corruption is then very much possible even
>> with a replication factor of 3. At this point the file is unreadable, but
>> HDFS still reports no missing blocks.
>>
>> Similarly, if, once I take a data node out, I adjust one of the files on
>> the data disks, HDFS will not know and will still report everything fine.
>> That is, until someone reads the file.
>>
>> Sounds like this is a very real possibility.
>>
>> Thx,
>> TK
>>
>> On 10/21/2020 10:26 AM, संजीव (Sanjeev Tripurari) wrote:
>>
>> Hi Tom
>>
>> Therefore, if I write a file to HDFS but access it two years later, then
>> the checksum will be computed only twice, at the beginning of the two
>> years and again at the end when a client connects? Correct? As long as no
>> process ever accesses the file between now and two years from now, the
>> checksum is never redone and compared to the two year old checksum in the
>> fsimage?
>>
>> Yes, exactly: unless the data is read, the checksum is not verified (it
>> is checked when the data is written and when the data is read).
>> If the checksum is mismatched, there is no way to correct it; you will
>> have to re-write that file.
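>>
>> If you do not want to wait for a client to come along, simply reading the
>> file through the HDFS client is enough to trigger that read-time
>> verification. A small sketch (the path here is only an illustration, not
>> from your cluster):
>>
>> hdfs dfs -cat /data/some/large-file > /dev/null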
>>
>> When the datanode is added back in, there is no real read operation on
>> the files themselves. The datanode just reports the blocks but doesn't
>> really read the blocks that are there to re-verify the files and ensure
>> consistency?
>>
>> Yes, exactly. The datanode maintains a list of files and their blocks,
>> which it reports, along with the total disk size and used size.
>> The namenode only has the list of blocks; unless the datanodes are
>> connected, it won't know where the blocks are stored.
>>
>> Regards
>> -Sanjeev
>>
>> On Wed, 21 Oct 2020 at 18:31, TomK <tomk...@mdevsys.com> wrote:
>>
>>> Hey Sanjeev,
>>>
>>> Thank you very much again. This confirms my suspicion.
>>>
>>> Therefore, if I write a file to HDFS but access it two years later, then
>>> the checksum will be computed only twice, at the beginning of the two
>>> years and again at the end when a client connects? Correct? As long as no
>>> process ever accesses the file between now and two years from now, the
>>> checksum is never redone and compared to the two year old checksum in the
>>> fsimage?
>>>
>>> When the datanode is added back in, there is no real read operation on
>>> the files themselves. The datanode just reports the blocks but doesn't
>>> really read the blocks that are there to re-verify the files and ensure
>>> consistency?
>>>
>>> Thx,
>>> TK
>>>
>>> On 10/21/2020 12:38 AM, संजीव (Sanjeev Tripurari) wrote:
>>>
>>> Hi Tom,
>>>
>>> Every datanode sends a heartbeat to the namenode with the list of blocks
>>> it has.
>>>
>>> When a datanode that has been disconnected for a while reconnects, it
>>> will send a heartbeat to the namenode with the list of blocks it has
>>> (until then, the namenode will have under-replicated blocks).
>>> As soon as the datanode is connected to the namenode, it will clear the
>>> under-replicated blocks.
>>>
>>> *When a client connects to read or write a file, it will run a checksum
>>> to validate the file.*
>>>
>>> There is no independent process running to do checksums, as it would be
>>> a heavy process on each node.
>>>
>>> Regards
>>> -Sanjeev
>>>
>>> On Wed, 21 Oct 2020 at 00:18, Tom <t...@mdevsys.com> wrote:
>>>
>>>> Thank you. That part I understand and am OK with it.
>>>>
>>>> What I would like to know next is when the CRC32C checksum is run again
>>>> and checked against the fsimage to confirm that the block file has not
>>>> changed or become corrupted?
>>>>
>>>> For example, if I take a datanode out, and within 15 minutes, plug it
>>>> back in, does HDFS rerun the CRC32C on all data disks on that node to
>>>> make sure blocks are ok?
>>>>
>>>> Cheers,
>>>> TK
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Oct 20, 2020, at 1:39 PM, संजीव (Sanjeev Tripurari) <
>>>> sanjeevtripur...@gmail.com> wrote:
>>>>
>>>> It's done as soon as a file is stored on disk.
>>>>
>>>> Sanjeev
>>>>
>>>> On Tuesday, 20 October 2020, TomK <tomk...@mdevsys.com> wrote:
>>>>
>>>>> Thanks again.
>>>>>
>>>>> At what points is the checksum validated (checked) after that? For
>>>>> example, is it done on a daily basis or is it done only when the file
>>>>> is accessed?
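>>>>>
>>>>> (If it helps frame the question: I know I can ask for a file-level
>>>>> checksum on demand, e.g.
>>>>>
>>>>> hdfs dfs -checksum /user/tom/example.dat
>>>>>
>>>>> where the path is just an example, but I'm not clear whether a command
>>>>> like that re-reads and re-verifies the on-disk blocks or only returns
>>>>> checksum information the datanodes already have stored.)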
>>>>>
>>>>> Thx,
>>>>> TK
>>>>>
>>>>> On 10/20/2020 10:18 AM, संजीव (Sanjeev Tripurari) wrote:
>>>>>
>>>>> As soon as the file is written the first time, the checksum is
>>>>> calculated and updated in the fsimage (first in the edit logs), and the
>>>>> same is replicated to the other replicas.
>>>>>
>>>>> On Tue, 20 Oct 2020 at 19:15, TomK <tomk...@mdevsys.com> wrote:
>>>>>
>>>>>> Hi Sanjeev,
>>>>>>
>>>>>> Thank you. It does help.
>>>>>>
>>>>>> At what points is the checksum calculated?
>>>>>>
>>>>>> Thx,
>>>>>> TK
>>>>>>
>>>>>> On 10/20/2020 3:03 AM, संजीव (Sanjeev Tripurari) wrote:
>>>>>>
>>>>>> For missing blocks and corrupted blocks, do check that all the
>>>>>> datanode services are up, that all of the disks where hdfs data is
>>>>>> stored are accessible and have no issues, and that the hosts are
>>>>>> reachable from the namenode.
>>>>>>
>>>>>> If you are able to re-generate the data and write it, great; otherwise
>>>>>> hadoop cannot correct it by itself.
>>>>>>
>>>>>> Could you please elaborate on this? Does it mean I have to
>>>>>> continuously access a file for HDFS to be able to detect corrupt
>>>>>> blocks and correct itself?
>>>>>>
>>>>>> *"Does HDFS check that the data node is up, data disk is mounted,
>>>>>> path to the file exists and file can be read?"*
>>>>>> -- yes, only after it fails will it say missing blocks.
>>>>>>
>>>>>> *Or does it also do a filesystem check on that data disk as well as
>>>>>> perhaps a checksum to ensure block integrity?*
>>>>>> -- yes, every file checksum is maintained and cross checked; if it
>>>>>> fails, it will say corrupted blocks.
>>>>>>
>>>>>> hope this helps.
>>>>>>
>>>>>> -Sanjeev
>>>>>>
>>>>>> On Tue, 20 Oct 2020 at 09:52, TomK <tomk...@mdevsys.com> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> HDFS Missing Blocks / Corrupt Blocks Logic: What are the specific
>>>>>>> checks done to determine a block is bad and needs to be replicated?
>>>>>>>
>>>>>>> Does HDFS check that the data node is up, data disk is mounted, path
>>>>>>> to the file exists and file can be read?
>>>>>>>
>>>>>>> Or does it also do a filesystem check on that data disk as well as
>>>>>>> perhaps a checksum to ensure block integrity?
>>>>>>>
>>>>>>> I've googled on this quite a bit. I don't see the exact answer I'm
>>>>>>> looking for. I would like to know exactly what happens during file
>>>>>>> integrity verification that then constitutes missing blocks or
>>>>>>> corrupt blocks in the reports.
>>>>>>>
>>>>>>> --
>>>>>>> Thank You,
>>>>>>> TK.
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
>>>>>>> For additional commands, e-mail: user-h...@hadoop.apache.org
>>>>>>>
>>>>>>
>>>>> --
>>>>> Thx,
>>>>> TK.
>>>>
>>> --
>>> Thx,
>>> TK.
>>
>> --
>> Thx,
>> TK.
>
> --
> Thx,
> TK.
>
> --
> Thx,
> TK.
>
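
PS: to see what the namenode currently counts as missing or corrupt blocks,
these may help (a rough sketch; run them from any host with an HDFS client
and sufficient privileges):

    # Cluster-wide block summary, including missing blocks and blocks with
    # corrupt replicas, as reported by the namenode
    hdfs dfsadmin -report

    # List the files the namenode considers to have corrupt blocks
    hdfs fsck / -list-corruptfileblocks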