Re: HDFS Missing Blocks / Corrupt Blocks Logic: What are the specific checks done to determine a block is bad and needs to be replicated?

TomK Thu, 22 Oct 2020 06:44:49 -0700

Thanks Austin. However none of these are open on a standard Cloudera6.3 build.


# netstat -pnltu|grep -Ei "9866|1004|9864|9865|1006|9867"
#

Would there be anything in the logs to indicate whether or not the block/ volume scanner is running?


Thx,
TK


On 10/22/2020 3:09 AM, Austin Hackett wrote:

Hi Tom

I not too familiar with the CDH distribution, but this page has thedefault ports used by DataNode:

https://docs.cloudera.com/documentation/enterprise/latest/topics/cdh_ports.html<https://docs.cloudera.com/documentation/enterprise/latest/topics/cdh_ports.html>

I believe it’s the settings fordfs.datanode.http.address/dfs.datanode.https.address that you’reinterested in (9864/9865)

Since the data block scanner related config parameters are not set,the defaults of 3 weeks and 1MB should be applied.


Thanks

Austin

On 22 Oct 2020, at 06:35, TomK <tomk...@mdevsys.com> wrote:



Hey Austin, Sanjeev,

Thanks once more! Took some time to review the pages. That wascertainly very helpful. Appreciated!

However, I tried to access https://dn01/blockScannerReport on a testCloudera 6.3 cluster. Didn't work Tried the following as well:


http://dn01:50075/blockscannerreport?listblocks

https://dn01:50075/blockscannerreport


https://dn01:10006/blockscannerreport

Checked that port 50075 is up ( netstat -pnltu ). There's no serviceon that port on the workers. Checked the pages:


https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_ig_ports_cdh5.html

It is defined on the pages. Checked if the following is set:

The following 2 configurations in/hdfs-site.xml/are the most used forblock scanners.


 *


 *



  * *dfs.block.scanner.volume.bytes.per.second* to throttle the scan
    bandwidth to configurable bytes per second. *Default value is
    1M*. Setting this to 0 will disable the block scanner.
  * *dfs.datanode.scan.period.hours*to configure the scan period,
    which defines how often a whole scan is performed. This should be
    set to a long enough interval to really take effect, for the
    reasons explained above. *Default value is 3 weeks (504 hours)*.
    Setting this to 0 will use the default value. Setting this to a
    negative value will disable the block scanner.

These are NOT explicitly set. Checked hdfs-site.xml. Nothing definedthere. Checked the Configuration tab in the cluster. It's notdefined either.

Does this mean that the defaults are applied OR does it mean that theblock / volume scanner is disabled? I see the pages detail whatvalues for these settings mean but I didn't see any notes pertainingto the situation where both values are not explicitly set.


Thx,
TK


On 10/21/2020 1:34 PM, संजीव (Sanjeev Tripurari) wrote:

Yes Austin,

you are right every datanode will do its block verification, whichis send as health check report to the namenode


Regards
-Sanjeev

On Wed, 21 Oct 2020 at 21:53, Austin Hackett <hacketta...@me.com<mailto:hacketta...@me.com>> wrote:


    Hi Tom

    It is my understanding that in addition to block verification on
    client reads, each data node runs a DataBlockScanner in a
    background thread that periodically verifies all the blocks
    stored on the data node. The dfs.datanode.scan.period.hours
    property controls how often this verification occurs.

    I think the reports are available via the data node
    /blockScannerReport HTTP endpoint, although I’m not sure I ever
    actually looked at one. (add ?listblocks to get the verification
    status of each block).

    More info here:
    https://blog.cloudera.com/hdfs-datanode-scanners-and-disk-checker-explained/
    
<https://blog.cloudera.com/hdfs-datanode-scanners-and-disk-checker-explained/>

    Thanks

    Austin

    On 21 Oct 2020, at 16:47, TomK <tomk...@mdevsys.com
    <mailto:tomk...@mdevsys.com>> wrote:

    Hey Sanjeev,

    Allright.  Thank you once more.  This is clear.

    However, this poses an issue then.  If during the two years,
    disk drives develop bad blocks but do not necessarily fail to
    the point that they cannot be mounted, that checksum would have
    changed since those filesystem blocks can no longer be read. 
    However, from an HDFS perspective, since no checks are done
    regularly, that is not known.   So HDFS still reports that the
    file is fine, in other words, no missing blocks.  For example,
    if a disk is going bad, but those files are not read for two
    years, the system won't know that there is a problem.  Even
    when removing a data node temporarily and re-adding the
    datanode, HDFS isn't checking because that HDFS file isn't read.

    So let's assume this scenario.  Data nodes *dn01* to *dn10* 
    exist. Each data node has 10 x 10TB drives.

    And let's assume that there is one large file on those drives
    and it's replicated to factor of X3.

    If during the two years the file isn't read, and 10 of those
    drives develop bad blocks or other underlying hardware issues,
    then it is possible that HDFS will still report everything
    fine, even with a replication factor of 3.  Because with 10
    disks failing, it's possible a block or sector has failed under
    each of the 3 copies of the data.  But HDFS would NOT know
    since nothing triggered a read of that HDFS file.  Based on
    everything below, then corruption is very much possible even
    with a replication of factor X3.  A this point the file is
    unreadable but HDFS still reports no missing blocks.

    Similarly, if once I take a data node out, I adjust one of the
    files on the data disks, HDFS will not know and still report
    everything fine.  That is until someone read's the file.

    Sounds like this is a very real possibility.

    Thx,
    TK


    On 10/21/2020 10:26 AM, संजीव (Sanjeev Tripurari) wrote:

    Hi Tom

    Therefore, if I write a file to HDFS but access it two years
    later, then the checksum will be computed only twice, at the
    beginning of the two years and again at the end when a client
    connects?  Correct?  As long as no process ever accesses the
    file between now and two years from now, the checksum is never
    redone and compared to the two year old checksum in the fsimage?

    yes, Exactly unless data is read checksum is not verified.
    (when data is written and when the data is read),
    if checksum is mismatched, there is no way to correct it, you
    will have to re-write that file.

    When  datanode is added back in, there is no real read
    operation on the files themselves.  The datanode just reports
    the blocks but doesn't really read the blocks that are there
    to re-verify the files and ensure consistency?

    yes, Exactly, datanode maintains list of files and their
    blocks, which it reports, along with total disk size and used
    size.
    Namenode only has list of blocks, unless datanodes is
    connected it wont know where the blocks are stored.

    Regards
    -Sanjeev


    On Wed, 21 Oct 2020 at 18:31, TomK <tomk...@mdevsys.com
    <mailto:tomk...@mdevsys.com>> wrote:

        Hey Sanjeev,

        Thank you very much again. This confirms my suspision.

        Therefore, if I write a file to HDFS but access it two
        years later, then the checksum will be computed only
        twice, at the beginning of the two years and again at the
        end when a client connects?  Correct?  As long as no
        process ever accesses the file between now and two years
        from now, the checksum is never redone and compared to the
        two year old checksum in the fsimage?

        When  datanode is added back in, there is no real read
        operation on the files themselves.  The datanode just
        reports the blocks but doesn't really read the blocks that
        are there to re-verify the files and ensure consistency?

        Thx,
        TK



        On 10/21/2020 12:38 AM, संजीव (Sanjeev Tripurari) wrote:

        Hi Tom,

        Every datanode sends heartbeat to namenode, on its list
        of blocks it has.

        When a datanode which is disconnected for a while, after
        connecting will send heartbeat to namenode, with list of
        blocks it has (till then namenode will have
        under-replicated blocks).
        As soon as the datanode is connected to namenode, it will
        clear under-replicatred blocks.

        *When a client connects to read or write a file, it will
        run checksum to validate the file.*

        There is no independent process running to do checksum,
        as it will be heavy process on each node.

        Regards
        -Sanjeev

        On Wed, 21 Oct 2020 at 00:18, Tom <t...@mdevsys.com
        <mailto:t...@mdevsys.com>> wrote:

            Thank you. That part I understand and am Ok with it.

            What I would like to know next is when again the
            CRC32C checksum is ran and checked against the
            fsimage that the block file has not changed or become
            corrupted?

            For example, if I take a datanode out, and within 15
            minutes, plug it back in, does HDF rerun the CRC 32C
            on all data disks on that node to make sure blocks
            are ok?

            Cheers,
            TK

            Sent from my iPhone

            On Oct 20, 2020, at 1:39 PM, संजीव (Sanjeev
            Tripurari) <sanjeevtripur...@gmail.com
            <mailto:sanjeevtripur...@gmail.com>> wrote:

            its done as sson as  a file is stored on disk..

            Sanjeev

            On Tuesday, 20 October 2020, TomK
            <tomk...@mdevsys.com <mailto:tomk...@mdevsys.com>>
            wrote:

                Thanks again.

                At what points is the checksum validated
                (checked) after that? For example, is it done on
                a daily basis or is it done only when the file
                is accessed?

                Thx,
                TK

                On 10/20/2020 10:18 AM, संजीव (Sanjeev
                Tripurari) wrote:

                As soon as the file is written first time
                checksum is calculated and updated in fsimage
                (first in edit logs), and same is replicated
                other replicas.



                On Tue, 20 Oct 2020 at 19:15, TomK
                <tomk...@mdevsys.com
                <mailto:tomk...@mdevsys.com>> wrote:

                    Hi Sanjeev,

                    Thank you.  It does help.

                    At what points is the checksum calculated?

                    Thx,
                    TK

                    On 10/20/2020 3:03 AM, संजीव (Sanjeev
                    Tripurari) wrote:

                    For Missing blocks and corrupted blocks,
                    do check if all the datanode services are
                    up, non of the disks where hdfs data is
                    stored is accessible and have no issues,
                    hosts are reachable from namenode,

                    If you are able to re-generate the data
                    and write its great, otherwise hadoop
                    cannot correct itself.


                    Could you please elaborate on this?  Does
                    it mean I have to continuously access a
                    file for HDFS to be able to detect corrupt
                    blocks and correct itself?



                    *"Does HDFS check that the data node is
                    up, data disk is mounted, path to
                    the file exists and file can be read?"*
                    -- yes, only after it fails it will say
                    missing blocks.

                    *Or does it also do a filesystem check on
                    that data disk as well as
                    perhaps a checksum to ensure block integrity?*
                    -- yes, every file cheksum is maintained
                    and cross checked, if it fails it will say
                    corrupted blocks.

                    hope this helps.

                    -Sanjeev
                    *
                    *

                    On Tue, 20 Oct 2020 at 09:52, TomK
                    <tomk...@mdevsys.com
                    <mailto:tomk...@mdevsys.com>> wrote:

                        Hello,

                        HDFS Missing Blocks / Corrupt Blocks
                        Logic:  What are the specific
                        checks done to determine a block is
                        bad and needs to be replicated?

                        Does HDFS check that the data node is
                        up, data disk is mounted, path to
                        the file exists and file can be read?

                        Or does it also do a filesystem check
                        on that data disk as well as
                        perhaps a checksum to ensure block
                        integrity?

                        I've googled on this quite a bit.  I
                        don't see the exact answer I'm
                        looking for. I would like to know
                        exactly what happens during file
                        integrity verification that then
                        constitutes missing blocks or corrupt
                        blocks in the reports.

--Thank You,

                        TK.

                        
---------------------------------------------------------------------
                        To unsubscribe, e-mail:
                        user-unsubscr...@hadoop.apache.org
                        <mailto:user-unsubscr...@hadoop.apache.org>
                        For additional commands, e-mail:
                        user-h...@hadoop.apache.org
                        <mailto:user-h...@hadoop.apache.org>

--Thx,

TK.

--Thx,

TK.

--Thx,

TK.


--
Thx,
TK.



--
Thx,
TK.

Re: HDFS Missing Blocks / Corrupt Blocks Logic: What are the specific checks done to determine a block is bad and needs to be replicated?

Reply via email to