Hey Austin, Sanjeev,
Thanks once more! Took some time to review the pages. That was
certainly very helpful. Appreciated!
However, I tried to access https://dn01/blockScannerReport on a test
Cloudera 6.3 cluster. It didn't work. I tried the following as well:
http://dn01:50075/blockscannerreport?listblocks
https://dn01:50075/blockscannerreport
https://dn01:10006/blockscannerreport
I checked whether port 50075 is up (netstat -pnltu): there's no service
listening on that port on the workers. I checked this page:
https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_ig_ports_cdh5.html
The port is defined there. I then checked whether the following is set:
The following 2 configurations in hdfs-site.xml are the most used for
block scanners:
* *dfs.block.scanner.volume.bytes.per.second* to throttle the scan
  bandwidth to configurable bytes per second. *Default value is 1M*.
  Setting this to 0 will disable the block scanner.
* *dfs.datanode.scan.period.hours* to configure the scan period, which
  defines how often a whole scan is performed. This should be set to a
  long enough interval to really take effect, for the reasons
  explained above. *Default value is 3 weeks (504 hours)*. Setting
  this to 0 will use the default value. Setting this to a negative
  value will disable the block scanner.
These are NOT explicitly set. I checked hdfs-site.xml: nothing is
defined there. I checked the Configuration tab in the cluster: they are
not defined there either.
Does this mean that the defaults are applied, OR does it mean that the
block / volume scanner is disabled? I see the pages explain what the
values for these settings mean, but I didn't see any notes covering the
situation where both values are left unset.
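In case it helps, this is roughly what I ran on a worker to double-check
both things. It's only a sketch: dn01 is a placeholder, it assumes CDH
6.3 follows the Hadoop 3 defaults (where the datanode HTTP/HTTPS ports
moved from 50075/50475 to 9864/9865), and getconf only reflects the
local client configuration, so a role-level override in Cloudera
Manager could still differ.

   # Which address is the datanode web UI configured to use?
   hdfs getconf -confKey dfs.datanode.http.address    # Hadoop 3 default 0.0.0.0:9864
   hdfs getconf -confKey dfs.datanode.https.address   # Hadoop 3 default 0.0.0.0:9865

   # Is anything actually listening there on the worker?
   netstat -pnltu | grep -E '9864|9865|50075'

   # Effective block scanner settings; getconf falls back to the
   # hdfs-default.xml value when nothing is set explicitly
   hdfs getconf -confKey dfs.block.scanner.volume.bytes.per.second
   hdfs getconf -confKey dfs.datanode.scan.period.hours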
Thx,
TK
On 10/21/2020 1:34 PM, संजीव (Sanjeev Tripurari) wrote:
Yes Austin,
you are right, every datanode will do its own block verification, which
is sent as a health check report to the namenode.
Regards
-Sanjeev
On Wed, 21 Oct 2020 at 21:53, Austin Hackett <hacketta...@me.com> wrote:
Hi Tom
It is my understanding that in addition to block verification on
client reads, each data node runs a DataBlockScanner in a
background thread that periodically verifies all the blocks stored
on the data node. The dfs.datanode.scan.period.hours property
controls how often this verification occurs.
I think the reports are available via the data node
/blockScannerReport HTTP endpoint, although I’m not sure I ever
actually looked at one. (add ?listblocks to get the verification
status of each block).
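Something along these lines should fetch it, although I haven't tested
this myself recently. dn01 is a placeholder; 50075 is the Hadoop 2
default datanode HTTP port and 9864 the Hadoop 3 one, so use whichever
your version actually listens on:

   # Summary report from a single datanode
   curl "http://dn01:50075/blockScannerReport"

   # Per-block verification status (can be large)
   curl "http://dn01:50075/blockScannerReport?listblocks"

   # On Hadoop 3 based distributions the default HTTP port is 9864
   curl "http://dn01:9864/blockScannerReport"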
More info here:
https://blog.cloudera.com/hdfs-datanode-scanners-and-disk-checker-explained/
Thanks
Austin
On 21 Oct 2020, at 16:47, TomK <tomk...@mdevsys.com> wrote:
Hey Sanjeev,
Alright. Thank you once more. This is clear.
However, this poses an issue then. If, during those two years, disk
drives develop bad blocks but do not fail to the point that they cannot
be mounted, the stored data would no longer verify against its checksum
since those filesystem blocks can no longer be read reliably. From an
HDFS perspective, though, since no checks are done regularly, that is
not known, so HDFS still reports that the file is fine, in other words
no missing blocks. For example, if a disk is going bad but those files
are not read for two years, the system won't know that there is a
problem. Even when removing a data node temporarily and re-adding it,
HDFS isn't checking, because that HDFS file isn't read.
So let's assume this scenario. Data nodes *dn01* to *dn10* exist, and
each data node has 10 x 10TB drives.
Let's also assume that there is one large file on those drives and it's
replicated with a factor of 3.
If during the two years the file isn't read, and 10 of those drives
develop bad blocks or other underlying hardware issues, then it is
possible that HDFS will still report everything as fine, even with a
replication factor of 3, because with 10 disks failing it's possible
that a block or sector has failed under each of the 3 copies of the
data. But HDFS would NOT know, since nothing triggered a read of that
HDFS file. Based on everything below, corruption is very much possible
even with a replication factor of 3. At this point the file is
unreadable, but HDFS still reports no missing blocks.
Similarly, if I take a data node out and modify one of the files on its
data disks, HDFS will not know and will still report everything as
fine, that is, until someone reads the file.
Sounds like this is a very real possibility.
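If that's the case, the only workaround I can think of is to force a
periodic read of the data so the client-side checksum verification
actually runs. A rough sketch of what I have in mind (the /data path is
just a placeholder of mine, and it assumes paths contain no spaces):

   # Read every file end to end; the client verifies checksums as it
   # reads, and a bad replica found this way is reported to the
   # namenode (and re-replicated from a healthy copy if one still exists)
   hdfs dfs -ls -R /data | awk '$1 ~ /^-/ {print $NF}' | while read f; do
       hdfs dfs -cat "$f" > /dev/null || echo "READ FAILED: $f"
   done

   # Afterwards, list anything the namenode now knows is corrupt
   hdfs fsck / -list-corruptfileblocks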
Thx,
TK
On 10/21/2020 10:26 AM, संजीव (Sanjeev Tripurari) wrote:
Hi Tom
Therefore, if I write a file to HDFS but access it two years
later, then the checksum will be computed only twice, at the
beginning of the two years and again at the end when a client
connects? Correct? As long as no process ever accesses the
file between now and two years from now, the checksum is never
redone and compared to the two year old checksum in the fsimage?
Yes, exactly: unless the data is read, the checksum is not verified (it
is checked when the data is written and when the data is read).
If the checksum is mismatched, there is no way to correct it; you will
have to re-write that file.
When datanode is added back in, there is no real read operation
on the files themselves. The datanode just reports the blocks
but doesn't really read the blocks that are there to re-verify
the files and ensure consistency?
Yes, exactly. The datanode maintains a list of files and their blocks,
which it reports along with the total disk size and used size.
The namenode only has the list of blocks; unless a datanode is
connected, it won't know where the blocks are stored.
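If it comes to that, something like this can show which files are
affected so they can be re-generated (just a sketch, the path is a
placeholder):

   # Files that currently have corrupt blocks
   hdfs fsck / -list-corruptfileblocks

   # Once the data has been re-generated elsewhere, the damaged file
   # can be removed (or moved to /lost+found with -move instead)
   hdfs fsck /path/to/corrupt/file -delete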
Regards
-Sanjeev
On Wed, 21 Oct 2020 at 18:31, TomK <tomk...@mdevsys.com> wrote:
Hey Sanjeev,
Thank you very much again. This confirms my suspicion.
Therefore, if I write a file to HDFS but access it two years
later, then the checksum will be computed only twice, at the
beginning of the two years and again at the end when a
client connects? Correct? As long as no process ever
accesses the file between now and two years from now, the
checksum is never redone and compared to the two year old
checksum in the fsimage?
When datanode is added back in, there is no real read
operation on the files themselves. The datanode just
reports the blocks but doesn't really read the blocks that
are there to re-verify the files and ensure consistency?
Thx,
TK
On 10/21/2020 12:38 AM, संजीव (Sanjeev Tripurari) wrote:
Hi Tom,
Every datanode sends a heartbeat to the namenode, along with the list
of blocks it has.
When a datanode has been disconnected for a while, it will, after
reconnecting, send a heartbeat to the namenode with the list of blocks
it has (until then the namenode will show under-replicated blocks).
As soon as the datanode is connected to the namenode, it will clear the
under-replicated blocks.
*When a client connects to read or write a file, it will run the
checksum to validate the file.*
There is no independent process running to do checksums, as it would be
a heavy process on each node.
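To see the namenode's current view at any time, you can use something
like this (just a sketch):

   # Block health counters as reported by the namenode
   hdfs dfsadmin -report | grep -iE 'under.?replicated|corrupt|missing'

   # Live and dead datanodes as the namenode sees them
   hdfs dfsadmin -report | grep -iE 'live datanodes|dead datanodes'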
Regards
-Sanjeev
On Wed, 21 Oct 2020 at 00:18, Tom <t...@mdevsys.com> wrote:
Thank you. That part I understand and am OK with it.
What I would like to know next is when the CRC32C checksum is run again
and checked against the fsimage to confirm that the block file has not
changed or become corrupted.
For example, if I take a datanode out and, within 15 minutes, plug it
back in, does HDFS rerun the CRC32C on all data disks on that node to
make sure the blocks are OK?
Cheers,
TK
Sent from my iPhone
On Oct 20, 2020, at 1:39 PM, संजीव (Sanjeev Tripurari)
<sanjeevtripur...@gmail.com> wrote:
It's done as soon as a file is stored on disk.
Sanjeev
On Tuesday, 20 October 2020, TomK <tomk...@mdevsys.com> wrote:
Thanks again.
At what points is the checksum validated (checked)
after that? For example, is it done on a daily
basis or is it done only when the file is accessed?
Thx,
TK
On 10/20/2020 10:18 AM, संजीव (Sanjeev Tripurari)
wrote:
As soon as the file is written for the first time, the checksum is
calculated and updated in the fsimage (first in the edit logs), and the
same is replicated to the other replicas.
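If you want to look at a file's checksum from the client side, you can
try something like this (a sketch; the path is a placeholder and the
exact output format depends on the HDFS version):

   # Composite checksum HDFS maintains for the file, computed from the
   # stored per-block CRC32C checksums
   hdfs dfs -checksum /path/to/file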
On Tue, 20 Oct 2020 at 19:15, TomK <tomk...@mdevsys.com> wrote:
Hi Sanjeev,
Thank you. It does help.
At what points is the checksum calculated?
Thx,
TK
On 10/20/2020 3:03 AM, संजीव (Sanjeev
Tripurari) wrote:
For missing blocks and corrupted blocks, do check that all the
datanode services are up, that none of the disks where HDFS data is
stored is inaccessible or has issues, and that the hosts are reachable
from the namenode.
If you are able to re-generate the data and write it again, great;
otherwise Hadoop cannot correct it by itself.
Could you please elaborate on this? Does it mean I have to continuously
access a file for HDFS to be able to detect corrupt blocks and correct
itself?
*"Does HDFS check that the data node is up, data disk is mounted, path
to the file exists and file can be read?"*
-- Yes; only after that fails will it report missing blocks.
*Or does it also do a filesystem check on that data disk as well as
perhaps a checksum to ensure block integrity?*
-- Yes, a checksum is maintained for every file and cross-checked; if
it fails, it will report corrupted blocks.
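A quick way to see both cases from the command line (just a sketch, the
path is a placeholder):

   # The fsck summary reports corrupt blocks, missing replicas and
   # under-replicated blocks for the whole namespace
   hdfs fsck /

   # Per-file detail: which blocks exist and which datanodes hold them
   hdfs fsck /path/to/file -files -blocks -locations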
hope this helps.
-Sanjeev
On Tue, 20 Oct 2020 at 09:52, TomK <tomk...@mdevsys.com> wrote:
Hello,
HDFS Missing Blocks / Corrupt Blocks
Logic: What are the specific
checks done to determine a block is bad
and needs to be replicated?
Does HDFS check that the data node is
up, data disk is mounted, path to
the file exists and file can be read?
Or does it also do a filesystem check on
that data disk as well as
perhaps a checksum to ensure block
integrity?
I've googled on this quite a bit. I
don't see the exact answer I'm
looking for. I would like to know
exactly what happens during file
integrity verification that then
constitutes missing blocks or corrupt
blocks in the reports.
--
Thank You,
TK.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org
--
Thx,
TK.