One typo below. It should be /, not *.

On 10/23/2020 12:53 PM, TomK wrote:
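For reference, the corrected arithmetic can be reproduced with a quick shell sketch (figures as in the message below; it assumes the drives of a node are scanned in parallel, so a single 10TB drive at 75% capacity bounds the total):

awk 'BEGIN {
  mb = 7.5 * 1024 * 1024;                  # data to scan per drive, in MB (7.5 TB)
  printf "at 300 MB/s: %.1f hours\n", mb / 300 / 3600;
  printf "at   1 MB/s: %.1f hours (%.1f days)\n", mb / 3600, mb / 86400;
}'
# at 300 MB/s: 7.3 hours
# at   1 MB/s: 2184.5 hours (91.0 days)

At the default 1MB/s throttle, a full pass over one such drive needs roughly 91 days, well past the 504-hour (21-day) scan period.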
Hey Sanjeev,

While figuring out the UI ports and statistics, I'd like to add another question.

So let's assume our scenario once more. (Data nodes *dn01* to *dn10* exist. Each data node has 10 x 10TB drives.) Also remember that the check period is 3 weeks (504 hours) and the scan runs at 1MB/s.

In order to check the CRC32C checksums, assuming a maximum speed of 300MB/s, drives at 75% capacity, and that disks are checked in parallel, these are the timings I'm getting:

1) 300MB/s : 7.5TB (7864320MB). 7864320 / 300 / 60 / 60 = 7.28 hours.
2) 1MB/s   : 7.5TB (7864320MB). 7864320 / 1 / 60 / 60 = 2184 hours. This equals roughly 91 days, which overlaps the next 3-week (21-day) cycle. Therefore a scan can't effectively complete.

Given this, I'm thinking that in order for HDFS to properly check larger clusters, the default values of *dfs.block.scanner.volume.bytes.per.second* and *dfs.datanode.scan.period.hours* would need to be adjusted. Otherwise, there is a potential that the HDFS block (volume) scanner isn't checking things well.

Am I correct in the above?

Thx, TK

I did these tests to get a rough estimate of the time it takes for the various checksums to complete:

root /home # time gsutil hash BigFile17GiBinRAM
Hashes [base64] for BigFile17GiBinRAM:    7 MiB/s
        Hash (crc32c):  lRucsQ==
        Hash (md5):     v3yZMSuBlJ/q1N8Xavshgg==
Operation completed over 1 objects/16.0 GiB.

real    4m7.993s
user    3m44.568s
sys     0m22.338s

root /home # time sha512sum BigFile17GiBinRAM
ac2c90c6acfeed591530ad9db639734308ef1bf494a3608c10f6334b933149d35c8098791a460fcdc302dcc921a4e16e2821712d8bbbb5d51ea8f5dc329de0ad  BigFile17GiBinRAM

real    2m21.008s
user    1m51.815s
sys     0m15.877s

root /home # time sha1sum BigFile17GiBinRAM
69d37b2c75a13fc02e99bf8e0205ce2a09d98b24  BigFile17GiBinRAM

real    1m18.117s
user    1m1.428s
sys     0m15.505s

root /home # time gsutil hash BigFile17GiBinRAM
Hashes [base64] for BigFile17GiBinRAM:    3 MiB/s
        Hash (crc32c):  lRucsQ==
        Hash (md5):     v3yZMSuBlJ/q1N8Xavshgg==
Operation completed over 1 objects/16.0 GiB.
real    4m9.437s
user    3m43.473s
sys     0m23.597s
root /home #

On 10/22/2020 9:48 PM, TomK wrote:

Hey Austin, Sanjeev,

The ports defined are as follows in hdfs-site.xml:

cm-r01wn01.mws.mds.xyz root … run cloudera-scm-agent process # grep -Ei "dfs.datanode.http.address|dfs.datanode.https.address" -A 2 ./3370-hdfs-DATANODE/hdfs-site.xml
  <name>dfs.datanode.http.address</name>
  <value>cm-r01wn01.mws.mds.xyz:1006</value>
</property>
--
  <name>dfs.datanode.https.address</name>
  <value>cm-r01wn01.mws.mds.xyz:9865</value>
</property>
cm-r01wn01.mws.mds.xyz root … run cloudera-scm-agent process #

Checking the ports used:

cm-r01wn01.mws.mds.xyz root ~ # netstat -pnltu | grep -Ei "9866|1004|9864|9865|1006|9867"
tcp   0   0 10.3.0.160:9867   0.0.0.0:*   LISTEN   30096/jsvc.exec
tcp   0   0 10.3.0.160:1004   0.0.0.0:*   LISTEN   30096/jsvc.exec
tcp   0   0 10.3.0.160:1006   0.0.0.0:*   LISTEN   30096/jsvc.exec
cm-r01wn01.mws.mds.xyz root ~ # hds getconf -confKey dfs.datanode.address
-bash: hds: command not found
cm-r01wn01.mws.mds.xyz root ~ # hdfs getconf -confKey dfs.datanode.address
0.0.0.0:9866
cm-r01wn01.mws.mds.xyz root ~ # hdfs getconf -confKey dfs.datanode.http.address
0.0.0.0:9864
cm-r01wn01.mws.mds.xyz root ~ # hdfs getconf -confKey dfs.datanode.https.address
0.0.0.0:9865
cm-r01wn01.mws.mds.xyz root ~ # hdfs getconf -confKey dfs.datanode.ipc.address
0.0.0.0:9867

The scanner looks to be initialized:

cm-r01wn01.mws.mds.xyz root /var/log/hadoop-hdfs # grep -EiR "Periodic block scanner is not running" *
cm-r01wn01.mws.mds.xyz root /var/log/hadoop-hdfs # grep -EiR "Initialized block scanner with targetBytesPerSec" * | wc -l
32

And yes, indeed it is started up. It kicked off around the time when I restarted the DataNode service.

cm-r01wn01.mws.mds.xyz root /var/log/hadoop-hdfs # vi hadoop-cmf-hdfs-DATANODE-cm-r01wn01.mws.mds.xyz.log.out

STARTUP_MSG: build = http://github.com/cloudera/hadoop -r 7f07ef8e6df428a8eb53009dc8d9a249dbbb50ad; compiled by 'jenkins' on 2019-07-18T17:09Z
STARTUP_MSG: java = 1.8.0_181
************************************************************/
2020-10-22 20:54:58,488 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: registered UNIX signal handlers for [TERM, HUP, INT]
2020-10-22 20:54:59,762 INFO org.apache.hadoop.security.UserGroupInformation: Login successful for user hdfs/cm-r01wn01.mws.mds....@mws.mds.xyz using keytab file hdfs.keytab. Keytab auto renewal enabled : false
2020-10-22 20:55:00,265 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for [DISK]file:/hdfs/1/dfs/dn
2020-10-22 20:55:00,295 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for [DISK]file:/hdfs/2/dfs/dn
2020-10-22 20:55:00,296 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for [DISK]file:/hdfs/3/dfs/dn
2020-10-22 20:55:00,297 INFO org.apache.hadoop.hdfs.server.datanode.checker.ThrottledAsyncChecker: Scheduling a check for [DISK]file:/hdfs/4/dfs/dn
2020-10-22 20:55:00,521 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2020-10-22 20:55:00,723 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2020-10-22 20:55:00,723 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2020-10-22 20:55:00,947 INFO org.apache.hadoop.hdfs.server.common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2020-10-22 20:55:00,953 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: *Initialized block scanner with targetBytesPerSec 1048576*
2020-10-22 20:55:00,961 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: File descriptor passing is enabled.
2020-10-22 20:55:00,963 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Configured hostname is cm-r01wn01.mws.mds.xyz
2020-10-22 20:55:00,965 INFO org.apache.hadoop.hdfs.server.common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
2020-10-22 20:55:00,995 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting DataNode with maxLockedMemory = 299892736
2020-10-22 20:55:01,018 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened streaming server at /10.3.0.160:1004
2020-10-22 20:55:01,023 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwidth is 10485760 bytes/s
2020-10-22 20:55:01,024 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Number threads for balancing is 50
2020-10-22 20:55:01,029 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwidth is 10485760 bytes/s
2020-10-22 20:55:01,029 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Number threads for balancing is 50
2020-10-22 20:55:01,029 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Listening on UNIX domain socket: /var/run/hdfs-sockets/dn
2020-10-22 20:55:01,304 INFO org.eclipse.jetty.util.log: Logging initialized @8929ms
2020-10-22 20:55:01,559 INFO org.apache.hadoop.http.HttpRequestLog: Http request log for http.requests.datanode is not defined
2020-10-22 20:55:01,585 INFO org.apache.hadoop.http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
2020-10-22 20:55:01,589 INFO org.apache.hadoop.http.HttpServer2: Added filter authentication (class=org.apache.hadoop.security.authentication.server.AuthenticationFilter) to context datanode

This answers another question I had: under what conditions does the block / volume checker kick off? When removing a datanode and adding it back in, it appears the checker gets kicked off on that worker at that time.

Only the Secure DataNode port returns a login, as is to be expected. ( http://cm-r01wn01.mws.mds.xyz:1006/ )

Thx, TK

On 10/22/2020 11:56 AM, संजीव (Sanjeev Tripurari) wrote:

Hi Tom,

Can you start your datanode service and share the datanode logs? Check whether it started properly or not.

Regards
-Sanjeev

On Thu, 22 Oct 2020 at 20:33, Austin Hackett <hacketta...@me.com> wrote:

Hi Tom

It might be worth restarting the DataNode process? I didn't think you could disable the DataNode Web UI as such, but I could be wrong on this point. Out of interest, what does hdfs-site.xml say with regards to dfs.datanode.http.address/dfs.datanode.https.address?
Regarding the logs, a quick look on GitHub suggests there may be a couple of useful log messages: https://github.com/apache/hadoop/blob/88a9f42f320e7c16cf0b0b424283f8e4486ef286/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockScanner.java

For example, LOG.warn("Periodic block scanner is not running") or LOG.info("Initialized block scanner with targetBytesPerSec {}"). Of course, you'd need to make sure those LOG statements are present in the Hadoop version included with CDH 6.3. Git "blame" suggests the LOG statements were added 6 years ago, so chances are you have them...

Thanks
Austin

On 22 Oct 2020, at 14:44, TomK <tomk...@mdevsys.com> wrote:

Thanks Austin. However, none of these are open on a standard Cloudera 6.3 build:

# netstat -pnltu | grep -Ei "9866|1004|9864|9865|1006|9867"
#

Would there be anything in the logs to indicate whether or not the block / volume scanner is running?

Thx, TK

On 10/22/2020 3:09 AM, Austin Hackett wrote:

Hi Tom

I'm not too familiar with the CDH distribution, but this page has the default ports used by the DataNode: https://docs.cloudera.com/documentation/enterprise/latest/topics/cdh_ports.html

I believe it's the settings for dfs.datanode.http.address/dfs.datanode.https.address that you're interested in (9864/9865). Since the data block scanner related config parameters are not set, the defaults of 3 weeks and 1MB/s should be applied.

Thanks
Austin

On 22 Oct 2020, at 06:35, TomK <tomk...@mdevsys.com> wrote:

Hey Austin, Sanjeev,

Thanks once more! Took some time to review the pages. That was certainly very helpful. Appreciated!

However, I tried to access https://dn01/blockScannerReport on a test Cloudera 6.3 cluster. It didn't work. Tried the following as well:

http://dn01:50075/blockscannerreport?listblocks
https://dn01:50075/blockscannerreport
https://dn01:10006/blockscannerreport

Checked that port 50075 is up (netstat -pnltu). There's no service on that port on the workers. Checked the pages:

https://docs.cloudera.com/documentation/enterprise/5-14-x/topics/cdh_ig_ports_cdh5.html

It is defined on the pages. Checked if the following is set. The following two configurations in hdfs-site.xml are the most used for block scanners:

* *dfs.block.scanner.volume.bytes.per.second* to throttle the scan bandwidth to configurable bytes per second. *Default value is 1M*. Setting this to 0 will disable the block scanner.
* *dfs.datanode.scan.period.hours* to configure the scan period, which defines how often a whole scan is performed. This should be set to a long enough interval to really take effect, for the reasons explained above. *Default value is 3 weeks (504 hours)*. Setting this to 0 will use the default value. Setting this to a negative value will disable the block scanner.

These are NOT explicitly set. Checked hdfs-site.xml: nothing defined there. Checked the Configuration tab in the cluster: it's not defined either.
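One quick way to confirm what the scanner will actually fall back to is to ask the configuration layer directly (a sketch; note that under Cloudera Manager the client configuration on a host can differ from the role's generated config, so grepping the process directory as shown earlier in the thread is the more authoritative check):

hdfs getconf -confKey dfs.block.scanner.volume.bytes.per.second   # 1048576 (1 MiB/s) when unset
hdfs getconf -confKey dfs.datanode.scan.period.hours              # 504 hours (3 weeks) when unset

# The scanner report is served from whatever dfs.datanode.http.address resolves to
# (1006 on this cluster, rather than the old 50075 default); on a Kerberized cluster
# SPNEGO authentication is likely required:
curl --negotiate -u : "http://cm-r01wn01.mws.mds.xyz:1006/blockScannerReport?listblocks"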
Does this mean that the defaults are applied, OR does it mean that the block / volume scanner is disabled? I see the pages detail what the values for these settings mean, but I didn't see any notes covering the situation where both values are left unset.

Thx, TK

On 10/21/2020 1:34 PM, संजीव (Sanjeev Tripurari) wrote:

Yes Austin, you are right. Every datanode will do its block verification, which is sent as a health check report to the namenode.

Regards
-Sanjeev

On Wed, 21 Oct 2020 at 21:53, Austin Hackett <hacketta...@me.com> wrote:

Hi Tom

It is my understanding that in addition to block verification on client reads, each data node runs a DataBlockScanner in a background thread that periodically verifies all the blocks stored on the data node. The dfs.datanode.scan.period.hours property controls how often this verification occurs. I think the reports are available via the data node's /blockScannerReport HTTP endpoint, although I'm not sure I ever actually looked at one (add ?listblocks to get the verification status of each block). More info here: https://blog.cloudera.com/hdfs-datanode-scanners-and-disk-checker-explained/

Thanks
Austin

On 21 Oct 2020, at 16:47, TomK <tomk...@mdevsys.com> wrote:

Hey Sanjeev,

Alright. Thank you once more. This is clear.

However, this poses an issue then. If during the two years disk drives develop bad blocks, but do not necessarily fail to the point that they cannot be mounted, the checksum would have changed, since those filesystem blocks can no longer be read. However, from an HDFS perspective, since no checks are done regularly, that is not known. So HDFS still reports that the file is fine; in other words, no missing blocks. For example, if a disk is going bad but those files are not read for two years, the system won't know that there is a problem. Even when removing a data node temporarily and re-adding it, HDFS isn't checking, because that HDFS file isn't read.

So let's assume this scenario. Data nodes *dn01* to *dn10* exist. Each data node has 10 x 10TB drives. And let's assume that there is one large file on those drives and it's replicated with a factor of 3. If during the two years the file isn't read, and 10 of those drives develop bad blocks or other underlying hardware issues, then it is possible that HDFS will still report everything as fine, even with a replication factor of 3, because with 10 disks failing it's possible a block or sector has failed under each of the 3 copies of the data. But HDFS would NOT know, since nothing triggered a read of that HDFS file. Based on everything below, corruption is then very much possible even with a replication factor of 3. At this point the file is unreadable, but HDFS still reports no missing blocks.

Similarly, if once I take a data node out I adjust one of the files on the data disks, HDFS will not know and will still report everything as fine. That is, until someone reads the file.

Sounds like this is a very real possibility.

Thx, TK

On 10/21/2020 10:26 AM, संजीव (Sanjeev Tripurari) wrote:

Hi Tom

> Therefore, if I write a file to HDFS but access it two years later, then the checksum will be computed only twice, at the beginning of the two years and again at the end when a client connects? Correct? As long as no process ever accesses the file between now and two years from now, the checksum is never redone and compared to the two year old checksum in the fsimage?
Yes, exactly: unless data is read, the checksum is not verified. (It is checked when the data is written and when the data is read.) If the checksum is mismatched, there is no way to correct it; you will have to re-write that file.

> When datanode is added back in, there is no real read operation on the files themselves. The datanode just reports the blocks but doesn't really read the blocks that are there to re-verify the files and ensure consistency?

Yes, exactly. The datanode maintains a list of files and their blocks, which it reports, along with the total disk size and used size. The namenode only has the list of blocks; unless a datanode is connected, it won't know where the blocks are stored.

Regards
-Sanjeev

On Wed, 21 Oct 2020 at 18:31, TomK <tomk...@mdevsys.com> wrote:

Hey Sanjeev,

Thank you very much again. This confirms my suspicion.

Therefore, if I write a file to HDFS but access it two years later, then the checksum will be computed only twice, at the beginning of the two years and again at the end when a client connects? Correct? As long as no process ever accesses the file between now and two years from now, the checksum is never redone and compared to the two year old checksum in the fsimage?

When the datanode is added back in, there is no real read operation on the files themselves. The datanode just reports the blocks but doesn't really read the blocks that are there to re-verify the files and ensure consistency?

Thx, TK

On 10/21/2020 12:38 AM, संजीव (Sanjeev Tripurari) wrote:

Hi Tom,

Every datanode sends a heartbeat to the namenode with the list of blocks it has. When a datanode that has been disconnected for a while reconnects, it will send a heartbeat to the namenode with the list of blocks it has (until then the namenode will show under-replicated blocks). As soon as the datanode is connected to the namenode, it will clear the under-replicated blocks.

*When a client connects to read or write a file, it will run a checksum to validate the file.* There is no independent process running to do checksums, as that would be a heavy process on each node.

Regards
-Sanjeev

On Wed, 21 Oct 2020 at 00:18, Tom <t...@mdevsys.com> wrote:

Thank you. That part I understand and am OK with it.

What I would like to know next is when the CRC32C checksum is run again and checked against the fsimage to confirm that the block file has not changed or become corrupted. For example, if I take a datanode out and, within 15 minutes, plug it back in, does HDFS rerun the CRC32C on all data disks on that node to make sure the blocks are OK?

Cheers, TK

Sent from my iPhone

On Oct 20, 2020, at 1:39 PM, संजीव (Sanjeev Tripurari) <sanjeevtripur...@gmail.com> wrote:

It's done as soon as a file is stored on disk.

Sanjeev

On Tuesday, 20 October 2020, TomK <tomk...@mdevsys.com> wrote:

Thanks again. At what points is the checksum validated (checked) after that? For example, is it done on a daily basis, or only when the file is accessed?

Thx, TK

On 10/20/2020 10:18 AM, संजीव (Sanjeev Tripurari) wrote:

As soon as the file is written the first time, the checksum is calculated and updated in the fsimage (first in the edit logs), and the same is replicated to the other replicas.

On Tue, 20 Oct 2020 at 19:15, TomK <tomk...@mdevsys.com> wrote:

Hi Sanjeev,

Thank you. It does help. At what points is the checksum calculated?
Thx, TK

On 10/20/2020 3:03 AM, संजीव (Sanjeev Tripurari) wrote:

For missing blocks and corrupted blocks, do check that all the datanode services are up, that the disks where HDFS data is stored are accessible and have no issues, and that the hosts are reachable from the namenode. If you are able to re-generate the data and write it, great; otherwise Hadoop cannot correct itself.

Could you please elaborate on this? Does it mean I have to continuously access a file for HDFS to be able to detect corrupt blocks and correct itself?

*"Does HDFS check that the data node is up, data disk is mounted, path to the file exists and file can be read?"* -- Yes; only after it fails will it report missing blocks.

*"Or does it also do a filesystem check on that data disk as well as perhaps a checksum to ensure block integrity?"* -- Yes; every file's checksum is maintained and cross-checked, and if it fails it will report corrupted blocks.

Hope this helps.

-Sanjeev

On Tue, 20 Oct 2020 at 09:52, TomK <tomk...@mdevsys.com> wrote:

Hello,

HDFS missing blocks / corrupt blocks logic: what are the specific checks done to determine that a block is bad and needs to be replicated? Does HDFS check that the data node is up, the data disk is mounted, the path to the file exists and the file can be read? Or does it also do a filesystem check on that data disk, as well as perhaps a checksum, to ensure block integrity?

I've googled on this quite a bit and don't see the exact answer I'm looking for. I would like to know exactly what happens during file integrity verification that then constitutes missing blocks or corrupt blocks in the reports.

--
Thank You,
TK.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@hadoop.apache.org
For additional commands, e-mail: user-h...@hadoop.apache.org
-- Thx, TK.
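For anyone revisiting this thread, a few commands that exercise the checks discussed above (a sketch; the file path, block pool ID and block/meta file names are illustrative placeholders, not values from this cluster):

# NameNode view: reports missing/corrupt blocks from block reports and client-reported
# corruption; fsck itself does not read block data.
hdfs fsck /user/tom/bigfile -files -blocks -locations

# Force a full client read, which verifies the CRC checksums end to end; a corrupt
# replica found this way is reported to the NameNode so it can be re-replicated from
# a healthy copy.
hdfs dfs -cat /user/tom/bigfile > /dev/null

# On a DataNode, verify a single block file against its .meta checksum file:
hdfs debug verifyMeta \
  -meta  /hdfs/1/dfs/dn/current/BP-xxxx/current/finalized/subdir0/subdir0/blk_1073741825_1001.meta \
  -block /hdfs/1/dfs/dn/current/BP-xxxx/current/finalized/subdir0/subdir0/blk_1073741825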