Thanks Harsh & Manoj for the inputs. I have now found that the data nodes are busy with block scanning. I have TBs of data attached to each data node, so the block scanning is taking days to complete. I have two questions:

1. Will a data node refuse to accept writes while the DataBlockScanner is running?
2. Will a data node return to normal only once "Not yet verified" reaches zero in its blockScannerReport?
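(For context on how often that scan recurs: the interval is governed by the dfs.datanode.scan.period.hours property. A minimal check, assuming a 1.x-era release, that HADOOP_CONF_DIR points at the active configuration directory, and that the usual 504-hour/three-week default applies to this build:)

    # Show any override of the block-scanner interval; if nothing prints,
    # the assumed default of 504 hours (three weeks) is in effect.
    grep -A 2 'dfs.datanode.scan.period.hours' "$HADOOP_CONF_DIR/hdfs-site.xml"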
# Data node logs

2013-05-01 05:53:50,639 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-7605405041820244736_20626608
2013-05-01 05:53:50,664 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-1425088964531225881_20391711
2013-05-01 05:53:50,692 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_2259194263704433881_10277076
2013-05-01 05:53:50,740 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_2653195657740262633_18315696
2013-05-01 05:53:50,818 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_-5124560783595402637_20821252
2013-05-01 05:53:50,866 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_6596021414426970798_19649117
2013-05-01 05:53:50,931 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_7026400040099637841_20741138
2013-05-01 05:53:50,992 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_8535358360851622516_20694185
2013-05-01 05:53:51,057 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Verification succeeded for blk_7959856580255809601_20559830

# One of my data node block scanner reports (http://<datanode-host>:15075/blockScannerReport)

Total Blocks : 2037907
Verified in last hour : 4819
Verified in last day : 107355
Verified in last week : 686873
Verified in last four weeks : 1589964
Verified in SCAN_PERIOD : 1474221
Not yet verified : 447943
Verified since restart : 318433
Scans since restart : 318058
Scan errors since restart : 0
Transient scan errors : 0
Current scan rate limit KBps : 3205
Progress this period : 101%
Time left in cur period : 86.02%

Thanks
Selva

-----Original Message-----
From: "S, Manoj" <[email protected]>
Subject: RE: High IO Usage in Datanodes due to Replication
Date: Mon, 29 Apr 2013 06:41:31 GMT

Adding to Harsh's comments:

You can also tweak a few OS-level parameters to improve the I/O performance.

1) Mount the filesystem with the "noatime" option.
2) Check whether changing the I/O scheduling algorithm improves the cluster's performance (see /sys/block/<device_name>/queue/scheduler).
3) If there are lots of I/O requests and the cluster hangs because of them, increase the queue length by raising the value in /sys/block/<device_name>/queue/nr_requests.
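A minimal shell sketch of the three tweaks above. The device name (sdb), mount point (/data1), and scheduler choice (deadline) are placeholders to adapt; the commands need root, and the mount option should also be added to /etc/fstab to survive a reboot:

    # 1) Remount a data disk with noatime so reads stop triggering metadata writes.
    mount -o remount,noatime /data1

    # 2) Inspect and switch the I/O scheduler for the device backing the data disk.
    cat /sys/block/sdb/queue/scheduler        # the current choice is shown in brackets
    echo deadline > /sys/block/sdb/queue/scheduler

    # 3) Allow more outstanding requests to queue up before callers block.
    cat /sys/block/sdb/queue/nr_requests      # often 128 by default
    echo 512 > /sys/block/sdb/queue/nr_requests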
-----Original Message-----
From: Harsh J [mailto:[email protected]]
Sent: Sunday, April 28, 2013 12:03 AM
To: <[email protected]>
Subject: Re: High IO Usage in Datanodes due to Replication

They seem to be transferring blocks between one another. This is most likely due to under-replication, and the NN UI will have numbers on the work left to perform. The inter-DN transfer is controlled by the balancing bandwidth, so you can lower that to throttle it if you want to (a sketch of that knob follows the quoted thread below), but you will then lose time getting back to a perfectly replicated state.

On Sat, Apr 27, 2013 at 11:33 PM, selva <[email protected]> wrote:
> Hi All,
>
> I have lost the Amazon instances of my Hadoop cluster, but I had all the
> data on AWS EBS volumes, so I launched new instances and attached the volumes.
>
> But all of the datanode logs keep printing the lines below, and the resulting
> high IO rate means I am not able to run any jobs.
>
> Can anyone help me understand what they are doing? Thanks in advance.
>
> 2013-04-27 17:51:40,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.157.10.242:10013, storageID=DS-407656544-10.28.217.27-10013-1353165843727, infoPort=15075, ipcPort=10014) Starting thread to transfer block blk_2440813767266473910_11564425 to 10.168.18.178:10013
> 2013-04-27 17:51:40,230 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.157.10.242:10013, storageID=DS-407656544-10.28.217.27-10013-1353165843727, infoPort=15075, ipcPort=10014): Transmitted block blk_2440813767266473910_11564425 to /10.168.18.178:10013
> 2013-04-27 17:51:40,433 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_2442656050740605335_10906493 src: /10.171.11.11:60744 dest: /10.157.10.242:10013
> 2013-04-27 17:51:40,450 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_2442656050740605335_10906493 src: /10.171.11.11:60744 dest: /10.157.10.242:10013 of size 25431
>
> Thanks
> Selva

--
Harsh J
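A minimal sketch of the balancing-bandwidth knob mentioned above, assuming a 1.x-era release where the property is named dfs.balance.bandwidthPerSec and where the dfsadmin subcommand is available; the 1 MB/s value is only illustrative:

    # Cap the balancing bandwidth (the knob referred to above) at 1 MB/s on the fly,
    # if the running build supports this subcommand.
    hadoop dfsadmin -setBalancerBandwidth 1048576

    # For a persistent limit, set dfs.balance.bandwidthPerSec (bytes per second)
    # in hdfs-site.xml and restart the datanodes.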
