Also, I can't even get a jstack out of the datanode; its CPU is low, and it's not eating RAM:

16:21:29 10.103.7.3 root@mtag3:/usr/java/latest/bin $ ./jstack 31771
31771: Unable to open socket file: target process not responding or
HotSpot VM not loaded
The -F option can be used when the target process is not responding
16:21:54 10.103.7.3 root@mtag3:/usr/java/latest/bin $
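
Since the tool itself suggests it, a forced dump may still work; a sketch reusing the PID from the session above (-F attaches via the slower ptrace-based mechanism, and kill -3 makes the JVM print the dump to its own stdout log rather than to this terminal):

    # force a thread dump when the attach socket is unresponsive
    ./jstack -F 31771
    # or have the JVM dump threads itself; output goes to the datanode's stdout log
    kill -3 31771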


When I restart the process, iowait goes back to normal.  Right now,
iowait is insanely high compared to a server that also had high iowait
but which I restarted; please see the attached graph.
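
To pin down whether the wait is stuck on one device or spread across all of them, something like the following should help (standard sysstat/procps tools; a sketch, not verified on this box):

    # per-device utilization every 5 seconds; watch %util and await on the datanode volumes
    iostat -x 5
    # threads stuck in uninterruptible (D) sleep, i.e. blocked inside the kernel on I/O
    ps -eLo pid,stat,wchan:32,comm | awk '$2 ~ /D/'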

The graph with the iowait drop is the datanode I restarted; the other
is the one I can't get a jstack from.
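
Side note on the quoted errors below: 480000 ms looks like the default dfs.datanode.socket.write.timeout (8 minutes), meaning the datanode waited 8 minutes for the client on the same box to read, which points at a wedged receiver rather than fd exhaustion. To double-check the limit the running daemon actually got (the shell's ulimit is not necessarily what the daemon inherited), a sketch using the PID from above as an example:

    # limits of the live datanode process, not of the current shell
    grep 'open files' /proc/31771/limits
    # how many file descriptors it actually has open right now
    ls /proc/31771/fd | wc -l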


-Jack

On Mon, Mar 28, 2011 at 4:19 PM, Jack Levin <[email protected]> wrote:
> Hello guys, we are getting these errors:
>
>
> 2011-03-28 15:08:33,485 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.101.6.5:50010, dest: /10.101.6.5:51365, bytes: 66564, op: HDFS_READ, cliID: DFSClient_hb_rs_rdaf5.prod.imageshack.com,60020,1301323415015_1301323415053, offset: 4191232, srvID: DS-1528941561-10.101.6.5-50010-1299713950021, blockid: blk_-3087497822408705276_723501, duration: 14409579
> 2011-03-28 15:08:33,492 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.101.6.5:50010, dest: /10.101.6.5:51366, bytes: 14964, op: HDFS_READ, cliID: DFSClient_hb_rs_rdaf5.prod.imageshack.com,60020,1301323415015_1301323415053, offset: 67094016, srvID: DS-1528941561-10.101.6.5-50010-1299713950021, blockid: blk_-3224146686136187733_731011, duration: 8855000
> 2011-03-28 15:08:33,495 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.101.6.5:50010, dest: /10.101.6.5:51368, bytes: 51600, op: HDFS_READ, cliID: DFSClient_hb_rs_rdaf5.prod.imageshack.com,60020,1301323415015_1301323415053, offset: 0, srvID: DS-1528941561-10.101.6.5-50010-1299713950021, blockid: blk_-6384334583345199846_731014, duration: 2053969
> 2011-03-28 15:08:33,503 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.101.6.5:50010, dest: /10.101.6.5:42553, bytes: 462336, op: HDFS_READ, cliID: DFSClient_hb_rs_rdaf5.prod.imageshack.com,60020,1301323415015_1301323415053, offset: 327680, srvID: DS-1528941561-10.101.6.5-50010-1299713950021, blockid: blk_-4751283294726600221_724785, duration: 480254862706
> 2011-03-28 15:08:33,504 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.101.6.5:50010, storageID=DS-1528941561-10.101.6.5-50010-1299713950021, infoPort=50075, ipcPort=50020):Got exception while serving blk_-4751283294726600221_724785 to /10.101.6.5:
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.101.6.5:50010 remote=/10.101.6.5:42553]
>        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:350)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:436)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:197)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:110)
>
> 2011-03-28 15:08:33,504 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.101.6.5:50010, storageID=DS-1528941561-10.101.6.5-50010-1299713950021, infoPort=50075, ipcPort=50020):DataXceiver
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.101.6.5:50010 remote=/10.101.6.5:42553]
>        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:350)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:436)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:197)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:110)
> 2011-03-28 15:08:33,504 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.101.6.5:50010, dest: /10.101.6.5:51369, bytes: 66564, op: HDFS_READ, cliID: DFSClient_hb_rs_rdaf5.prod.imageshack.com,60020,1301323415015_1301323415053, offset: 4781568, srvID: DS-1528941561-10.101.6.5-50010-1299713950021, blockid: blk_-3087497822408705276_723501, duration: 11478016
> 2011-03-28 15:08:33,506 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.101.6.5:50010, dest: /10.101.6.5:51370, bytes: 66564, op: HDFS_READ, cliID: DFSClient_hb_rs_rdaf5.prod.imageshack.com,60020,1301323415015_1301323415053, offset: 66962944, srvID: DS-1528941561-10.101.6.5-50010-1299713950021, blockid: blk_-3224146686136187733_731011, duration: 7643688
>
>
> The RS is talking to the DN and we are getting timeouts. There are no
> issues like ulimit AFAIK, as we start them with a 32k open-file limit.
> Any ideas what the deal is?
>
> -Jack
>
