> How do you check the switch/link/uplink?
 
That depends entirely on how your network is put together, with what
components, and what type of monitoring is in place.
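
As a host-side starting point, you can watch per-interface throughput on each
node and compare it against the NIC and uplink capacity; saturation there
usually shows up before anything else does. Below is a minimal sketch in
Python. It assumes a Linux host exposing /proc/net/dev, and the one-second
sampling window is arbitrary; switch and uplink counters themselves have to
come from the switch side (SNMP, sFlow, or your vendor's tooling).

import time

# Sample /proc/net/dev twice and print per-interface throughput.
# Hypothetical helper for spotting a saturated NIC; not from the thread.
def read_counters():
    counters = {}
    with open('/proc/net/dev') as f:
        for line in f.readlines()[2:]:        # skip the two header lines
            name, data = line.split(':', 1)
            fields = data.split()
            # fields[0] = received bytes, fields[8] = transmitted bytes
            counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters

before = read_counters()
time.sleep(1)
after = read_counters()
for iface in sorted(after):
    rx = (after[iface][0] - before[iface][0]) / 1e6
    tx = (after[iface][1] - before[iface][1]) / 1e6
    print('%-10s rx %8.2f MB/s  tx %8.2f MB/s' % (iface, rx, tx))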


Best regards,


    - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
Tom White)



>________________________________
> From: Mikael Sitruk <[email protected]>
>To: [email protected]; Andrew Purtell <[email protected]> 
>Sent: Thursday, February 23, 2012 2:23 PM
>Subject: Re: Exception in hbase 0.92. with DFS, - Bad connect ack
> 
>
>Thanks, Andrew, for the quick response.
>I also suspect the network; I have SSH sessions that block from time to time.
>How do you check the switch/link/uplink?
>
>
>Mikael.S
>
>
>On Fri, Feb 24, 2012 at 12:06 AM, Andrew Purtell <[email protected]> wrote:
>
>>Check your switch/link/uplink utilization.
>> 
>>HDFS-941 might help. That is not in Hadoop 1.0 according to a cursory search 
>>over branch history in the Git mirror.
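>>
>>(For reference, a search like the following is what I mean; the clone path
>>and branch name here are assumptions, so adjust them to your checkout:)
>>
>>import subprocess
>>
>># Grep branch-1.0 commit subjects for the JIRA id. The clone path and
>># branch name are placeholders, not the actual mirror layout.
>>log = subprocess.check_output(
>>    ['git', 'log', '--oneline', 'branch-1.0'],
>>    cwd='/path/to/hadoop-common').decode()
>>hits = [l for l in log.splitlines() if 'HDFS-941' in l]
>>print('\n'.join(hits) if hits else 'HDFS-941 not found on branch-1.0')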
>>
>>
>>As another data point, we see this in production with a Hadoop much closer
>>to CDH3, but we have some known issues with the network design in our legacy
>>datacenters and plan to resolve them with an eventual relocation. I'm also
>>integrating HDFS-941.
>>
>>
>>Best regards,
>>
>>
>>    - Andy
>>
>>Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
>>Tom White)
>>
>>
>>
>>
>>----- Original Message -----
>>> From: Mikael Sitruk <[email protected]>
>>> To: [email protected]
>>> Cc:
>>> Sent: Thursday, February 23, 2012 1:25 PM
>>> Subject: Exception in hbase 0.92. with DFS, - Bad connect ack
>>>
>>> Hi
>>>
>>> I see a lot of the following in my HBase logs (the target IP changes):
>>> 2012-02-23 23:04:02,699 INFO org.apache.hadoop.hdfs.DFSClient: Exception in
>>> createBlockOutputStream 10.232.83.87:50010 java.io.IOException: Bad connect
>>> ack with firstBadLink as 10.232.83.118:50010
>>> 2012-02-23 23:04:02,699 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
>>> block blk_4678388308309640326_170570
>>> 2012-02-23 23:04:02,701 INFO org.apache.hadoop.hdfs.DFSClient: Excluding
>>> datanode 10.232.83.118:50010
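>>>
>>> A quick tally of the firstBadLink addresses shows whether one datanode
>>> dominates these failures or whether they are spread across the cluster.
>>> A minimal sketch (the log path is an assumption):
>>>
>>> import re
>>> from collections import Counter
>>>
>>> # Count how often each datanode appears as firstBadLink; one dominant
>>> # address suggests a single bad host or port rather than congestion.
>>> counts = Counter()
>>> with open('/var/log/hbase/regionserver.log') as f:
>>>     for line in f:
>>>         m = re.search(r'firstBadLink as (\S+)', line)
>>>         if m:
>>>             counts[m.group(1)] += 1
>>> for addr, n in counts.most_common():
>>>     print('%s %d' % (addr, n))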
>>>
>>> Then checking the HDFS log of the same server (87):
>>> 2012-02-23 23:04:02,698 INFO
>>> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
>>> blk_4678388308309640326_170570 received exception
>>> java.net.SocketTimeoutException: 66000 millis timeout while waiting for
>>> channel to be ready for connect. ch :
>>> java.nio.channels.SocketChannel[connection-pending remote=/
>>> 10.232.83.118:50010]
>>> 2012-02-23 23:04:02,699 ERROR
>>> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
>>> 10.232.83.87:50010,
>>> storageID=DS-1257662823-10.232.83.87-50010-1329398253085, infoPort=50075,
>>> ipcPort=50020):DataXceiver
>>> java.net.SocketTimeoutException: 66000 millis timeout while waiting for
>>> channel to be ready for connect. ch :
>>> java.nio.channels.SocketChannel[connection-pending remote=/
>>> 10.232.83.118:50010]
>>>         at
>>> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
>>>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:656)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:319)
>>>         at
>>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:107)
>>>         at java.lang.Thread.run(Thread.java:662)
>>>
>>>
>>> Looking at the target (118) server, its HDFS log does not seem to show any
>>> problem around the same time:
>>> 2012-02-23 23:04:01,648 INFO
>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
>>> 10.232.83.118:45623, dest: /10.232.83.118:50010, bytes: 67108864, op:
>>> HDFS_WRITE, cliID: DFSClient_hb_rs_shaked118,60020,1329985953141, offset:
>>> 0, srvID: DS-1348867834-10.232.83.118-50010-1329398246569, blockid:
>>> blk_-1747243057136009792_170577, duration: 6932047000
>>> 2012-02-23 23:04:01,649 INFO
>>> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for
>>> block blk_-1747243057136009792_170577 terminating
>>> 2012-02-23 23:04:01,656 INFO
>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>>> blk_-4467275870825484381_170577 src: /10.232.83.118:45626 dest: /
>>> 10.232.83.118:50010
>>> 2012-02-23 23:04:03,467 INFO
>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>>> blk_6330134749736235430_170577 src: /10.232.83.114:49175 dest: /
>>> 10.232.83.118:50010
>>> 2012-02-23 23:04:05,153 INFO
>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
>>> 10.232.83.118:50010, dest: /10.232.83.118:45615, bytes: 67633152, op:
>>> HDFS_READ, cliID: DFSClient_hb_rs_shaked118,60020,1329985953141, offset: 0,
>>> srvID: DS-1348867834-10.232.83.118-50010-1329398246569, blockid:
>>> blk_-7285361301892533992_165555, duration: 27134342000
>>> 2012-02-23 23:04:08,569 INFO
>>> org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /
>>> 10.232.83.118:45626, dest: /10.232.83.118:50010, bytes: 67108864, op:
>>> HDFS_WRITE, cliID: DFSClient_hb_rs_shaked118,60020,1329985953141, offset:
>>> 0, srvID: DS-1348867834-10.232.83.118-50010-1329398246569, blockid:
>>> blk_-4467275870825484381_170577, duration: 6906584000
>>> 2012-02-23 23:04:08,570 INFO
>>> org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for
>>> block blk_-4467275870825484381_170577 terminating
>>> 2012-02-23 23:04:08,572 INFO
>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>>> blk_6927577191995683160_170577 src: /10.232.83.118:45629 dest: /
>>> 10.232.83.118:50010
>>> 2012-02-23 23:04:09,283 INFO
>>> org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block
>>> blk_7440488846881064366_170577 src: /10.232.83.86:60436 dest: /
>>> 10.232.83.118:50010
>>>
>>> I have checked the GC logs, but no pauses were noted (all full GC pauses
>>> were <10ms).
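>>>
>>> (For anyone reproducing the check, a scan along these lines works; the GC
>>> log path and the PrintGCDetails "real=" format are assumptions:)
>>>
>>> import re
>>>
>>> # Report any GC event whose wall-clock pause exceeds a threshold.
>>> # Assumes HotSpot -XX:+PrintGCDetails output; the path is a placeholder.
>>> threshold = 0.01                      # seconds
>>> with open('/var/log/hbase/gc.log') as f:
>>>     for line in f:
>>>         m = re.search(r'real=([\d.]+) secs', line)
>>>         if m and float(m.group(1)) > threshold:
>>>             print(line.rstrip())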
>>>
>>> Any idea what the problem could be?
>>>
>>> I am using HBase 0.92.0 and HDFS 1.0.0.
>>> Thanks
>>> Mikael.S
>>>
>>
>
>
>
>-- 
>
>Mikael.S
>
>
>
