Thanks Andrew for the quick response.
I also suspect the network; I have SSH sessions that block from time to time.
How do you check the switch/link/uplink?
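For reference, a minimal sketch of one way to sample per-host link throughput (assumptions: a Linux host where /proc/net/dev is readable; eth0 is only an example interface name, not necessarily yours):

```shell
#!/bin/sh
# Rough per-interface throughput check: sample /proc/net/dev twice and
# compute the rate from the byte-counter deltas.
# Assumption: Linux host; pass your NIC name (eth0 here is an example).
IFACE="${1:-eth0}"
INTERVAL=5

sample() {
  # /proc/net/dev: after the colon, field 1 is rx bytes, field 9 is tx bytes.
  awk -F':' -v i="$IFACE" \
    '{ gsub(/ /, "", $1) } $1 == i { split($2, a, " "); print a[1], a[9] }' \
    /proc/net/dev
}

set -- $(sample); RX1=$1; TX1=$2
sleep "$INTERVAL"
set -- $(sample); RX2=$1; TX2=$2

# bytes over the interval -> Mbit/s
echo "rx: $(( (RX2 - RX1) * 8 / INTERVAL / 1000000 )) Mbit/s"
echo "tx: $(( (TX2 - TX1) * 8 / INTERVAL / 1000000 )) Mbit/s"
```

Comparing those numbers against the negotiated line rate (e.g. `ethtool eth0` shows it) tells you whether a host link is saturated; utilization of the switch ports and uplinks themselves has to come from the switch side (SNMP counters or the switch CLI).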

Mikael.S

On Fri, Feb 24, 2012 at 12:06 AM, Andrew Purtell <[email protected]> wrote:

> Check your switch/link/uplink utilization.
>
> HDFS-941 might help. That is not in Hadoop 1.0 according to a cursory
> search over branch history in the Git mirror.
>
>
> As another data point, we see this in production with a Hadoop that is
> much closer to CDH3; but we have some known issues with the network design
> in our legacy datacenters and plan to resolve them with an eventual
> relocation. I'm also integrating HDFS-941.
>
>
> Best regards,
>
>
>     - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
>
>
> ----- Original Message -----
> > From: Mikael Sitruk <[email protected]>
> > To: [email protected]
> > Cc:
> > Sent: Thursday, February 23, 2012 1:25 PM
> > Subject: Exception in hbase 0.92. with DFS, - Bad connect ack
> >
> > Hi
> >
> > I see a lot of the following in my HBase logs (the target IP changes):
> > 2012-02-23 23:04:02,699 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream 10.232.83.87:50010 java.io.IOException: Bad connect ack with firstBadLink as 10.232.83.118:50010
> > 2012-02-23 23:04:02,699 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_4678388308309640326_170570
> > 2012-02-23 23:04:02,701 INFO org.apache.hadoop.hdfs.DFSClient: Excluding datanode 10.232.83.118:50010
> >
> > Then, checking the HDFS log of the same server (87):
> > 2012-02-23 23:04:02,698 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_4678388308309640326_170570 received exception java.net.SocketTimeoutException: 66000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.232.83.118:50010]
> > 2012-02-23 23:04:02,699 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.232.83.87:50010, storageID=DS-1257662823-10.232.83.87-50010-1329398253085, infoPort=50075, ipcPort=50020):DataXceiver
> > java.net.SocketTimeoutException: 66000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/10.232.83.118:50010]
> >         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
> >         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:656)
> >         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:319)
> >         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:107)
> >         at java.lang.Thread.run(Thread.java:662)
> >
> >
> > Looking at the HDFS log of the target server (118), it does not seem to
> > show any problem around the same time:
> > 2012-02-23 23:04:01,648 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.232.83.118:45623, dest: /10.232.83.118:50010, bytes: 67108864, op: HDFS_WRITE, cliID: DFSClient_hb_rs_shaked118,60020,1329985953141, offset: 0, srvID: DS-1348867834-10.232.83.118-50010-1329398246569, blockid: blk_-1747243057136009792_170577, duration: 6932047000
> > 2012-02-23 23:04:01,649 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block blk_-1747243057136009792_170577 terminating
> > 2012-02-23 23:04:01,656 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-4467275870825484381_170577 src: /10.232.83.118:45626 dest: /10.232.83.118:50010
> > 2012-02-23 23:04:03,467 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_6330134749736235430_170577 src: /10.232.83.114:49175 dest: /10.232.83.118:50010
> > 2012-02-23 23:04:05,153 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.232.83.118:50010, dest: /10.232.83.118:45615, bytes: 67633152, op: HDFS_READ, cliID: DFSClient_hb_rs_shaked118,60020,1329985953141, offset: 0, srvID: DS-1348867834-10.232.83.118-50010-1329398246569, blockid: blk_-7285361301892533992_165555, duration: 27134342000
> > 2012-02-23 23:04:08,569 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.232.83.118:45626, dest: /10.232.83.118:50010, bytes: 67108864, op: HDFS_WRITE, cliID: DFSClient_hb_rs_shaked118,60020,1329985953141, offset: 0, srvID: DS-1348867834-10.232.83.118-50010-1329398246569, blockid: blk_-4467275870825484381_170577, duration: 6906584000
> > 2012-02-23 23:04:08,570 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block blk_-4467275870825484381_170577 terminating
> > 2012-02-23 23:04:08,572 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_6927577191995683160_170577 src: /10.232.83.118:45629 dest: /10.232.83.118:50010
> > 2012-02-23 23:04:09,283 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_7440488846881064366_170577 src: /10.232.83.86:60436 dest: /10.232.83.118:50010
> >
> > I have checked the GC logs, but no pauses were noted (all full GC pauses
> > <10 ms).
> >
> > Any idea of what the problem can be?
> >
> > I use HBase 0.92.0 and HDFS 1.0.0.
> > Thanks
> > Mikael.S
> >
>



-- 
Mikael.S
