Great - thanks for all the information. Reducing the timeout seems like one plausible approach to me. However, I had two follow-up questions:

1) For writes - since these need to happen synchronously - if a DataNode goes down, we would essentially have some slight inconsistency across the 3 replicas, since that node will not have the latest WALEdit (say, for a walAppend). How does HDFS handle this? And what happens if the DataNode is brought back up before it expires within the NameNode?

2) For a read operation, does the NameNode suggest DataNodes that are local to the RegionServer?
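To be concrete about what I meant by reducing the timeout: I was picturing overrides roughly like the following in hdfs-site.xml. The property names (dfs.socket.timeout and dfs.datanode.socket.write.timeout) are the stock HDFS settings, but the 20000 ms values are only illustrative guesses on my part, not something I have tested:

  <!-- hdfs-site.xml sketch - values are illustrative only.
       As far as I can tell, the hadoop 1.0.4 defaults are 60000 ms for
       dfs.socket.timeout and 480000 ms for dfs.datanode.socket.write.timeout. -->
  <property>
    <!-- read side; I believe the ~66000 ms waits in the DN logs below are this
         60 s default plus a small per-datanode extension -->
    <name>dfs.socket.timeout</name>
    <value>20000</value>
  </property>
  <property>
    <!-- write side; your suggestion further down in the thread goes the other
         way and sets this to 0 (disabled) -->
    <name>dfs.datanode.socket.write.timeout</name>
    <value>20000</value>
  </property>

My understanding is that lowering these would make the write pipeline give up on a dead DataNode sooner, at the cost of more spurious pipeline failures when a node is merely slow - please correct me if that trade-off is wrong.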
Also, I am running the DataNode with Xmx=1024M - could that be a reason behind the DataNode crashes?

Thanks
Varun

On Wed, Nov 7, 2012 at 11:53 AM, Jeremy Carroll <[email protected]> wrote:

> Er. Inconsistent. Sorry for typo. Basically, when the underlying file system
> is unstable, HBase can become unstable as well.
>
> On Wed, Nov 7, 2012 at 11:52 AM, Jeremy Carroll <[email protected]> wrote:
>
> > It's important to realize that HBase is a strongly consistent system. So
> > if a DataNode is down (but looks alive due to HDFS not marking it as down),
> > the system will choose to be unavailable, rather than consistent. During
> > this timeframe where the underlying HDFS file system was not operating
> > normally (did not mark nodes as failed), HBase can give up / time out on a
> > lot of operations. HDFS replicates data by a default factor of 3, so during
> > this time the node may have been a replication target but unable to
> > satisfy that request.
> >
> > On Wed, Nov 7, 2012 at 9:57 AM, Varun Sharma <[email protected]> wrote:
> >
> > > Thanks for the response. One more point is that I am running hadoop 1.0.4
> > > with hbase 0.92 - not sure if that combination is known to have these
> > > issues.
> > >
> > > I had one quick question though - these logs are picked from 10.31.138.145,
> > > and from my understanding of the logs below, it is still going to another
> > > bad datanode for retrieving the block even though it should already have
> > > the data block - see the last line...
> > >
> > > 12/11/07 02:17:45 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_2813460962462751946_78454java.io.IOException: Bad response 1 for block blk_2813460962462751946_78454 from datanode 10.31.190.107:9200
> > >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:3084)
> > >
> > > 12/11/07 02:17:45 WARN hdfs.DFSClient: Error Recovery for block blk_2813460962462751946_78454 bad datanode[1] 10.31.190.107:9200
> > > 12/11/07 02:17:45 WARN hdfs.DFSClient: Error Recovery for block blk_2813460962462751946_78454 in pipeline 10.31.138.245:9200, 10.31.190.107:9200, 10.159.19.90:9200: bad datanode 10.31.190.107:9200
> > >
> > > Looking at the DataNode logs, it seems that the local datanode is trying
> > > to connect to the remote bad datanode. Is this for replicating the WALEdit?
> > >
> > > 2012-11-07 02:17:45,142 INFO org.apache.hadoop.hdfs.server.datanode.DataNode (PacketResponder 2 for Block blk_2813460962462751946_78454): PacketResponder blk_2813460962462751946_78454 2 Exception java.net.SocketTimeoutException: 66000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.31.138.245:33965 remote=/10.31.190.107:9200]
> > >
> > > Also, this is preceded by a whole bunch of slow operations with
> > > processingtimems close to 20 seconds, like these - are these other slow
> > > walEdit appends (slowed down due to HDFS)?
> > > 12/11/07 02:16:01 WARN ipc.HBaseServer: (responseTooSlow): {"processingtimems":21957,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@7198c05d), rpc version=1, client version=29, methodsFingerPrint=54742778","client":"10.31.128.131:55327","starttimems":1352254539935,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
> > >
> > > Thanks
> > > Varun
> > >
> > > On Wed, Nov 7, 2012 at 7:25 AM, Jeremy Carroll <[email protected]> wrote:
> > >
> > > > Sorry. It's early in the morning here. Did not see the 'read timeout'.
> > > > +1 to Nicolas's response.
> > > >
> > > > On Wed, Nov 7, 2012 at 7:22 AM, Jeremy Carroll <[email protected]> wrote:
> > > >
> > > > > One trick I have used for a while is to set
> > > > > dfs.datanode.socket.write.timeout in hdfs-site.xml to 0 (disabled).
> > > > > It's not going to solve your underlying IOPS capacity issue with your
> > > > > servers, but it can help for short bursty periods. Basically it's
> > > > > hiding the real issue, but it can help in the short term.
> > > > >
> > > > > On Wed, Nov 7, 2012 at 1:43 AM, Varun Sharma <[email protected]> wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am seeing extremely long HDFS timeouts - and this seems to be
> > > > > > associated with the loss of a DataNode. Here is the RS log:
> > > > > >
> > > > > > 12/11/07 02:17:45 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_2813460962462751946_78454java.io.IOException: Bad response 1 for block blk_2813460962462751946_78454 from datanode 10.31.190.107:9200
> > > > > >         at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:3084)
> > > > > >
> > > > > > 12/11/07 02:17:45 WARN hdfs.DFSClient: Error Recovery for block blk_2813460962462751946_78454 bad datanode[1] 10.31.190.107:9200
> > > > > > 12/11/07 02:17:45 WARN hdfs.DFSClient: Error Recovery for block blk_2813460962462751946_78454 in pipeline 10.31.138.245:9200, 10.31.190.107:9200, 10.159.19.90:9200: bad datanode 10.31.190.107:9200
> > > > > > 12/11/07 02:17:45 WARN wal.HLog: IPC Server handler 35 on 60020 took 65955 ms appending an edit to hlog; editcount=476686, len~=76.0
> > > > > > 12/11/07 02:17:45 WARN wal.HLog: HDFS pipeline error detected. Found 2 replicas but expecting no less than 3 replicas. Requesting close of hlog.
> > > > > >
> > > > > > The corresponding DN log goes like this:
> > > > > >
> > > > > > 2012-11-07 02:17:45,142 INFO org.apache.hadoop.hdfs.server.datanode.DataNode (PacketResponder 2 for Block blk_2813460962462751946_78454): PacketResponder blk_2813460962462751946_78454 2 Exception java.net.SocketTimeoutException: 66000 millis timeout while waiting for channel to be ready for read.
> > > > > > ch : java.nio.channels.SocketChannel[connected local=/10.31.138.245:33965 remote=/10.31.190.107:9200]
> > > > > >         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> > > > > >         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> > > > > >         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> > > > > >         at java.io.DataInputStream.readFully(DataInputStream.java:178)
> > > > > >         at java.io.DataInputStream.readLong(DataInputStream.java:399)
> > > > > >         at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
> > > > > >         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:806)
> > > > > >         at java.lang.Thread.run(Thread.java:662)
> > > > > >
> > > > > > It seems like the DataNode local to the region server is trying to
> > > > > > grab the block from another DN, and that is timing out because of
> > > > > > this other data node being bad. All in all, this causes response
> > > > > > times to be terribly poor. Is there a way around this, or am I
> > > > > > missing something?
> > > > > >
> > > > > > Varun
