On Tue, Apr 3, 2012 at 9:56 AM, Jonathan Hsieh <[email protected]> wrote:
> The hypothesis was that since I was seeing TCP ack delays in ganglia, it
> may have to do with the TCP_NODELAY setting on the write side. The hdfs
> client sets this on the read side, in DFSInputStream (here), but not on the
> DFSOutputStream write side:
>
> https://github.com/apache/hadoop-common/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L836
>
>
> // TCP_NODELAY is crucial here because of bad interactions between
> // Nagle's Algorithm and Delayed ACKs. With connection keepalive
> // between the client and DN, the conversation looks like:
> // 1. Client -> DN: Read block X
> // 2. DN -> Client: data for block X
> // 3. Client -> DN: Status OK (successful read)
> // 4. Client -> DN: Read block Y
> // The fact that step #3 and #4 are both in the client->DN direction
> // triggers Nagling. If the DN is using delayed ACKs, this results
> // in a delay of 40ms or more.
> //
>
> The fact that I am getting ackDelays on a write test may indicate that we
> need to set TCP_NODELAY on the HBase HLog write side --
> (HDFS's DFSClient.DFSOutputStream in hadoop 0.20.x and DFSOutputStream in
> 0.23.) I did a quick hack and test adding socket.setTcpNoDelay(true) on
> the write side of hadoop 0.20.x and reran the PE tests; unfortunately,
> we still seem to have the socketTimeoutException problems. Needs more
> digging.
>
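
For anyone following along, here is a minimal sketch of what setting TCP_NODELAY on a client socket looks like with the plain java.net.Socket API. This is not the DFSClient code or the actual patch; the class and method names (NoDelaySketch, connectNoDelay, noDelayOnLoopback) are made up for illustration, and it only talks to a throwaway loopback listener.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class NoDelaySketch {

    // Hypothetical helper: connect, then disable Nagle's algorithm so that
    // small back-to-back writes are sent without waiting for a pending ACK.
    static Socket connectNoDelay(InetSocketAddress addr, int timeoutMs) throws IOException {
        Socket s = new Socket();
        s.connect(addr, timeoutMs);
        s.setTcpNoDelay(true); // sets the TCP_NODELAY socket option
        return s;
    }

    // Opens a loopback listener on an ephemeral port, connects to it with
    // connectNoDelay, and reports whether TCP_NODELAY is on for the client.
    static boolean noDelayOnLoopback() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = connectNoDelay(
                 new InetSocketAddress(server.getInetAddress(), server.getLocalPort()), 1000)) {
            return client.getTcpNoDelay();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("TCP_NODELAY enabled: " + noDelayOnLoopback());
    }
}
```

As Jon notes, flipping this option on the write side did not make the socketTimeoutException go away in his test, so treat it as one knob to rule out rather than a fix.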
Thanks Jon for the above.
St.Ack
