The hypothesis was that, since I was seeing TCP ack delays in Ganglia, the problem might have to do with the TCP_NODELAY setting on the write side. The HDFS client sets this option on the read side in DFSInputStream, here, but not on the write side in DFSOutputStream:
https://github.com/apache/hadoop-common/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L836

    // TCP_NODELAY is crucial here because of bad interactions between
    // Nagle's Algorithm and Delayed ACKs. With connection keepalive
    // between the client and DN, the conversation looks like:
    //   1. Client -> DN: Read block X
    //   2. DN -> Client: data for block X
    //   3. Client -> DN: Status OK (successful read)
    //   4. Client -> DN: Read block Y
    // The fact that step #3 and #4 are both in the client->DN direction
    // triggers Nagling. If the DN is using delayed ACKs, this results
    // in a delay of 40ms or more.

The fact that I am getting ackDelays on a write test may indicate that we also need to set TCP_NODELAY on the HBase HLog write side, that is, in HDFS's DFSClient.DFSOutputStream in Hadoop 0.20.x and in DFSOutputStream in 0.23.

I did a quick hack that adds socket.setTcpNoDelay(true) on that write side of Hadoop 0.20.x and reran the PE tests; unfortunately, we still seem to have the SocketTimeoutException problems. Needs more digging.

Jon

On Mon, Apr 2, 2012 at 8:50 PM, Stack wrote:
> On Mon, Apr 2, 2012 at 8:19 PM, Jonathan Hsieh wrote:
> > I'm in the process of testing a hypothesis Todd suggested
> > and will share results after test is done.
> >
>
> What is the hypothesis?
> St.Ack

--
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
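For anyone who wants to try the same kind of change outside HDFS, here is a minimal sketch of enabling TCP_NODELAY on the client's write socket before sending, using plain java.net sockets. It is not the DFSOutputStream code or the actual hack; the class name, openWriteSocket, and roundTrip are hypothetical, and the loopback server only stands in for a DataNode that echoes one byte back as an "ack".

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class NoDelayWriteSketch {

    // Hypothetical helper: opens a client-side socket for a write pipeline and
    // disables Nagle's algorithm so small segments are sent without waiting
    // for outstanding ACKs.
    static Socket openWriteSocket(InetAddress host, int port) throws IOException {
        Socket sock = new Socket(host, port);
        sock.setTcpNoDelay(true); // analogous to the one-line write-side hack
        return sock;
    }

    // Runs one request/response exchange against a local stand-in for a
    // DataNode and reports the socket option plus the echoed byte.
    static String roundTrip(int value) throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread fakeDataNode = new Thread(() -> {
                try (Socket s = server.accept()) {
                    int b = s.getInputStream().read();
                    s.getOutputStream().write(b); // echo the byte back as an "ack"
                    s.getOutputStream().flush();
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            fakeDataNode.start();
            String result;
            try (Socket client = openWriteSocket(loopback, server.getLocalPort())) {
                OutputStream out = client.getOutputStream();
                // A tiny write; with TCP_NODELAY it is never held back
                // waiting for an ACK of earlier unacknowledged data.
                out.write(value);
                out.flush();
                InputStream in = client.getInputStream();
                int echoed = in.read();
                result = "tcpNoDelay=" + client.getTcpNoDelay() + " echoed=" + echoed;
            }
            fakeDataNode.join();
            return result;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(42)); // prints "tcpNoDelay=true echoed=42"
    }
}
```

The option has to be set on each client socket individually; it is a per-connection setting, which is why the read side's setting in DFSInputStream does nothing for the write pipeline.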
