Bob Hansen created HDFS-8746:
--------------------------------

             Summary: Reduce the latency of streaming reads by re-using DN connections
                 Key: HDFS-8746
                 URL: https://issues.apache.org/jira/browse/HDFS-8746
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: hdfs-client
            Reporter: Bob Hansen
            Assignee: Bob Hansen


The current libhdfspp implementation opens a new connection for each pread.  
For streaming reads (especially short-buffer streaming reads coming from the C 
API, and especially once SSL handshakes add their own overhead), throughput will 
be dominated by the latency of reconnecting to the DataNodes.
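
As a rough illustration of the idea, the sketch below keeps one open 
connection per DataNode and hands it back on later reads. It is not libhdfspp 
code: DnConnection, DnConnectionCache, and CountConnects are hypothetical names 
invented for this example, and the counter only stands in for the cost of a 
TCP connect plus handshake.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for an open DataNode connection. A real one would own
// a socket and any negotiated SSL session.
struct DnConnection {
  std::string datanode_id;
};

// Hypothetical cache holding one open connection per DataNode, so consecutive
// preads against the same DN can skip connection setup.
class DnConnectionCache {
 public:
  // Returns the cached connection for datanode_id, opening one only on a miss.
  std::shared_ptr<DnConnection> Get(const std::string& datanode_id) {
    assert(!datanode_id.empty());
    auto it = cache_.find(datanode_id);
    if (it != cache_.end()) return it->second;
    ++connects_;  // stands in for TCP connect plus handshake cost
    auto conn = std::make_shared<DnConnection>(DnConnection{datanode_id});
    cache_.emplace(datanode_id, conn);
    return conn;
  }

  // Number of connections actually opened so far.
  std::size_t connects() const { return connects_; }

 private:
  std::map<std::string, std::shared_ptr<DnConnection>> cache_;
  std::size_t connects_ = 0;
};

// Simulates num_reads sequential preads against a single DataNode and returns
// how many connections were opened, with or without reuse.
inline std::size_t CountConnects(std::size_t num_reads, bool reuse) {
  DnConnectionCache cache;
  std::size_t fresh_connects = 0;
  for (std::size_t i = 0; i < num_reads; ++i) {
    if (reuse) {
      cache.Get("dn-1");  // hit after the first call
    } else {
      ++fresh_connects;   // one new connection per pread, as described above
    }
  }
  return reuse ? cache.connects() : fresh_connects;
}
```

Under these assumptions, CountConnects(16, false) counts 16 connections while 
CountConnects(16, true) counts one. A real cache would also need to handle 
stale or failed connections and concurrent readers, which this sketch ignores.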

The target use case is a multi-block file that is being sequentially streamed 
and processed by the client application, which consumes the data as it arrives 
from the DN and then discards it.  The data is read into moderately small 
buffers (roughly 64 KB to 1 MB) owned by the consumer, and overall throughput 
is the critical metric.
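
To give a sense of scale for this use case, the following back-of-the-envelope 
sketch counts the preads needed to stream a file with a given buffer size and 
multiplies by a per-connection setup cost. The function names are hypothetical, 
the connect latency is a made-up input rather than a measurement, and reads 
that straddle block boundaries are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Number of preads needed to stream file_bytes through a consumer buffer of
// buffer_bytes (ceiling division). Ignores reads split at block boundaries.
inline std::uint64_t PreadCount(std::uint64_t file_bytes,
                                std::uint64_t buffer_bytes) {
  assert(buffer_bytes > 0);
  return (file_bytes + buffer_bytes - 1) / buffer_bytes;
}

// Total time spent on connection setup, in microseconds, if every pread opens
// a new connection. connect_micros is an assumed per-connection cost.
inline std::uint64_t ReconnectOverheadMicros(std::uint64_t file_bytes,
                                             std::uint64_t buffer_bytes,
                                             std::uint64_t connect_micros) {
  return PreadCount(file_bytes, buffer_bytes) * connect_micros;
}
```

For a 1 GiB file, a 64 KiB buffer gives 16384 preads and a 1 MiB buffer gives 
1024. With an assumed 1000 microseconds per connection, the 64 KiB case spends 
about 16.4 seconds on connection setup alone.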



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
