[jira] [Comment Edited] (HDFS-14820) The default 8KB buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream is too big
[ https://issues.apache.org/jira/browse/HDFS-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17050517#comment-17050517 ] Wei-Chiu Chuang edited comment on HDFS-14820 at 3/3/20 8:16 PM:

In the current implementation, the DFS client sends a short request to the DataNode asking for a block, using an output stream. The client then receives the block data from the DataNode (which can be several MBs long) using an input stream. This patch changes the buffer size of the former, the output stream. There is absolutely no reason to use an 8 KB buffer for this stream. For the input stream, yes, what [~eyang] says makes sense.

The following is the code snippet for the data sent on each output stream:
{code:java}
OpReadBlockProto proto = OpReadBlockProto.newBuilder()
    .setHeader(DataTransferProtoUtil.buildClientHeader(blk, clientName, blockToken))
    .setOffset(blockOffset)
    .setLen(length)
    .setSendChecksums(sendChecksum)
    .setCachingStrategy(getCachingStrategy(cachingStrategy))
    .build();
{code}
Also note that the stream objects are not recycled: one block means one output/input stream object.
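The size mismatch is easy to demonstrate outside Hadoop. The sketch below is plain Java with no Hadoop dependencies; the fields written are illustrative stand-ins for the OpReadBlockProto contents (offset, length, checksum flag, client name, block id), not the real protobuf encoding. It writes a request-sized payload through the same DataOutputStream-over-BufferedOutputStream construction and shows the result is far below the 8 KB default:

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ReadRequestSize {
    // Rough stand-in for a read-block request: two longs, a flag, and two
    // short strings. The real protobuf message is similarly small.
    static int requestSize() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Same wrapping as BlockReaderRemote#newBlockReader, with an
        // in-memory sink in place of peer.getOutputStream().
        DataOutputStream out = new DataOutputStream(new BufferedOutputStream(sink));
        out.writeLong(0L);           // blockOffset
        out.writeLong(1024 * 1024);  // length
        out.writeBoolean(true);      // sendChecksums
        out.writeUTF("DFSClient_NONMAPREDUCE_12345");  // made-up clientName
        out.writeUTF("BP-1234-10.0.0.1-1-blk_1");      // made-up block id
        out.flush();
        return sink.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("request bytes: " + requestSize());
    }
}
```

With a request on the order of tens of bytes, any buffer at or above the message size behaves identically; the 8 KB default only wastes allocation per stream, which matters because (as noted above) one stream object is created per block.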
> The default 8KB buffer of
> BlockReaderRemote#newBlockReader#BufferedOutputStream is too big
> ---
>
> Key: HDFS-14820
> URL: https://issues.apache.org/jira/browse/HDFS-14820
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Lisheng Sun
> Assignee: Lisheng Sun
> Priority: Major
> Attachments: HDFS-14820.001.patch, HDFS-14820.002.patch, HDFS-14820.003.patch
>
> This issue is similar to HDFS-14535.
> {code:java}
> public static BlockReader newBlockReader(String file,
>     ExtendedBlock block,
>     Token blockToken,
>     long startOffset, long len,
>     boolean verifyChecksum,
>     String clientName,
>     Peer peer, DatanodeID datanodeID,
>     PeerCache peerCache,
>     CachingStrategy cachingStrategy,
>     int networkDistance) throws IOException {
>   // in and out will be closed when sock is closed (by the caller)
>   final DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
>       peer.getOutputStream()));
>   new Sender(out).readBlock(block, blockToken, clientName, startOffset, len,
>       verifyChecksum, cachingStrategy);
> }
>
> public BufferedOutputStream(OutputStream out) {
>   this(out, 8192);
> }
> {code}
> The Sender#readBlock parameters (block, blockToken, clientName, startOffset, len,
> verifyChecksum, cachingStrategy) could never fill such a big buffer,
> so I think the BufferedOutputStream buffer should be reduced.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-14820) The default 8KB buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream is too big
[ https://issues.apache.org/jira/browse/HDFS-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979714#comment-16979714 ] Wei-Chiu Chuang edited comment on HDFS-14820 at 11/22/19 12:09 AM:

In this case, the buffer size is used for the client-to-DataNode request. It's not used to read the data from the DataNode. So IMO this is a strictly better change, and it doesn't really change the behavior. There's no good reason to have an 8 KB buffer.
[jira] [Comment Edited] (HDFS-14820) The default 8KB buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream is too big
[ https://issues.apache.org/jira/browse/HDFS-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926304#comment-16926304 ] Lisheng Sun edited comment on HDFS-14820 at 9/10/19 3:38 AM:

Hi [~elgoiri],
{quote}What is the current default value? 8KB?{quote}
As the following code shows, the current default value is 8 KB.
{code:java}
final DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
    peer.getOutputStream()));

public BufferedOutputStream(OutputStream out) {
  this(out, 8192);
}
{code}
I changed the buffer to 512 B, ran a lot of tests, and the results are OK. I can run a pressure test and use the new buffer in our production environment later. I agree with your suggestion: we can first make it configurable, keep the old value as the default, and let users adjust the buffer according to their needs.
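The "make it configurable with the old default" suggestion can be sketched with a plain key/value lookup. The key name and default below are hypothetical placeholders for illustration; the actual configuration key and default chosen in the patch may differ:

```java
import java.util.Properties;

public class BufferSizeConf {
    // Hypothetical key name, for illustration only; the real patch may use
    // a different key in hdfs-site.xml.
    static final String KEY = "dfs.client.block.reader.remote.buffer.size";
    static final int DEFAULT = 8192;  // keep the old 8 KB value as the default

    // Resolve the output-stream buffer size, falling back to the default
    // when the key is unset.
    static int bufferSize(Properties conf) {
        return Integer.parseInt(conf.getProperty(KEY, String.valueOf(DEFAULT)));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(bufferSize(conf));  // unset: old default
        conf.setProperty(KEY, "512");
        System.out.println(bufferSize(conf));  // tuned down per user need
    }
}
```

This keeps existing deployments unchanged while letting operators who have verified a smaller buffer (e.g. 512 B, as tested above) opt in.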