[jira] [Comment Edited] (HDFS-14820) The default 8KB buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream is too big

2020-03-03 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17050517#comment-17050517
 ] 

Wei-Chiu Chuang edited comment on HDFS-14820 at 3/3/20 8:16 PM:


In the current implementation, the DFS client sends a short request to the 
DataNode asking for a block using an output stream. After that, the client receives 
the block data from the DataNode (which can be several MBs long) using an input stream.

This patch changes the buffer size of the former, the output stream. There is 
absolutely no reason to use an 8 KB buffer for this stream. For the input 
stream, yes, what [~eyang] says makes sense.

The following code snippet builds the data sent on each output stream:

{code}
OpReadBlockProto proto = OpReadBlockProto.newBuilder()
    .setHeader(DataTransferProtoUtil.buildClientHeader(blk, clientName,
        blockToken))
    .setOffset(blockOffset)
    .setLen(length)
    .setSendChecksums(sendChecksum)
    .setCachingStrategy(getCachingStrategy(cachingStrategy))
    .build();
{code}

Also note that the stream objects are not recycled. Each block gets its own 
output/input stream objects.
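
To make the size argument concrete, here is a minimal standalone sketch (not HDFS code). It assumes a stand-in request made of an offset, a length, a checksum flag and a client name; the real request is a serialized OpReadBlockProto, whose size is not measured here. The class name, method name and field values are all made up for illustration.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Standalone sketch, not HDFS code: a short request fits in a small buffer. */
class SmallRequestBufferDemo {

  /**
   * Writes a stand-in "read block" request through a BufferedOutputStream of
   * the given size and returns the bytes that reached the underlying stream.
   */
  static byte[] sendRequest(int bufferSize) {
    // Stands in for peer.getOutputStream(); a real client writes to a socket.
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (DataOutputStream out =
        new DataOutputStream(new BufferedOutputStream(sink, bufferSize))) {
      out.writeLong(0L);                // offset (placeholder value)
      out.writeLong(4L * 1024 * 1024);  // length (placeholder value)
      out.writeBoolean(true);           // send checksums (placeholder value)
      out.write("DFSClient_example".getBytes(StandardCharsets.UTF_8));
      out.flush();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink.toByteArray();
  }

  public static void main(String[] args) {
    byte[] small = sendRequest(512);
    byte[] large = sendRequest(8192);
    System.out.println("request bytes: " + small.length);
    System.out.println("same bytes sent: " + Arrays.equals(small, large));
  }
}
```

Since one such stream is created per block, a larger buffer only means a larger allocation per block read; the bytes sent are the same either way.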


was (Author: jojochuang):
The current implementation is: the DFS client sends a short request to the 
DataNode asking for a block using an output stream. After that, the client receives 
the block data from the DataNode (which can be several MBs long) using an input stream.

This patch changes the buffer size of the former, the output stream. There is 
absolutely no reason to use an 8 KB buffer for this stream. As for the input 
stream, yes, what [~eyang] says makes sense.

{code}
OpReadBlockProto proto = OpReadBlockProto.newBuilder()
    .setHeader(DataTransferProtoUtil.buildClientHeader(blk, clientName,
        blockToken))
    .setOffset(blockOffset)
    .setLen(length)
    .setSendChecksums(sendChecksum)
    .setCachingStrategy(getCachingStrategy(cachingStrategy))
    .build();
{code}

Also note that the stream objects are not recycled. Each block gets its own 
output/input stream objects.

>  The default 8KB buffer of 
> BlockReaderRemote#newBlockReader#BufferedOutputStream is too big
> ---
>
> Key: HDFS-14820
> URL: https://issues.apache.org/jira/browse/HDFS-14820
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Lisheng Sun
> Assignee: Lisheng Sun
> Priority: Major
> Attachments: HDFS-14820.001.patch, HDFS-14820.002.patch, 
> HDFS-14820.003.patch
>
>
> This issue is similar to HDFS-14535.
> {code:java}
> public static BlockReader newBlockReader(String file,
>     ExtendedBlock block,
>     Token blockToken,
>     long startOffset, long len,
>     boolean verifyChecksum,
>     String clientName,
>     Peer peer, DatanodeID datanodeID,
>     PeerCache peerCache,
>     CachingStrategy cachingStrategy,
>     int networkDistance) throws IOException {
>   // in and out will be closed when sock is closed (by the caller)
>   final DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
>       peer.getOutputStream()));
>   new Sender(out).readBlock(block, blockToken, clientName, startOffset, len,
>       verifyChecksum, cachingStrategy);
>   // ... (remainder of the method omitted)
> }
>
> // java.io.BufferedOutputStream: the one-argument constructor defaults to 8192 bytes
> public BufferedOutputStream(OutputStream out) {
>   this(out, 8192);
> }
> {code}
> The Sender#readBlock parameters (block, blockToken, clientName, startOffset, len, 
> verifyChecksum, cachingStrategy) cannot fill such a big buffer,
> so I think the BufferedOutputStream buffer should be reduced.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-14820) The default 8KB buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream is too big

2019-11-21 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979714#comment-16979714
 ] 

Wei-Chiu Chuang edited comment on HDFS-14820 at 11/22/19 12:09 AM:
---

In this case, the buffer is used for the client-to-DataNode request. It's not 
used to read the data from the DataNode. So IMO, this is a strictly better change 
and it doesn't really change the behavior. There's no good reason to have an 8KB 
buffer.
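
A rough way to check the "doesn't really change the behavior" claim is to count how often the wrapped stream is actually written to. The sketch below is standalone, not HDFS code. It assumes a request of 100 bytes issued as many tiny writes; `CountingSink` and `send` are hypothetical names introduced only for this illustration.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Standalone sketch, not HDFS code: count writes that reach the wrapped stream. */
class RequestWriteCountDemo {

  /** Records how many write calls and bytes reach the underlying stream. */
  static final class CountingSink extends OutputStream {
    int writeCalls;
    int bytes;

    @Override
    public void write(int b) {
      writeCalls++;
      bytes++;
    }

    @Override
    public void write(byte[] b, int off, int len) {
      writeCalls++;
      bytes += len;
    }
  }

  /** Sends requestBytes single-byte writes through a buffer of bufferSize. */
  static CountingSink send(int bufferSize, int requestBytes) {
    CountingSink sink = new CountingSink();
    try {
      DataOutputStream out =
          new DataOutputStream(new BufferedOutputStream(sink, bufferSize));
      for (int i = 0; i < requestBytes; i++) {
        out.writeByte(i); // many tiny writes, as a serializer might issue
      }
      out.flush(); // the buffered bytes are handed to the sink here
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sink;
  }

  public static void main(String[] args) {
    for (int size : new int[] {512, 8192}) {
      CountingSink sink = send(size, 100);
      System.out.println(size + "-byte buffer: " + sink.writeCalls
          + " write call(s), " + sink.bytes + " bytes");
    }
  }
}
```

As long as the request is smaller than the buffer, both sizes hand the whole request to the underlying stream in a single write at flush time, so only the allocation differs.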


was (Author: jojochuang):
In this case, the buffer is used for the client-to-DataNode request. It's not 
used to read the data from the DataNode. So IMO, this is a strictly better change. 
There's no good reason to have an 8KB buffer.







[jira] [Comment Edited] (HDFS-14820) The default 8KB buffer of BlockReaderRemote#newBlockReader#BufferedOutputStream is too big

2019-09-09 Thread Lisheng Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926304#comment-16926304
 ] 

Lisheng Sun edited comment on HDFS-14820 at 9/10/19 3:38 AM:
-

Hi [~elgoiri],
{quote}What is the current default value? 8KB?
{quote}
As the following code shows, the current default value is 8KB.
{code:java}
final DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
    peer.getOutputStream()));

// java.io.BufferedOutputStream
public BufferedOutputStream(OutputStream out) {
  this(out, 8192);
}
{code}
I have changed the buffer to 512B and run a lot of tests, and the results are OK. I can 
run a stress test and use the new buffer in our production environment later.
I agree with your suggestion: we can first make it configurable and keep the old value 
as the default.

Users can then adjust the buffer according to their needs.
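
A minimal sketch of the "configurable, with the old value as the default" idea follows. It is not the patch itself: it uses `java.util.Properties` in place of the Hadoop configuration classes so it runs standalone, and the key name and class name are hypothetical, chosen only for this example.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.OutputStream;
import java.util.Properties;

/** Standalone sketch, not the actual patch: a configurable request buffer size. */
class ConfigurableRequestBuffer {

  // Hypothetical key name, used only for this illustration.
  static final String BUFFER_SIZE_KEY = "example.client.read.request.buffer.size";

  // Old behavior: the BufferedOutputStream(OutputStream) constructor uses 8192.
  static final int BUFFER_SIZE_DEFAULT = 8192;

  /** Returns the configured size, or the old 8192-byte default if unset. */
  static int bufferSize(Properties conf) {
    String raw = conf.getProperty(BUFFER_SIZE_KEY);
    if (raw == null) {
      return BUFFER_SIZE_DEFAULT;
    }
    int size = Integer.parseInt(raw.trim());
    if (size <= 0) {
      throw new IllegalArgumentException(
          BUFFER_SIZE_KEY + " must be positive, got " + size);
    }
    return size;
  }

  /** Wraps the raw stream with a buffer of the configured size. */
  static DataOutputStream requestStream(OutputStream raw, Properties conf) {
    return new DataOutputStream(new BufferedOutputStream(raw, bufferSize(conf)));
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println("default: " + bufferSize(conf));
    conf.setProperty(BUFFER_SIZE_KEY, "512");
    System.out.println("configured: " + bufferSize(conf));
    // ByteArrayOutputStream stands in for peer.getOutputStream().
    DataOutputStream out = requestStream(new ByteArrayOutputStream(), conf);
    System.out.println("stream created: " + (out != null));
  }
}
```

Keeping 8192 as the default means existing deployments see no change unless they opt in to a smaller value.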


was (Author: leosun08):
Hi [~elgoiri],
{quote}What is the current default value? 8KB?
{quote}
As the following code shows, the current default value is 8KB.
{code:java}
final DataOutputStream out = new DataOutputStream(new BufferedOutputStream(
    peer.getOutputStream()));

// java.io.BufferedOutputStream
public BufferedOutputStream(OutputStream out) {
  this(out, 8192);
}
{code}
I have changed the buffer to 512B and run a test, and the result is OK.
I agree with your suggestion: make it configurable and keep the old value as the 
default.



