[jira] [Commented] (HDFS-15131) FoldedTreeSet appears to degrade over time

2020-08-27 Thread Mania Abdi (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186025#comment-17186025
 ] 

Mania Abdi commented on HDFS-15131:
---

[~sodonnell] I am researching debugging of distributed systems, and I am building 
a tool that helps developers find frequent processing patterns within the 
workflow graphs of applications. This bug seems like a good case study for my 
proposed research. Would it be possible to let me know how you went about 
catching this bug, how I can reproduce it, and what test led to it?

 Would it also be possible to get access to the test program you mentioned two 
comments above?

 

> FoldedTreeSet appears to degrade over time
> --
>
> Key: HDFS-15131
> URL: https://issues.apache.org/jira/browse/HDFS-15131
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.3.0
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
>
> We have seen some occurrences of the Namenode getting very slow on delete 
> operations, to the point where IBRs get blocked frequently and files fail to 
> close. On one cluster in particular, after about 4 weeks of uptime, the 
> Namenode started responding very poorly. Restarting it corrected the problem 
> for another 4 weeks. 
> In that example, jstacks taken on the namenode always pointed to slow operations 
> around an HDFS delete call that was performing an operation on the 
> FoldedTreeSet structure. Each time a jstack was sampled, it pointed at an 
> operation on the folded tree set:
> {code}
> "IPC Server handler 573 on 8020" #663 daemon prio=5 os_prio=0 
> tid=0x7fe6a4087800 nid=0x97a6 runnable [0x7fe67bdfd000]
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.removeAndGet(FoldedTreeSet.java:879)
>   at 
> org.apache.hadoop.hdfs.util.FoldedTreeSet.remove(FoldedTreeSet.java:911)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo.removeBlock(DatanodeStorageInfo.java:263)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:194)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlocksMap.removeBlock(BlocksMap.java:108)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlockFromMap(BlockManager.java:3676)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.removeBlock(BlockManager.java:3507)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4158)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4132)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4069)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4053)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:845)
>   at 
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:308)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:603)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
> {code}
> The observation in this case was that the namenode worked fine after a 
> restart; then, at some point after about 4 weeks of uptime, this problem 
> started happening and persisted until the namenode was restarted. 
> The problem then did not return for about another 4 weeks.
> On a completely different cluster and version, I recently came across a 
> problem where files were again failing to close (last block does not have 
> sufficient number of replicas) and the datanodes were logging a lot of 
> messages like the following:
> {code}
> 2019-11-27 09:00:49,678 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Took 21540ms to process 1 commands from NN
> {code}
> These messages had a range of durations and were fairly frequent. Focusing on 
> the longer messages at around 20 seconds and checking a few 
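
As a side note for anyone trying to reproduce this kind of diagnosis: the pattern 
described above (repeated jstacks that keep landing in FoldedTreeSet frames) can 
be confirmed with a very small sampler. The sketch below is only an illustration, 
not the tooling used on this issue; the pid, sample count and interval are 
placeholder values, and it assumes the JDK's jstack binary is on the PATH.

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;

/**
 * Illustrative sketch: repeatedly sample jstack output from a running
 * NameNode and count how many sampled stack frames are inside
 * FoldedTreeSet. Pid, sample count and interval are placeholders.
 */
public class JstackSampler {
  public static void main(String[] args) throws Exception {
    long pid = Long.parseLong(args[0]);   // NameNode pid
    int samples = 20;                     // how many jstacks to take
    long intervalMs = 5000;               // pause between samples

    for (int i = 0; i < samples; i++) {
      Process p = new ProcessBuilder("jstack", Long.toString(pid)).start();
      int hits = 0;
      try (BufferedReader r = new BufferedReader(
          new InputStreamReader(p.getInputStream()))) {
        String line;
        while ((line = r.readLine()) != null) {
          if (line.contains("org.apache.hadoop.hdfs.util.FoldedTreeSet")) {
            hits++;
          }
        }
      }
      p.waitFor();
      System.out.println("sample " + i + ": " + hits
          + " stack frames in FoldedTreeSet");
      Thread.sleep(intervalMs);
    }
  }
}
{code}

Counting how often handler threads sit in the same frames across samples is a 
crude poor-man's profiler, but it is usually enough to spot a structure whose 
operations degrade over time.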

[jira] [Comment Edited] (HDFS-14111) hdfsOpenFile on HDFS causes unnecessary IO from file offset 0

2020-08-27 Thread Mania Abdi (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186020#comment-17186020
 ] 

Mania Abdi edited comment on HDFS-14111 at 8/27/20, 5:57 PM:
-

Hi [~tlipcon] and [~stakiar],

[~tlipcon] I am researching debugging of distributed systems, and I am building a 
tool that helps developers find frequent processing patterns within the workflow 
graphs of applications. This bug seems like a good case study for my proposed 
research. Would it be possible to let me know how you went about catching this 
bug, how I can reproduce it, and what test led to it?

 

Regards

Mania

 

 


was (Author: maniaabdi):
Hi [~tlipcon] and [~stakiar],

I am researching debugging of distributed systems, and I am building a tool that 
helps developers find frequent processing patterns within the workflow graphs of 
applications. This bug seems like a good case study for my proposed research. 
Would it be possible to let me know how you went about catching this bug, how I 
can reproduce it, and what test led to it?

 

Regards

Mania

 

 

> hdfsOpenFile on HDFS causes unnecessary IO from file offset 0
> -
>
> Key: HDFS-14111
> URL: https://issues.apache.org/jira/browse/HDFS-14111
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs
>Affects Versions: 3.2.0
>Reporter: Todd Lipcon
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14111.001.patch, HDFS-14111.002.patch, 
> HDFS-14111.003.patch
>
>
> hdfsOpenFile() calls readDirect() with a 0-length argument in order to check 
> whether the underlying stream supports bytebuffer reads. With DFSInputStream, 
> the read(0) isn't short circuited, and results in the DFSClient opening a 
> block reader. In the case of a remote block, the block reader will actually 
> issue a read of the whole block, causing the datanode to perform unnecessary 
> IO and network transfers in order to fill up the client's TCP buffers. This 
> causes performance degradation.
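
To make the cost of the probe concrete: the intent of the eventual fix is that a 
zero-length read used purely as a capability check should return immediately 
rather than open a block reader. The sketch below is not the HDFS-14111 patch 
and does not use the real DFSInputStream API; it is a minimal, hypothetical 
wrapper that only shows the short-circuit idea.

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

/**
 * Minimal illustration (not the actual HDFS-14111 patch): a stream that
 * answers zero-length ByteBuffer reads without touching the underlying
 * source, so a capability probe like the one in hdfsOpenFile() does not
 * trigger any real IO.
 */
public class ZeroLengthShortCircuitStream extends InputStream {
  private final InputStream in;

  public ZeroLengthShortCircuitStream(InputStream in) {
    this.in = in;
  }

  /** ByteBuffer-style read; returns 0 immediately for an empty buffer. */
  public int read(ByteBuffer buf) throws IOException {
    if (!buf.hasRemaining()) {
      return 0; // short-circuit: nothing was requested, so do no IO
    }
    byte[] tmp = new byte[buf.remaining()];
    int n = in.read(tmp, 0, tmp.length);
    if (n > 0) {
      buf.put(tmp, 0, n);
    }
    return n;
  }

  @Override
  public int read() throws IOException {
    return in.read();
  }
}
{code}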






[jira] [Comment Edited] (HDFS-14111) hdfsOpenFile on HDFS causes unnecessary IO from file offset 0

2020-08-27 Thread Mania Abdi (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186020#comment-17186020
 ] 

Mania Abdi edited comment on HDFS-14111 at 8/27/20, 5:56 PM:
-

Hi [~tlipcon] and [~stakiar],

I am researching debugging of distributed systems, and I am building a tool that 
helps developers find frequent processing patterns within the workflow graphs of 
applications. This bug seems like a good case study for my proposed research. 
Would it be possible to let me know how you went about catching this bug, how I 
can reproduce it, and what test led to it?

 

Regards

Mania

 

 


was (Author: maniaabdi):
Hi [~tlipcon] and [~stakiar],

I am researching debugging of distributed systems, and I am building a tool that 
helps developers find frequent processing patterns within the workflow graphs of 
applications. This bug seems like a good case study for my proposed research. 
Would it be possible to let me know how you went about catching this bug, and 
how I can reproduce it?

 

Regards

Mania

 

 

> hdfsOpenFile on HDFS causes unnecessary IO from file offset 0
> -
>
> Key: HDFS-14111
> URL: https://issues.apache.org/jira/browse/HDFS-14111
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs
>Affects Versions: 3.2.0
>Reporter: Todd Lipcon
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14111.001.patch, HDFS-14111.002.patch, 
> HDFS-14111.003.patch
>
>
> hdfsOpenFile() calls readDirect() with a 0-length argument in order to check 
> whether the underlying stream supports bytebuffer reads. With DFSInputStream, 
> the read(0) isn't short circuited, and results in the DFSClient opening a 
> block reader. In the case of a remote block, the block reader will actually 
> issue a read of the whole block, causing the datanode to perform unnecessary 
> IO and network transfers in order to fill up the client's TCP buffers. This 
> causes performance degradation.






[jira] [Commented] (HDFS-14111) hdfsOpenFile on HDFS causes unnecessary IO from file offset 0

2020-08-27 Thread Mania Abdi (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17186020#comment-17186020
 ] 

Mania Abdi commented on HDFS-14111:
---

Hi [~tlipcon] and [~stakiar],

I am researching debugging of distributed systems, and I am building a tool that 
helps developers find frequent processing patterns within the workflow graphs of 
applications. This bug seems like a good case study for my proposed research. 
Would it be possible to let me know how you went about catching this bug, and 
how I can reproduce it?

 

Regards

Mania

 

 

> hdfsOpenFile on HDFS causes unnecessary IO from file offset 0
> -
>
> Key: HDFS-14111
> URL: https://issues.apache.org/jira/browse/HDFS-14111
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client, libhdfs
>Affects Versions: 3.2.0
>Reporter: Todd Lipcon
>Assignee: Sahil Takiar
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14111.001.patch, HDFS-14111.002.patch, 
> HDFS-14111.003.patch
>
>
> hdfsOpenFile() calls readDirect() with a 0-length argument in order to check 
> whether the underlying stream supports bytebuffer reads. With DFSInputStream, 
> the read(0) isn't short circuited, and results in the DFSClient opening a 
> block reader. In the case of a remote block, the block reader will actually 
> issue a read of the whole block, causing the datanode to perform unnecessary 
> IO and network transfers in order to fill up the client's TCP buffers. This 
> causes performance degradation.






[jira] [Updated] (HDFS-15206) HDFS synchronous reads from local file system

2020-03-04 Thread Mania Abdi (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mania Abdi updated HDFS-15206:
--
Labels: performance  (was: )

> HDFS synchronous reads from local file system
> -
>
> Key: HDFS-15206
> URL: https://issues.apache.org/jira/browse/HDFS-15206
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
> Environment: !Screenshot from 2020-03-03 22-07-26.png!
>Reporter: Mania Abdi
>Priority: Minor
>  Labels: performance
> Attachments: Screenshot from 2020-03-03 22-07-26.png
>
>
> Hello everyone,
> I ran a simple benchmark that runs {{hadoop fs -get /file1.txt}}, where 
> file1.txt is 1 MB in size, and I captured the workflow of requests using 
> XTrace. By evaluating the workflow trace, I noticed that the datanode issues 
> 64 KB synchronous read requests to the local file system to read the data, 
> sends the data back, and waits for completion. I walked through the HDFS code 
> to verify this pattern, and it was correct. I have two suggestions: (1) since 
> the HDFS block size is usually 128 MB, we could use mmap via the FileChannel 
> class to map the block file into memory and let the file system prefetch and 
> read ahead in the background, instead of synchronously reading from disk; 
> (2) use asynchronous read operations against the datanode's local disk. I was 
> wondering whether there is a reason behind the synchronous reads from the 
> local file system?
>  
> Code: 
> $HADOOP_SRC/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java
>  line 586
> {code:java}
>   /**
>* Sends a packet with up to maxChunks chunks of data.
>* 
>* @param pkt buffer used for writing packet data
>* @param maxChunks maximum number of chunks to send
>* @param out stream to send data to
>* @param transferTo use transferTo to send data
>* @param throttler used for throttling data transfer bandwidth
>*/
>   private int sendPacket(ByteBuffer pkt, int maxChunks, OutputStream out,
>   boolean transferTo, DataTransferThrottler throttler) throws IOException 
> {
> int dataLen = (int) Math.min(endOffset - offset,
>  (chunkSize * (long) maxChunks));
> 
> int numChunks = numberOfChunks(dataLen); // Number of chunks be sent in 
> the packet
> int checksumDataLen = numChunks * checksumSize;
> int packetLen = dataLen + checksumDataLen + 4;
> boolean lastDataPacket = offset + dataLen == endOffset && dataLen > 0;
> // The packet buffer is organized as follows:
> // ___D?D?D?D?
> //^   ^
> //|   \ checksumOff
> //\ headerOff
> // _ padding, since the header is variable-length
> // H = header and length prefixes
> // C = checksums
> // D? = data, if transferTo is false.
> 
> int headerLen = writePacketHeader(pkt, dataLen, packetLen);
> 
> // Per above, the header doesn't start at the beginning of the
> // buffer
> int headerOff = pkt.position() - headerLen;
> 
> int checksumOff = pkt.position();
> byte[] buf = pkt.array();
> 
> if (checksumSize > 0 && checksumIn != null) {
>   readChecksum(buf, checksumOff, checksumDataLen);  // write in 
> progress that we need to use to get last checksum
>   if (lastDataPacket && lastChunkChecksum != null) {
> int start = checksumOff + checksumDataLen - checksumSize;
> byte[] updatedChecksum = lastChunkChecksum.getChecksum();
> 
> if (updatedChecksum != null) {
>   System.arraycopy(updatedChecksum, 0, buf, start, checksumSize);
> }
>   }
> }
> 
> int dataOff = checksumOff + checksumDataLen;
> if (!transferTo) { // normal transfer
>   IOUtils.readFully(blockIn, buf, dataOff, dataLen);  
>   if (verifyChecksum) {
> verifyChecksum(buf, dataOff, dataLen, numChunks, checksumOff);
>   }
> }
> 
> try {
>   if (transferTo) {
> SocketOutputStream sockOut = (SocketOutputStream)out;
> // First write header and checksums
> sockOut.write(buf, headerOff, dataOff - headerOff);
> 
> // no need to flush since we know out is not a buffered stream
> FileChannel fileCh = ((FileInputStream)blockIn).getChannel();
> LongWritable waitTime = new LongWritable();
> LongWritable transferTime = new LongWritable();
> sockOut.transferToFully(fileCh, blockInPosition, dataLen, 
> waitTime, transferTime);
> 
> datanode.metrics.addSendDataPacketBlockedOnNetworkNanos(waitTime.get());
> datanode.metrics.addSendDataPacketTransferNanos(transferTime.get());
> blockInPosition += 

[jira] [Updated] (HDFS-15206) HDFS synchronous reads from local file system

2020-03-04 Thread Mania Abdi (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mania Abdi updated HDFS-15206:
--
Component/s: datanode

> HDFS synchronous reads from local file system
> -
>
> Key: HDFS-15206
> URL: https://issues.apache.org/jira/browse/HDFS-15206
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
> Environment: !Screenshot from 2020-03-03 22-07-26.png!
>Reporter: Mania Abdi
>Priority: Minor
> Attachments: Screenshot from 2020-03-03 22-07-26.png
>
>
> Hello everyone,
> I ran a simple benchmark that runs {{hadoop fs -get /file1.txt}}, where 
> file1.txt is 1 MB in size, and I captured the workflow of requests using 
> XTrace. By evaluating the workflow trace, I noticed that the datanode issues 
> 64 KB synchronous read requests to the local file system to read the data, 
> sends the data back, and waits for completion. I walked through the HDFS code 
> to verify this pattern, and it was correct. I have two suggestions: (1) since 
> the HDFS block size is usually 128 MB, we could use mmap via the FileChannel 
> class to map the block file into memory and let the file system prefetch and 
> read ahead in the background, instead of synchronously reading from disk; 
> (2) use asynchronous read operations against the datanode's local disk. I was 
> wondering whether there is a reason behind the synchronous reads from the 
> local file system?
>  
> Code: 
> $HADOOP_SRC/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java
>  line 586
> {code:java}
>   /**
>* Sends a packet with up to maxChunks chunks of data.
>* 
>* @param pkt buffer used for writing packet data
>* @param maxChunks maximum number of chunks to send
>* @param out stream to send data to
>* @param transferTo use transferTo to send data
>* @param throttler used for throttling data transfer bandwidth
>*/
>   private int sendPacket(ByteBuffer pkt, int maxChunks, OutputStream out,
>   boolean transferTo, DataTransferThrottler throttler) throws IOException 
> {
> int dataLen = (int) Math.min(endOffset - offset,
>  (chunkSize * (long) maxChunks));
> 
> int numChunks = numberOfChunks(dataLen); // Number of chunks be sent in 
> the packet
> int checksumDataLen = numChunks * checksumSize;
> int packetLen = dataLen + checksumDataLen + 4;
> boolean lastDataPacket = offset + dataLen == endOffset && dataLen > 0;
> // The packet buffer is organized as follows:
> // ___D?D?D?D?
> //^   ^
> //|   \ checksumOff
> //\ headerOff
> // _ padding, since the header is variable-length
> // H = header and length prefixes
> // C = checksums
> // D? = data, if transferTo is false.
> 
> int headerLen = writePacketHeader(pkt, dataLen, packetLen);
> 
> // Per above, the header doesn't start at the beginning of the
> // buffer
> int headerOff = pkt.position() - headerLen;
> 
> int checksumOff = pkt.position();
> byte[] buf = pkt.array();
> 
> if (checksumSize > 0 && checksumIn != null) {
>   readChecksum(buf, checksumOff, checksumDataLen);  // write in 
> progress that we need to use to get last checksum
>   if (lastDataPacket && lastChunkChecksum != null) {
> int start = checksumOff + checksumDataLen - checksumSize;
> byte[] updatedChecksum = lastChunkChecksum.getChecksum();
> 
> if (updatedChecksum != null) {
>   System.arraycopy(updatedChecksum, 0, buf, start, checksumSize);
> }
>   }
> }
> 
> int dataOff = checksumOff + checksumDataLen;
> if (!transferTo) { // normal transfer
>   IOUtils.readFully(blockIn, buf, dataOff, dataLen);  
>   if (verifyChecksum) {
> verifyChecksum(buf, dataOff, dataLen, numChunks, checksumOff);
>   }
> }
> 
> try {
>   if (transferTo) {
> SocketOutputStream sockOut = (SocketOutputStream)out;
> // First write header and checksums
> sockOut.write(buf, headerOff, dataOff - headerOff);
> 
> // no need to flush since we know out is not a buffered stream
> FileChannel fileCh = ((FileInputStream)blockIn).getChannel();
> LongWritable waitTime = new LongWritable();
> LongWritable transferTime = new LongWritable();
> sockOut.transferToFully(fileCh, blockInPosition, dataLen, 
> waitTime, transferTime);
> 
> datanode.metrics.addSendDataPacketBlockedOnNetworkNanos(waitTime.get());
> datanode.metrics.addSendDataPacketTransferNanos(transferTime.get());
> blockInPosition += dataLen;
>   } else {
> // normal 

[jira] [Updated] (HDFS-15206) HDFS synchronous reads from local file system

2020-03-04 Thread Mania Abdi (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mania Abdi updated HDFS-15206:
--
Priority: Minor  (was: Major)

> HDFS synchronous reads from local file system
> -
>
> Key: HDFS-15206
> URL: https://issues.apache.org/jira/browse/HDFS-15206
> Project: Hadoop HDFS
>  Issue Type: Improvement
> Environment: !Screenshot from 2020-03-03 22-07-26.png!
>Reporter: Mania Abdi
>Priority: Minor
> Attachments: Screenshot from 2020-03-03 22-07-26.png
>
>
> Hello everyone,
> I ran a simple benchmark that runs {{hadoop fs -get /file1.txt}}, where 
> file1.txt is 1 MB in size, and I captured the workflow of requests using 
> XTrace. By evaluating the workflow trace, I noticed that the datanode issues 
> 64 KB synchronous read requests to the local file system to read the data, 
> sends the data back, and waits for completion. I walked through the HDFS code 
> to verify this pattern, and it was correct. I have two suggestions: (1) since 
> the HDFS block size is usually 128 MB, we could use mmap via the FileChannel 
> class to map the block file into memory and let the file system prefetch and 
> read ahead in the background, instead of synchronously reading from disk; 
> (2) use asynchronous read operations against the datanode's local disk. I was 
> wondering whether there is a reason behind the synchronous reads from the 
> local file system?
>  
> Code: 
> $HADOOP_SRC/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java
>  line 586
> {code:java}
>   /**
>* Sends a packet with up to maxChunks chunks of data.
>* 
>* @param pkt buffer used for writing packet data
>* @param maxChunks maximum number of chunks to send
>* @param out stream to send data to
>* @param transferTo use transferTo to send data
>* @param throttler used for throttling data transfer bandwidth
>*/
>   private int sendPacket(ByteBuffer pkt, int maxChunks, OutputStream out,
>   boolean transferTo, DataTransferThrottler throttler) throws IOException 
> {
> int dataLen = (int) Math.min(endOffset - offset,
>  (chunkSize * (long) maxChunks));
> 
> int numChunks = numberOfChunks(dataLen); // Number of chunks be sent in 
> the packet
> int checksumDataLen = numChunks * checksumSize;
> int packetLen = dataLen + checksumDataLen + 4;
> boolean lastDataPacket = offset + dataLen == endOffset && dataLen > 0;
> // The packet buffer is organized as follows:
> // ___D?D?D?D?
> //^   ^
> //|   \ checksumOff
> //\ headerOff
> // _ padding, since the header is variable-length
> // H = header and length prefixes
> // C = checksums
> // D? = data, if transferTo is false.
> 
> int headerLen = writePacketHeader(pkt, dataLen, packetLen);
> 
> // Per above, the header doesn't start at the beginning of the
> // buffer
> int headerOff = pkt.position() - headerLen;
> 
> int checksumOff = pkt.position();
> byte[] buf = pkt.array();
> 
> if (checksumSize > 0 && checksumIn != null) {
>   readChecksum(buf, checksumOff, checksumDataLen);  // write in 
> progress that we need to use to get last checksum
>   if (lastDataPacket && lastChunkChecksum != null) {
> int start = checksumOff + checksumDataLen - checksumSize;
> byte[] updatedChecksum = lastChunkChecksum.getChecksum();
> 
> if (updatedChecksum != null) {
>   System.arraycopy(updatedChecksum, 0, buf, start, checksumSize);
> }
>   }
> }
> 
> int dataOff = checksumOff + checksumDataLen;
> if (!transferTo) { // normal transfer
>   IOUtils.readFully(blockIn, buf, dataOff, dataLen);  
>   if (verifyChecksum) {
> verifyChecksum(buf, dataOff, dataLen, numChunks, checksumOff);
>   }
> }
> 
> try {
>   if (transferTo) {
> SocketOutputStream sockOut = (SocketOutputStream)out;
> // First write header and checksums
> sockOut.write(buf, headerOff, dataOff - headerOff);
> 
> // no need to flush since we know out is not a buffered stream
> FileChannel fileCh = ((FileInputStream)blockIn).getChannel();
> LongWritable waitTime = new LongWritable();
> LongWritable transferTime = new LongWritable();
> sockOut.transferToFully(fileCh, blockInPosition, dataLen, 
> waitTime, transferTime);
> 
> datanode.metrics.addSendDataPacketBlockedOnNetworkNanos(waitTime.get());
> datanode.metrics.addSendDataPacketTransferNanos(transferTime.get());
> blockInPosition += dataLen;
>   } else {
> // normal transfer
> 

[jira] [Updated] (HDFS-15206) HDFS synchronous reads from local file system

2020-03-03 Thread Mania Abdi (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mania Abdi updated HDFS-15206:
--
Description: 
Hello everyone,

I ran a simple benchmark that runs {{hadoop fs -get /file1.txt}}, where file1.txt 
is 1 MB in size, and I captured the workflow of requests using XTrace. By 
evaluating the workflow trace, I noticed that the datanode issues 64 KB 
synchronous read requests to the local file system to read the data, sends the 
data back, and waits for completion. I walked through the HDFS code to verify 
this pattern, and it was correct. I have two suggestions: (1) since the HDFS 
block size is usually 128 MB, we could use mmap via the FileChannel class to map 
the block file into memory and let the file system prefetch and read ahead in 
the background, instead of synchronously reading from disk; (2) use asynchronous 
read operations against the datanode's local disk. I was wondering whether there 
is a reason behind the synchronous reads from the local file system?

 

Code: 
$HADOOP_SRC/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java
 line 586
{code:java}
  /**
   * Sends a packet with up to maxChunks chunks of data.
   * 
   * @param pkt buffer used for writing packet data
   * @param maxChunks maximum number of chunks to send
   * @param out stream to send data to
   * @param transferTo use transferTo to send data
   * @param throttler used for throttling data transfer bandwidth
   */
  private int sendPacket(ByteBuffer pkt, int maxChunks, OutputStream out,
  boolean transferTo, DataTransferThrottler throttler) throws IOException {
int dataLen = (int) Math.min(endOffset - offset,
 (chunkSize * (long) maxChunks));

int numChunks = numberOfChunks(dataLen); // Number of chunks be sent in the 
packet
int checksumDataLen = numChunks * checksumSize;
int packetLen = dataLen + checksumDataLen + 4;
boolean lastDataPacket = offset + dataLen == endOffset && dataLen > 0;
// The packet buffer is organized as follows:
// ___D?D?D?D?
//^   ^
//|   \ checksumOff
//\ headerOff
// _ padding, since the header is variable-length
// H = header and length prefixes
// C = checksums
// D? = data, if transferTo is false.

int headerLen = writePacketHeader(pkt, dataLen, packetLen);

// Per above, the header doesn't start at the beginning of the
// buffer
int headerOff = pkt.position() - headerLen;

int checksumOff = pkt.position();
byte[] buf = pkt.array();

if (checksumSize > 0 && checksumIn != null) {
  readChecksum(buf, checksumOff, checksumDataLen);  // write in 
progress that we need to use to get last checksum
  if (lastDataPacket && lastChunkChecksum != null) {
int start = checksumOff + checksumDataLen - checksumSize;
byte[] updatedChecksum = lastChunkChecksum.getChecksum();

if (updatedChecksum != null) {
  System.arraycopy(updatedChecksum, 0, buf, start, checksumSize);
}
  }
}

int dataOff = checksumOff + checksumDataLen;
if (!transferTo) { // normal transfer
  IOUtils.readFully(blockIn, buf, dataOff, dataLen);  
  if (verifyChecksum) {
verifyChecksum(buf, dataOff, dataLen, numChunks, checksumOff);
  }
}

try {
  if (transferTo) {
SocketOutputStream sockOut = (SocketOutputStream)out;
// First write header and checksums
sockOut.write(buf, headerOff, dataOff - headerOff);

// no need to flush since we know out is not a buffered stream
FileChannel fileCh = ((FileInputStream)blockIn).getChannel();
LongWritable waitTime = new LongWritable();
LongWritable transferTime = new LongWritable();
sockOut.transferToFully(fileCh, blockInPosition, dataLen, 
waitTime, transferTime);
datanode.metrics.addSendDataPacketBlockedOnNetworkNanos(waitTime.get());
datanode.metrics.addSendDataPacketTransferNanos(transferTime.get());
blockInPosition += dataLen;
  } else {
// normal transfer
out.write(buf, headerOff, dataOff + dataLen - headerOff);
  }
} catch (IOException e) {
  if (e instanceof SocketTimeoutException) {
/*
 * writing to client timed out.  This happens if the client reads
 * part of a block and then decides not to read the rest (but leaves
 * the socket open).
 * 
 * Reporting of this case is done in DataXceiver#run
 */
  } else {
/* Exception while writing to the client. Connection closure from
 * the other end is mostly the case and we do not care much about
 * it. But other things can go wrong, especially in transferTo(),
 * which we do not 

[jira] [Updated] (HDFS-15206) HDFS synchronous reads from local file system

2020-03-03 Thread Mania Abdi (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mania Abdi updated HDFS-15206:
--
Description: 
Hello everyone,

I ran a simple benchmark that runs {{hadoop fs -get /file1.txt}}, where file1.txt 
is 1 MB in size, and I captured the workflow of requests using XTrace. By 
evaluating the workflow trace, I noticed that the datanode issues 64 KB 
synchronous read requests to the local file system to read the data, sends the 
data back, and waits for completion. I walked through the HDFS code to verify 
this pattern, and it was correct. I have two suggestions: (1) since the HDFS 
block size is usually 128 MB, we could use mmap via the FileChannel class to map 
the block file into memory and let the file system prefetch and read ahead in 
the background, instead of synchronously reading from disk; (2) use asynchronous 
read operations against the datanode's local disk. I was wondering whether there 
is a reason behind the synchronous reads from the local file system?

 

Code: 
$HADOOP_SRC/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockSender.java
 line 586
{code:java}
  /**
   * Sends a packet with up to maxChunks chunks of data.
   * 
   * @param pkt buffer used for writing packet data
   * @param maxChunks maximum number of chunks to send
   * @param out stream to send data to
   * @param transferTo use transferTo to send data
   * @param throttler used for throttling data transfer bandwidth
   */
  private int sendPacket(ByteBuffer pkt, int maxChunks, OutputStream out,
  boolean transferTo, DataTransferThrottler throttler) throws IOException {
int dataLen = (int) Math.min(endOffset - offset,
 (chunkSize * (long) maxChunks));

int numChunks = numberOfChunks(dataLen); // Number of chunks be sent in the 
packet
int checksumDataLen = numChunks * checksumSize;
int packetLen = dataLen + checksumDataLen + 4;
boolean lastDataPacket = offset + dataLen == endOffset && dataLen > 0;
// The packet buffer is organized as follows:
// ___D?D?D?D?
//^   ^
//|   \ checksumOff
//\ headerOff
// _ padding, since the header is variable-length
// H = header and length prefixes
// C = checksums
// D? = data, if transferTo is false.

int headerLen = writePacketHeader(pkt, dataLen, packetLen);

// Per above, the header doesn't start at the beginning of the
// buffer
int headerOff = pkt.position() - headerLen;

int checksumOff = pkt.position();
byte[] buf = pkt.array();

if (checksumSize > 0 && checksumIn != null) {
  readChecksum(buf, checksumOff, checksumDataLen);  // write in 
progress that we need to use to get last checksum
  if (lastDataPacket && lastChunkChecksum != null) {
int start = checksumOff + checksumDataLen - checksumSize;
byte[] updatedChecksum = lastChunkChecksum.getChecksum();

if (updatedChecksum != null) {
  System.arraycopy(updatedChecksum, 0, buf, start, checksumSize);
}
  }
}

int dataOff = checksumOff + checksumDataLen;
if (!transferTo) { // normal transfer
  IOUtils.readFully(blockIn, buf, dataOff, dataLen);  
  if (verifyChecksum) {
verifyChecksum(buf, dataOff, dataLen, numChunks, checksumOff);
  }
}

try {
  if (transferTo) {
SocketOutputStream sockOut = (SocketOutputStream)out;
// First write header and checksums
sockOut.write(buf, headerOff, dataOff - headerOff);

// no need to flush since we know out is not a buffered stream
FileChannel fileCh = ((FileInputStream)blockIn).getChannel();
LongWritable waitTime = new LongWritable();
LongWritable transferTime = new LongWritable();
sockOut.transferToFully(fileCh, blockInPosition, dataLen, 
waitTime, transferTime);
datanode.metrics.addSendDataPacketBlockedOnNetworkNanos(waitTime.get());
datanode.metrics.addSendDataPacketTransferNanos(transferTime.get());
blockInPosition += dataLen;
  } else {
// normal transfer
out.write(buf, headerOff, dataOff + dataLen - headerOff);
  }
} catch (IOException e) {
  if (e instanceof SocketTimeoutException) {
/*
 * writing to client timed out.  This happens if the client reads
 * part of a block and then decides not to read the rest (but leaves
 * the socket open).
 * 
 * Reporting of this case is done in DataXceiver#run
 */
  } else {
/* Exception while writing to the client. Connection closure from
 * the other end is mostly the case and we do not care much about
 * it. But other things can go wrong, especially in transferTo(),
 * which we do not 

[jira] [Created] (HDFS-15206) HDFS synchronous reads from local file system

2020-03-03 Thread Mania Abdi (Jira)
Mania Abdi created HDFS-15206:
-

 Summary: HDFS synchronous reads from local file system
 Key: HDFS-15206
 URL: https://issues.apache.org/jira/browse/HDFS-15206
 Project: Hadoop HDFS
  Issue Type: Improvement
 Environment: !Screenshot from 2020-03-03 22-07-26.png!
Reporter: Mania Abdi
 Attachments: Screenshot from 2020-03-03 22-07-26.png

Hello everyone,

I ran a simple benchmark that runs {{hadoop fs -get /file1.txt}}, where file1.txt 
is 1 MB in size, and I captured the workflow of requests using XTrace. By 
evaluating the workflow trace, I noticed that the datanode issues 64 KB 
synchronous read requests to the local file system to read the data, sends the 
data back, and waits for completion. I walked through the HDFS code to verify 
this pattern, and it was correct. I have two suggestions: (1) since the HDFS 
block size is usually 128 MB, we could use mmap via the FileChannel class to map 
the block file into memory and let the file system prefetch and read ahead in 
the background, instead of synchronously reading from disk; (2) use asynchronous 
read operations against the datanode's local disk. I was wondering whether there 
is a reason behind the synchronous reads from the local file system?

 

Code: 

 

 

 

 

XTrace: [http://brownsys.github.io/tracing-framework/xtrace/server/]
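
As a rough illustration of suggestion (1) above, the sketch below maps a 
block-sized region of a local file with FileChannel.map and asks the kernel to 
fault it in, instead of issuing blocking 64 KB reads. This is only a sketch 
under the assumption that the replica is a regular local file; it is not a 
proposed change to BlockSender, and the path and sizes are placeholders.

{code:java}
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/**
 * Sketch of the mmap-based alternative suggested above: map a region of a
 * local block file and let the kernel handle prefetch/readahead, rather
 * than issuing synchronous 64 KB reads. Path and length are placeholders.
 */
public class MappedBlockReadSketch {
  public static void main(String[] args) throws IOException {
    Path blockFile = Paths.get(args[0]);          // e.g. a local replica file
    try (FileChannel ch = FileChannel.open(blockFile, StandardOpenOption.READ)) {
      long mapLen = Math.min(ch.size(), 128L * 1024 * 1024); // up to one block
      MappedByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, mapLen);
      mapped.load();                              // hint: fault the pages in now

      // The mapped buffer can then be consumed by the send path without
      // further explicit read() calls against the local disk.
      byte[] chunk = new byte[64 * 1024];
      while (mapped.remaining() > 0) {
        int n = Math.min(chunk.length, mapped.remaining());
        mapped.get(chunk, 0, n);
        // ... write chunk to the network stream here ...
      }
    }
  }
}
{code}

Whether this actually helps would depend on page-cache behaviour and on how 
BlockSender's checksum and transferTo paths interact with a mapped buffer, which 
the sketch does not address.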

 


