[jira] [Commented] (HDFS-6596) Improve InputStream when read spans two blocks

2015-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524795#comment-14524795
 ] 

Hadoop QA commented on HDFS-6596:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12653150/HDFS-6596.3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | 
https://builds.apache.org/job/PreCommit-HDFS-Build/10608/console |


This message was automatically generated.

 Improve InputStream when read spans two blocks
 --

 Key: HDFS-6596
 URL: https://issues.apache.org/jira/browse/HDFS-6596
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 2.4.0
Reporter: Zesheng Wu
Assignee: Zesheng Wu
 Attachments: HDFS-6596.1.patch, HDFS-6596.2.patch, HDFS-6596.2.patch, 
 HDFS-6596.2.patch, HDFS-6596.3.patch, HDFS-6596.3.patch


 In the current implementation of DFSInputStream, read(buffer, offset, length) 
 is implemented as follows:
 {code}
 int realLen = (int) Math.min(len, (blockEnd - pos + 1L));
 if (locatedBlocks.isLastBlockComplete()) {
   realLen = (int) Math.min(realLen, locatedBlocks.getFileLength());
 }
 int result = readBuffer(strategy, off, realLen, corruptedBlockMap);
 {code}
 From the above code, we can conclude that the read will return at most 
 (blockEnd - pos + 1) bytes. As a result, when a read spans two blocks, the 
 caller must call read() a second time to complete the request, and must wait 
 a second time to acquire the DFSInputStream lock (read() is synchronized for 
 DFSInputStream). For latency-sensitive applications, such as HBase, this 
 becomes a latency pain point under heavy contention. So we propose looping 
 internally in read() to do a best-effort read.
 The current implementation of pread (read(position, buffer, offset, length)) 
 already loops internally to do a best-effort read, so we can refactor it to 
 support this for the normal read as well.
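The proposed looping behavior can be sketched with a toy model. Here, a hypothetical readOnce() stands in for today's single-block read(), and read() loops over it so one call can span a (mock) block boundary. None of these class or method names are real HDFS APIs; this is an illustrative sketch, not DFSInputStream code.

```java
import java.util.Arrays;

// Toy model of the proposal: keep calling the existing single-block read
// until the request is satisfied or EOF is reached. BlockedStream simulates
// a DFSInputStream whose readOnce() stops at a mock block boundary.
public class BestEffortRead {
  public static class BlockedStream {
    private final byte[] data;
    private final int blockSize;
    private int pos = 0;

    public BlockedStream(byte[] data, int blockSize) {
      this.data = data;
      this.blockSize = blockSize;
    }

    // Simulates today's read(): returns at most the bytes left in the block.
    public int readOnce(byte[] buf, int off, int len) {
      if (pos >= data.length) return -1;                  // EOF
      int blockEnd = ((pos / blockSize) + 1) * blockSize; // end of current block
      int n = Math.min(len, Math.min(blockEnd, data.length) - pos);
      System.arraycopy(data, pos, buf, off, n);
      pos += n;
      return n;
    }

    // Proposed behavior: loop internally so one call can span blocks,
    // returning whatever was read if EOF is hit (best effort).
    public int read(byte[] buf, int off, int len) {
      int total = 0;
      while (total < len) {
        int n = readOnce(buf, off + total, len - total);
        if (n < 0) return total > 0 ? total : -1;
        total += n;
      }
      return total;
    }
  }

  public static void main(String[] args) {
    byte[] data = new byte[10];
    for (int i = 0; i < 10; i++) data[i] = (byte) i;
    BlockedStream s = new BlockedStream(data, 4);  // mock block size of 4
    byte[] buf = new byte[6];
    int n = s.read(buf, 0, 6);                     // spans blocks [0,4) and [4,8)
    System.out.println(n + " " + Arrays.toString(buf)); // → 6 [0, 1, 2, 3, 4, 5]
  }
}
```

With the old single-call behavior the same request would return only 4 bytes and force a second synchronized call; the loop completes it in one lock acquisition.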



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-6596) Improve InputStream when read spans two blocks

2014-06-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047472#comment-14047472
 ] 

Hadoop QA commented on HDFS-6596:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653103/HDFS-6596.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test file.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning message.
See 
https://builds.apache.org/job/PreCommit-HDFS-Build/7245//artifact/trunk/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7245//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7245//console

This message is automatically generated.

 Improve InputStream when read spans two blocks
 --

 Key: HDFS-6596
 URL: https://issues.apache.org/jira/browse/HDFS-6596
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 2.4.0
Reporter: Zesheng Wu
Assignee: Zesheng Wu
 Attachments: HDFS-6596.1.patch, HDFS-6596.2.patch, HDFS-6596.2.patch, 
 HDFS-6596.2.patch







[jira] [Commented] (HDFS-6596) Improve InputStream when read spans two blocks

2014-06-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047617#comment-14047617
 ] 

Hadoop QA commented on HDFS-6596:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653122/HDFS-6596.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test file.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.namenode.TestCheckpoint

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7247//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7247//console

This message is automatically generated.

 Improve InputStream when read spans two blocks
 --

 Key: HDFS-6596
 URL: https://issues.apache.org/jira/browse/HDFS-6596
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 2.4.0
Reporter: Zesheng Wu
Assignee: Zesheng Wu
 Attachments: HDFS-6596.1.patch, HDFS-6596.2.patch, HDFS-6596.2.patch, 
 HDFS-6596.2.patch, HDFS-6596.3.patch







[jira] [Commented] (HDFS-6596) Improve InputStream when read spans two blocks

2014-06-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047779#comment-14047779
 ] 

Hadoop QA commented on HDFS-6596:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653150/HDFS-6596.3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test file.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7252//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7252//console

This message is automatically generated.

 Improve InputStream when read spans two blocks
 --

 Key: HDFS-6596
 URL: https://issues.apache.org/jira/browse/HDFS-6596
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 2.4.0
Reporter: Zesheng Wu
Assignee: Zesheng Wu
 Attachments: HDFS-6596.1.patch, HDFS-6596.2.patch, HDFS-6596.2.patch, 
 HDFS-6596.2.patch, HDFS-6596.3.patch, HDFS-6596.3.patch







[jira] [Commented] (HDFS-6596) Improve InputStream when read spans two blocks

2014-06-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047382#comment-14047382
 ] 

Hadoop QA commented on HDFS-6596:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12653082/HDFS-6596.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test file.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning message.
See 
https://builds.apache.org/job/PreCommit-HDFS-Build/7243//artifact/trunk/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.ha.TestZKFailoverControllerStress

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7243//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7243//console

This message is automatically generated.

 Improve InputStream when read spans two blocks
 --

 Key: HDFS-6596
 URL: https://issues.apache.org/jira/browse/HDFS-6596
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 2.4.0
Reporter: Zesheng Wu
Assignee: Zesheng Wu
 Attachments: HDFS-6596.1.patch, HDFS-6596.2.patch, HDFS-6596.2.patch







[jira] [Commented] (HDFS-6596) Improve InputStream when read spans two blocks

2014-06-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045789#comment-14045789
 ] 

Hadoop QA commented on HDFS-6596:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652771/HDFS-6596.2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test file.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning message.
See 
https://builds.apache.org/job/PreCommit-HDFS-Build/7238//artifact/trunk/patchprocess/diffJavadocWarnings.txt
 for details.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7238//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7238//console

This message is automatically generated.

 Improve InputStream when read spans two blocks
 --

 Key: HDFS-6596
 URL: https://issues.apache.org/jira/browse/HDFS-6596
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 2.4.0
Reporter: Zesheng Wu
Assignee: Zesheng Wu
 Attachments: HDFS-6596.1.patch, HDFS-6596.2.patch







[jira] [Commented] (HDFS-6596) Improve InputStream when read spans two blocks

2014-06-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043545#comment-14043545
 ] 

Hadoop QA commented on HDFS-6596:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12652382/HDFS-6596.1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test file.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/7233//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/7233//console

This message is automatically generated.

 Improve InputStream when read spans two blocks
 --

 Key: HDFS-6596
 URL: https://issues.apache.org/jira/browse/HDFS-6596
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 2.4.0
Reporter: Zesheng Wu
Assignee: Zesheng Wu
 Attachments: HDFS-6596.1.patch







[jira] [Commented] (HDFS-6596) Improve InputStream when read spans two blocks

2014-06-24 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042380#comment-14042380
 ] 

Colin Patrick McCabe commented on HDFS-6596:


What you are proposing is basically making every {{read}} into a {{readFully}}. 
 I don't think we want to increase the number of differences between how 
DFSInputStream works and how normal Java input streams work.  The normal 
Java behavior also has a good reason behind it: clients who can deal with 
partial reads will get a faster response time if the stream just returns what 
it can rather than waiting for everything.  In the case of HDFS, waiting for 
everything might mean connecting to a remote DataNode, which could add quite a 
lot of latency.

bq. From the above code, we can conclude that the read will return at most 
(blockEnd - pos + 1) bytes. As a result, when a read spans two blocks, the caller 
must call read() a second time to complete the request, and must wait a second 
time to acquire the DFSInputStream lock (read() is synchronized for 
DFSInputStream). For latency-sensitive applications, such as HBase, this becomes 
a latency pain point under heavy contention. So we propose looping internally in 
read() to do a best-effort read.

The simplest solution here is just to have code like this in HBase:

{code}
synchronized (stream) {
  buf = stream.readFully(XYZ);
}
doStuff(buf);
{code}

Since monitors are re-entrant in Java, no other thread can take the stream lock 
while we are in our synchronized block.

Another solution would be to modify {{DFSInputStream#readFully}} so that it 
holds the lock the whole time.  This is basically the same as the previous 
solution, but done in Hadoop rather than HBase.
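Both suggestions rely on the stream's monitor being re-entrant, which a toy model can illustrate: the caller takes the stream's lock once, and the synchronized read() calls inside simply re-enter it, so no other thread can interleave between them. MockStream and readFullyUnderOneLock are hypothetical names, not real HDFS or HBase code.

```java
// Toy illustration of the external-locking workaround. Because Java monitors
// are re-entrant, wrapping several calls to a synchronized read() in our own
// synchronized (stream) block does not deadlock, and no other thread can
// acquire the stream's lock between the calls.
public class ExternalLockRead {
  public static class MockStream {
    private int pos = 0;

    // Models DFSInputStream#read, which is synchronized on the stream and
    // here pretends to stop at a block boundary every 3 bytes.
    public synchronized int read(byte[] buf, int off, int len) {
      int n = Math.min(len, 3);
      pos += n;
      return n;
    }
  }

  // The caller-side readFully: take the stream's monitor once, then call
  // read() repeatedly; the inner synchronized acquisitions re-enter it.
  public static int readFullyUnderOneLock(MockStream stream, byte[] buf, int len) {
    synchronized (stream) {
      int total = 0;
      while (total < len) {
        total += stream.read(buf, total, len - total);
      }
      return total;
    }
  }

  public static void main(String[] args) {
    MockStream s = new MockStream();
    System.out.println(readFullyUnderOneLock(s, new byte[8], 8)); // → 8
  }
}
```

Moving this loop inside a Hadoop-side {{readFully}} that holds the lock for its whole duration would give the same atomicity without requiring each client to do its own locking.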

 Improve InputStream when read spans two blocks
 --

 Key: HDFS-6596
 URL: https://issues.apache.org/jira/browse/HDFS-6596
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 2.4.0
Reporter: Zesheng Wu
Assignee: Zesheng Wu






[jira] [Commented] (HDFS-6596) Improve InputStream when read spans two blocks

2014-06-24 Thread Zesheng Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042986#comment-14042986
 ] 

Zesheng Wu commented on HDFS-6596:
--

Thanks Colin.
bq.  What you are proposing is basically making every {{read}} into a 
{{readFully}}. I don't think we want to increase the number of differences 
between how DFSInputStream works and how normal Java input streams work. The 
normal Java behavior also has a good reason behind it: clients who can deal 
with partial reads will get a faster response time if the stream just returns 
what it can rather than waiting for everything. In the case of HDFS, waiting 
for everything might mean connecting to a remote DataNode, which could add 
quite a lot of latency.
I agree with you that we shouldn't make every {{read}} into a {{readFully}}, 
and the current implementation of {{read}} has its advantages, as you described.

About the solution, I think doing it in Hadoop would be better, because all 
users would benefit.
The current {{readFully}} for DFSInputStream is implemented as a pread and is 
inherited from FSInputStream, so I will add a new {{readFully(buffer, offset, 
length)}} to address this.  Any thoughts?

 Improve InputStream when read spans two blocks
 --

 Key: HDFS-6596
 URL: https://issues.apache.org/jira/browse/HDFS-6596
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Affects Versions: 2.4.0
Reporter: Zesheng Wu
Assignee: Zesheng Wu



