[jira] [Commented] (ACCUMULO-2353) Test improvements to java.io.InputStream.seek() for possible Hadoop patch

2016-10-20 Thread Josh Elser (JIRA)

[ https://issues.apache.org/jira/browse/ACCUMULO-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592238#comment-15592238 ]

Josh Elser commented on ACCUMULO-2353:
--

IMO, close this; if someone wants to follow through with it in Hadoop or with a
JVM vendor, fantastic.

> Test improvements to java.io.InputStream.seek() for possible Hadoop patch
> 
>
> Key: ACCUMULO-2353
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2353
> Project: Accumulo
>  Issue Type: Task
> Environment: Java 6 update 45 or later
> Hadoop 2.2.0
>Reporter: Dave Marion
>Priority: Minor
>
> At some point (early Java 7, I think, then backported around Java 6 Update
> 45), the java.io.InputStream.skip() method was changed from reading through a
> byte[512] buffer to a byte[2048] buffer. The difference can be seen in
> DeflaterInputStream, which has not been updated:
> {noformat}
> public long skip(long n) throws IOException {
>     if (n < 0) {
>         throw new IllegalArgumentException("negative skip length");
>     }
>     ensureOpen();
>     // Skip bytes by repeatedly decompressing small blocks
>     if (rbuf.length < 512)
>         rbuf = new byte[512];
>     int total = (int)Math.min(n, Integer.MAX_VALUE);
>     long cnt = 0;
>     while (total > 0) {
>         // Read a small block of uncompressed bytes
>         int len = read(rbuf, 0, (total <= rbuf.length ? total : rbuf.length));
>         if (len < 0) {
>             break;
>         }
>         cnt += len;
>         total -= len;
>     }
>     return cnt;
> }
> {noformat}
> and java.io.InputStream in Java 6 Update 45:
> {noformat}
> // MAX_SKIP_BUFFER_SIZE is used to determine the maximum buffer size to
> // use when skipping.
> private static final int MAX_SKIP_BUFFER_SIZE = 2048;
>
> public long skip(long n) throws IOException {
>     long remaining = n;
>     int nr;
>     if (n <= 0) {
>         return 0;
>     }
>
>     int size = (int)Math.min(MAX_SKIP_BUFFER_SIZE, remaining);
>     byte[] skipBuffer = new byte[size];
>     while (remaining > 0) {
>         nr = read(skipBuffer, 0, (int)Math.min(size, remaining));
>         if (nr < 0) {
>             break;
>         }
>         remaining -= nr;
>     }
>
>     return n - remaining;
> }
> {noformat}
> In sample tests I saw about a 20% improvement in skip() when seeking towards
> the end of a locally cached compressed file. Looking at DecompressorStream in
> Hadoop, the skip method is a near copy of the old InputStream method:
> {noformat}
>   private byte[] skipBytes = new byte[512];
>   @Override
>   public long skip(long n) throws IOException {
> // Sanity checks
> if (n < 0) {
>   throw new IllegalArgumentException("negative skip length");
> }
> checkStream();
> 
> // Read 'n' bytes
> int skipped = 0;
> while (skipped < n) {
>   int len = Math.min(((int)n - skipped), skipBytes.length);
>   len = read(skipBytes, 0, len);
>   if (len == -1) {
> eof = true;
> break;
>   }
>   skipped += len;
> }
> return skipped;
>   }
> {noformat}
> This task is to evaluate the change to DecompressorStream, with a possible
> patch to Hadoop and a possible bug report to Oracle to port the
> InputStream.skip() changes to DeflaterInputStream.skip().
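To make the proposal concrete, here is a minimal sketch of what such a change to
DecompressorStream.skip() could look like, mirroring the lazily grown 2048-byte
buffer now used by java.io.InputStream.skip(). This is illustrative only, not
the actual Hadoop patch; it reuses the members from the quoted code (skipBytes,
eof, checkStream(), read()), and the MAX_SKIP_BUFFER_SIZE name is borrowed from
the JDK.

{noformat}
// Sketch only -- mirrors java.io.InputStream.skip()'s buffer sizing inside
// DecompressorStream.skip(); not the actual Hadoop change.
private static final int MAX_SKIP_BUFFER_SIZE = 2048;
private byte[] skipBytes = new byte[512];

@Override
public long skip(long n) throws IOException {
  // Sanity checks
  if (n < 0) {
    throw new IllegalArgumentException("negative skip length");
  }
  checkStream();

  // Grow the skip buffer up to 2048 bytes, but never larger than n
  int size = (int) Math.min(MAX_SKIP_BUFFER_SIZE, n);
  if (skipBytes.length < size) {
    skipBytes = new byte[size];
  }

  // Read and discard 'n' bytes in buffer-sized chunks; a long counter
  // also avoids the (int) truncation of n in the current loop
  long skipped = 0;
  while (skipped < n) {
    int len = read(skipBytes, 0, (int) Math.min(n - skipped, skipBytes.length));
    if (len == -1) {
      eof = true;
      break;
    }
    skipped += len;
  }
  return skipped;
}
{noformat}

The only intended behavioral difference from the quoted code is the buffer
sizing (plus the long counter); the loop structure is unchanged.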





[jira] [Commented] (ACCUMULO-2353) Test improvements to java.io.InputStream.seek() for possible Hadoop patch

2016-10-18 Thread Dave Marion (JIRA)

[ https://issues.apache.org/jira/browse/ACCUMULO-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586521#comment-15586521 ]

Dave Marion commented on ACCUMULO-2353:
---

Looks like this was fixed in Hadoop 2.8. What's the disposition for this ticket?
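For anyone who wants to sanity-check the buffer-size effect locally (e.g.
against the roughly 20% figure in the ticket description), a rough standalone
harness along these lines would do. The file path, target offset, and repeat
count are placeholders, and this is a quick timing loop rather than a proper
benchmark:

{noformat}
import java.io.*;
import java.util.zip.GZIPInputStream;

// Quick comparison of skip-buffer sizes when seeking towards the end of
// a locally cached compressed file. Inputs are placeholders; no JMH, so
// treat the numbers as rough.
public class SkipBench {

  // Emulate skip(n) with an explicit buffer, the way the 512- and
  // 2048-byte implementations do internally.
  static long skipWith(InputStream in, long n, byte[] buf) throws IOException {
    long skipped = 0;
    while (skipped < n) {
      int len = in.read(buf, 0, (int) Math.min(n - skipped, buf.length));
      if (len == -1) {
        break;
      }
      skipped += len;
    }
    return skipped;
  }

  public static void main(String[] args) throws IOException {
    File gz = new File(args[0]);           // some local .gz file
    long target = Long.parseLong(args[1]); // uncompressed offset to skip to
    for (int bufSize : new int[] {512, 2048}) {
      byte[] buf = new byte[bufSize];
      long start = System.nanoTime();
      for (int i = 0; i < 20; i++) {
        try (InputStream in = new GZIPInputStream(new FileInputStream(gz))) {
          skipWith(in, target, buf);
        }
      }
      System.out.printf("buf=%d: %.1f ms%n",
          bufSize, (System.nanoTime() - start) / 1e6);
    }
  }
}
{noformat}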






[jira] [Commented] (ACCUMULO-2353) Test improvements to java.io.InputStream.seek() for possible Hadoop patch

2014-02-11 Thread Josh Elser (JIRA)

[ https://issues.apache.org/jira/browse/ACCUMULO-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898651#comment-13898651 ]

Josh Elser commented on ACCUMULO-2353:
--

Why file the ticket here and not in Hadoop-Common, [~dlmarion]?






[jira] [Commented] (ACCUMULO-2353) Test improvements to java.io.InputStream.seek() for possible Hadoop patch

2014-02-11 Thread Dave Marion (JIRA)

[ https://issues.apache.org/jira/browse/ACCUMULO-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13898724#comment-13898724 ]

Dave Marion commented on ACCUMULO-2353:
---

The rationale was to capture the issue and do some testing before cluttering up 
the other ticket systems.



