[jira] [Commented] (ACCUMULO-2353) Test improvements to java.io.InputStream.seek() for possible Hadoop patch

2016-10-18 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586521#comment-15586521
 ] 

Dave Marion commented on ACCUMULO-2353:
---

Looks like this was fixed in Hadoop 2.8. What's the disposition for this ticket?

> Test improvements to java.io.InputStream.seek() for possible Hadoop patch
> 
>
> Key: ACCUMULO-2353
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2353
> Project: Accumulo
>  Issue Type: Task
> Environment: Java 6 update 45 or later
> Hadoop 2.2.0
>Reporter: Dave Marion
>Priority: Minor
>
> At some point (early Java 7 I think, then backported to around Java 6 Update
> 45), the java.io.InputStream.skip() method was changed from reading into a
> byte[512] to a byte[2048]. The difference can be seen in DeflaterInputStream,
> which has not been updated:
> {noformat}
> public long skip(long n) throws IOException {
>     if (n < 0) {
>         throw new IllegalArgumentException("negative skip length");
>     }
>     ensureOpen();
>     // Skip bytes by repeatedly decompressing small blocks
>     if (rbuf.length < 512)
>         rbuf = new byte[512];
>     int total = (int)Math.min(n, Integer.MAX_VALUE);
>     long cnt = 0;
>     while (total > 0) {
>         // Read a small block of uncompressed bytes
>         int len = read(rbuf, 0, (total <= rbuf.length ? total : rbuf.length));
>         if (len < 0) {
>             break;
>         }
>         cnt += len;
>         total -= len;
>     }
>     return cnt;
> }
> {noformat}
> and java.io.InputStream in Java 6 Update 45:
> {noformat}
> // MAX_SKIP_BUFFER_SIZE is used to determine the maximum buffer size to
> // use when skipping.
> private static final int MAX_SKIP_BUFFER_SIZE = 2048;
>
> public long skip(long n) throws IOException {
>     long remaining = n;
>     int nr;
>     if (n <= 0) {
>         return 0;
>     }
>
>     int size = (int)Math.min(MAX_SKIP_BUFFER_SIZE, remaining);
>     byte[] skipBuffer = new byte[size];
>     while (remaining > 0) {
>         nr = read(skipBuffer, 0, (int)Math.min(size, remaining));
>         if (nr < 0) {
>             break;
>         }
>         remaining -= nr;
>     }
>
>     return n - remaining;
> }
> {noformat}
> In sample tests I saw about a 20% improvement in skip() when seeking towards
> the end of a locally cached compressed file. Looking at the
> DecompressorStream in HDFS, the skip method is a near copy of the old
> InputStream method:
> {noformat}
> private byte[] skipBytes = new byte[512];
>
> @Override
> public long skip(long n) throws IOException {
>     // Sanity checks
>     if (n < 0) {
>         throw new IllegalArgumentException("negative skip length");
>     }
>     checkStream();
>
>     // Read 'n' bytes
>     int skipped = 0;
>     while (skipped < n) {
>         int len = Math.min(((int)n - skipped), skipBytes.length);
>         len = read(skipBytes, 0, len);
>         if (len == -1) {
>             eof = true;
>             break;
>         }
>         skipped += len;
>     }
>     return skipped;
> }
> {noformat}
> This task is to evaluate the changes to DecompressorStream, with a possible
> patch to HDFS and a possible bug request to Oracle to port the
> InputStream.skip changes to DeflaterInputStream.skip.
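The buffer-size change the quoted JDK code illustrates can be sketched as follows. This is a hypothetical, self-contained helper (not the actual HDFS patch): it skips over any InputStream using a request-sized buffer capped at 2048 bytes, mirroring the Java 6u45 InputStream.skip() approach rather than DecompressorStream's fixed 512-byte buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch of the proposed skip() change: size the throwaway
// buffer to the request, capped at MAX_SKIP_BUFFER_SIZE (2048), instead of
// a fixed 512-byte buffer.
public class SkipSketch {
    private static final int MAX_SKIP_BUFFER_SIZE = 2048;

    static long skipWithLargeBuffer(InputStream in, long n) throws IOException {
        if (n <= 0) {
            return 0;
        }
        long remaining = n;
        int size = (int) Math.min(MAX_SKIP_BUFFER_SIZE, remaining);
        byte[] skipBuffer = new byte[size];
        while (remaining > 0) {
            // Read and discard up to 'size' bytes per iteration
            int nr = in.read(skipBuffer, 0, (int) Math.min(size, remaining));
            if (nr < 0) {
                break; // hit end of stream
            }
            remaining -= nr;
        }
        return n - remaining;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10000];
        InputStream in = new ByteArrayInputStream(data);
        long skipped = skipWithLargeBuffer(in, 9000);
        System.out.println(skipped);          // 9000
        System.out.println(in.read() != -1);  // true: 1000 bytes remain
    }
}
```

With a 2048-byte buffer, a 9000-byte skip takes 5 read() calls instead of the 18 a 512-byte buffer would need, which is where the reported ~20% improvement plausibly comes from.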



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ACCUMULO-4469) ConcurrentModificationException while running MultiTable.xml node in Random Walk

2016-10-18 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586407#comment-15586407
 ] 

Josh Elser commented on ACCUMULO-4469:
--

Oh dear, my sincerest apologies for not asking the same question weeks ago, 
Dima. Your contributions are very appreciated!

> ConcurrentModificationException while running MultiTable.xml node in Random 
> Walk 
> -
>
> Key: ACCUMULO-4469
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4469
> Project: Accumulo
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.7.2
>Reporter: Dima Spivak
>Assignee: Dima Spivak
> Fix For: 1.7.3, 1.8.1, 2.0.0
>
> Attachments: ACCUMULO-4469_1.7_v1.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> After the resolution of ACCUMULO-4467, I got back to playing with Random Walk 
> and had a failure caused by a {{ConcurrentModificationException}}:
> {code}
> 23 01:03:04,316 [randomwalk.Framework] ERROR: Error during random walk
> java.lang.Exception: Error running node MultiTable.xml
>     at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
>     at org.apache.accumulo.test.randomwalk.Framework.run(Framework.java:59)
>     at org.apache.accumulo.test.randomwalk.Framework.main(Framework.java:119)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.accumulo.start.Main$2.run(Main.java:157)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
>     at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
>     at java.util.ArrayList$Itr.next(ArrayList.java:831)
>     at org.apache.accumulo.test.randomwalk.multitable.MultiTableFixture.tearDown(MultiTableFixture.java:64)
>     at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:365)
>     at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:283)
>     at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:278)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>     ... 1 more
> {code}
> [This section of 
> code|https://github.com/apache/accumulo/blob/master/test/src/main/java/org/apache/accumulo/test/randomwalk/multitable/MultiTableFixture.java#L61-L71]
>  seems to be at fault. In particular, it looks like we're getting the table 
> list, but then instead of doing a deep copy to a new {{ArrayList}} 
> from which we choose tables to delete, we're looping through and deleting 
> tables while referring to the changing list, which has the effect of 
> modifying it and making Java unhappy. Am I missing something more complex, or
> can I fix this one myself by just doing the aforementioned deep copy of the
> table list? Or would a better way be to use the {{TableOperations.list()}}
> method and iterate through the {{SortedSet}} it provides?
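The copy-before-iterate fix described in the quoted report can be sketched as follows. This is a simplified, hypothetical stand-in for the fixture code (the real tearDown deletes Accumulo tables via the connector, not list entries):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the ConcurrentModificationException fix: iterate over a snapshot
// copy of the list so that removals from the original list during the loop
// cannot trip ArrayList's fail-fast iterator.
public class TearDownSketch {
    static void deleteAll(List<String> tables) {
        // new ArrayList<>(tables) is the "copy of the table list" fix;
        // iterating 'tables' directly while removing from it would throw CME.
        for (String table : new ArrayList<>(tables)) {
            // hypothetical stand-in for connector.tableOperations().delete(table)
            tables.remove(table);
        }
    }

    public static void main(String[] args) {
        List<String> tables = new ArrayList<>(Arrays.asList("t1", "t2", "t3"));
        deleteAll(tables);
        System.out.println(tables.isEmpty()); // true
    }
}
```

Iterating the `SortedSet` from `TableOperations.list()` would work for the same reason: the set iterated is distinct from the collection being mutated.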





[jira] [Commented] (ACCUMULO-4469) ConcurrentModificationException while running MultiTable.xml node in Random Walk

2016-10-18 Thread Dima Spivak (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586377#comment-15586377
 ] 

Dima Spivak commented on ACCUMULO-4469:
---

Sure. I work at Cloudera; my Apache username is dimaspivak.






[jira] [Commented] (ACCUMULO-4469) ConcurrentModificationException while running MultiTable.xml node in Random Walk

2016-10-18 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586370#comment-15586370
 ] 

Sean Busbey commented on ACCUMULO-4469:
---

qq [~dimaspivak], would you like to be listed on our [page of 
contributors|http://accumulo.apache.org/people]? If so, do you want your 
employer listed? Timezone?






[jira] [Resolved] (ACCUMULO-3891) Add link to user guide from monitor page

2016-10-18 Thread Christopher Tubbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Tubbs resolved ACCUMULO-3891.
-
Resolution: Fixed

> Add link to user guide from monitor page
> 
>
> Key: ACCUMULO-3891
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3891
> Project: Accumulo
>  Issue Type: Improvement
>  Components: docs, monitor
>Reporter: Mike Drob
>Assignee: Luis Tavarez
>Priority: Minor
>  Labels: newbie
> Fix For: 2.0.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> It would be nice if there was a link directly to the user guide from the 
> monitor page. If necessary, we can also add an alternate URL property to 
> provide for local mirroring.





Accumulo-Pull-Requests - Build # 470 - Fixed

2016-10-18 Thread Apache Jenkins Server
The Apache Jenkins build system has built Accumulo-Pull-Requests (build #470)

Status: Fixed

Check console output at 
https://builds.apache.org/job/Accumulo-Pull-Requests/470/ to view the results.

[jira] [Commented] (ACCUMULO-4501) Add support to RFile to track and store the histogram

2016-10-18 Thread Keith Turner (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585370#comment-15585370
 ] 

Keith Turner commented on ACCUMULO-4501:


[~elserj] as promised on IRC, here is a write-up. This covers what [~rweeks]
and I discussed at the Accumulo Summit Hackathon.

Users could configure, per table, an implementation of CompactionSummarizer.  

{code:java}
interface Counters {
    void increment(String counter, long amount);
    void increment(ByteSequence counter, long amount);

    // I thought of use cases where I would want to prepend a prefix to the
    // counter. We could offer this as a primitive so that each user does not
    // have to figure out how to do this efficiently. Simple examples of use
    // cases would be "fam:" and "vis:" prefixes for counting column families
    // and visibilities.
    void increment(String prefix, ByteSequence counter, long amount);
}
{code}

{code:java}
interface CompactionSummarizer {
    void summarize(Key k, Value v, Counters counters);
}
{code}

When a CompactionSummarizer is configured, Accumulo could do the following at 
compaction time.

 * Compute a histogram during compaction by calling the CompactionSummarizer
for each Key/Value added to the RFile
 * Limit the histogram to a max size
 * Store the histogram in the RFile
 * Store the name of the summarizer in the RFile
 * Store whether the histogram exceeded the max size in the RFile
 
We could modify rfile-info to print this information when it's present in an
RFile. We could also offer a user-level API to fetch this information. The API
could offer the following.

 * Require the user to specify the name of the CompactionSummarizer they want
histograms for. This is so that RFiles containing histograms generated by a
different CompactionSummarizer can be ignored.
 * Allow the user to compute a histogram for a row range.
 * Along with the returned histogram, indicate if histograms were missing from
RFiles or exceeded the max size.
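The fetch API described above might take a shape like the following. Every name here is invented for illustration; none of it is actual Accumulo API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed user-level histogram fetch API.
public class HistogramApiSketch {

    // What a fetch for a row range might return: the combined histogram plus
    // flags for whether any RFiles in the range lacked a histogram, or had
    // one that was truncated at the configured max size.
    static class HistogramResult {
        final Map<String, Long> histogram;
        final boolean missingFromSomeFiles;
        final boolean exceededMaxSize;

        HistogramResult(Map<String, Long> histogram,
                        boolean missingFromSomeFiles, boolean exceededMaxSize) {
            this.histogram = histogram;
            this.missingFromSomeFiles = missingFromSomeFiles;
            this.exceededMaxSize = exceededMaxSize;
        }
    }

    interface HistogramFetcher {
        // summarizerName lets the client ignore RFiles whose histograms were
        // generated by a different CompactionSummarizer; startRow/endRow
        // restrict the fetch to a row range.
        HistogramResult fetch(String table, String summarizerName,
                              String startRow, String endRow);
    }

    public static void main(String[] args) {
        // Stub fetcher returning a canned result, just to exercise the shapes.
        HistogramFetcher fetcher = (table, summarizer, start, end) -> {
            Map<String, Long> h = new HashMap<>();
            h.put("fam:a", 2L);
            return new HistogramResult(h, false, false);
        };
        HistogramResult r = fetcher.fetch("t", "FamilySummarizer", null, null);
        System.out.println(r.histogram.get("fam:a")); // 2
    }
}
```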

We discussed an implementation similar to the BatchScanner in that it would
send requests out to TabletServers to fetch info in parallel. Histograms could
be combined at the tablet, tablet server, and client. Thinking about this a
little more after the summit, I realized this implementation may double count
files that span multiple tablets. Another possible implementation would be to
gather the unique set of files in the range, and then farm out to the tablet
servers the work of aggregating the histograms. This approach makes it harder
to cache the serialized histograms. We also discussed whether the in-memory
map should keep a histogram, but came to no conclusion on this.
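To make the design above concrete, here is a toy sketch of a summarizer counting keys per column family against the proposed Counters interface. The String-based stand-ins and the MapCounters class are hypothetical simplifications; the real interfaces would use Accumulo's Key, Value, and ByteSequence types.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of the proposed Counters / CompactionSummarizer design.
public class SummarizerSketch {
    interface Counters {
        void increment(String counter, long amount);
    }

    // Simplified summarizer taking String stand-ins for Key/Value.
    interface CompactionSummarizer {
        void summarize(String family, String value, Counters counters);
    }

    // HashMap-backed Counters with a cap, mirroring "limit the histogram to a
    // max size" and "store if histogram exceeded max size in RFile".
    static class MapCounters implements Counters {
        final Map<String, Long> histogram = new HashMap<>();
        final int maxSize;
        boolean exceededMaxSize = false;

        MapCounters(int maxSize) {
            this.maxSize = maxSize;
        }

        @Override
        public void increment(String counter, long amount) {
            // Refuse new counters past the cap; existing counters still update.
            if (!histogram.containsKey(counter) && histogram.size() >= maxSize) {
                exceededMaxSize = true; // would be recorded in the RFile
                return;
            }
            histogram.merge(counter, amount, Long::sum);
        }
    }

    public static void main(String[] args) {
        // Summarizer using the "fam:" prefix convention from the comment above.
        CompactionSummarizer famCounter =
            (family, value, counters) -> counters.increment("fam:" + family, 1);

        MapCounters counters = new MapCounters(10);
        for (String fam : new String[] {"a", "a", "b"}) {
            famCounter.summarize(fam, "", counters);
        }
        System.out.println(counters.histogram.get("fam:a")); // 2
    }
}
```

At compaction time, Accumulo would call such a summarizer once per Key/Value written and serialize the resulting map, plus the summarizer name and overflow flag, into the RFile.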

> Add support to RFile to track and store the histogram
> -
>
> Key: ACCUMULO-4501
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4501
> Project: Accumulo
>  Issue Type: Sub-task
>  Components: client, tserver
>Reporter: Josh Elser
>Assignee: Josh Elser
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Modify RFile such that it can build the histogram and store it in an RFile.
> Reading the RFile would deserialize the histogram back into memory.


