[jira] [Commented] (ACCUMULO-2353) Test improvements to java.io.InputStream.skip() for possible Hadoop patch
[ https://issues.apache.org/jira/browse/ACCUMULO-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586521#comment-15586521 ]

Dave Marion commented on ACCUMULO-2353:
---------------------------------------

Looks like this was fixed in Hadoop 2.8. What's the disposition for this ticket?

> Test improvements to java.io.InputStream.skip() for possible Hadoop patch
> -------------------------------------------------------------------------
>
>                 Key: ACCUMULO-2353
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2353
>             Project: Accumulo
>          Issue Type: Task
>         Environment: Java 6 update 45 or later
>                      Hadoop 2.2.0
>            Reporter: Dave Marion
>            Priority: Minor
>
> At some point (early Java 7, I think, then backported around Java 6 Update 45),
> the java.io.InputStream.skip() method was changed from reading byte[512] to
> byte[2048]. The difference can be seen in DeflaterInputStream, which has not
> been updated:
> {noformat}
> public long skip(long n) throws IOException {
>     if (n < 0) {
>         throw new IllegalArgumentException("negative skip length");
>     }
>     ensureOpen();
>     // Skip bytes by repeatedly decompressing small blocks
>     if (rbuf.length < 512)
>         rbuf = new byte[512];
>     int total = (int) Math.min(n, Integer.MAX_VALUE);
>     long cnt = 0;
>     while (total > 0) {
>         // Read a small block of uncompressed bytes
>         int len = read(rbuf, 0, (total <= rbuf.length ? total : rbuf.length));
>         if (len < 0) {
>             break;
>         }
>         cnt += len;
>         total -= len;
>     }
>     return cnt;
> }
> {noformat}
> and java.io.InputStream in Java 6 Update 45:
> {noformat}
> // MAX_SKIP_BUFFER_SIZE is used to determine the maximum buffer size to
> // use when skipping.
> private static final int MAX_SKIP_BUFFER_SIZE = 2048;
>
> public long skip(long n) throws IOException {
>     long remaining = n;
>     int nr;
>     if (n <= 0) {
>         return 0;
>     }
>
>     int size = (int) Math.min(MAX_SKIP_BUFFER_SIZE, remaining);
>     byte[] skipBuffer = new byte[size];
>     while (remaining > 0) {
>         nr = read(skipBuffer, 0, (int) Math.min(size, remaining));
>         if (nr < 0) {
>             break;
>         }
>         remaining -= nr;
>     }
>
>     return n - remaining;
> }
> {noformat}
> In sample tests I saw about a 20% improvement in skip() when seeking towards
> the end of a locally cached compressed file. Looking at the DecompressorStream
> in HDFS, the skip method is a near copy of the old InputStream method:
> {noformat}
> private byte[] skipBytes = new byte[512];
>
> @Override
> public long skip(long n) throws IOException {
>     // Sanity checks
>     if (n < 0) {
>         throw new IllegalArgumentException("negative skip length");
>     }
>     checkStream();
>
>     // Read 'n' bytes
>     int skipped = 0;
>     while (skipped < n) {
>         int len = Math.min(((int) n - skipped), skipBytes.length);
>         len = read(skipBytes, 0, len);
>         if (len == -1) {
>             eof = true;
>             break;
>         }
>         skipped += len;
>     }
>     return skipped;
> }
> {noformat}
> This task is to evaluate the changes to DecompressorStream with a possible
> patch to HDFS, and a possible bug request to Oracle to port the
> InputStream.skip changes to DeflaterInputStream.skip.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
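The proposed change amounts to sizing the skip buffer the way the newer InputStream.skip() does, rather than keeping a fixed 512-byte field. A minimal standalone sketch of that approach, written as a generic helper over any InputStream for illustration only (this is not the actual HDFS patch, and the class and method names are made up):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SkipSketch {
    // Mirrors the java.io.InputStream approach: allocate the skip buffer
    // per call, capped at 2048 bytes instead of a fixed 512-byte field.
    private static final int MAX_SKIP_BUFFER_SIZE = 2048;

    static long skip(InputStream in, long n) throws IOException {
        if (n <= 0) {
            return 0;
        }
        long remaining = n;
        int size = (int) Math.min(MAX_SKIP_BUFFER_SIZE, remaining);
        byte[] skipBuffer = new byte[size];
        while (remaining > 0) {
            int nr = in.read(skipBuffer, 0, (int) Math.min(size, remaining));
            if (nr < 0) {
                break; // hit EOF before n bytes were skipped
            }
            remaining -= nr;
        }
        return n - remaining;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[10000]);
        System.out.println(skip(in, 5000)); // 5000
    }
}
```

With a 2048-byte buffer the loop makes roughly a quarter as many read() calls as the 512-byte version, which is consistent with the ~20% improvement reported above for large skips.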
[jira] [Commented] (ACCUMULO-4469) ConcurrentModificationException while running MultiTable.xml node in Random Walk
[ https://issues.apache.org/jira/browse/ACCUMULO-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586407#comment-15586407 ]

Josh Elser commented on ACCUMULO-4469:
--------------------------------------

Oh dear, my sincerest apologies for not asking the same question weeks ago, Dima. Your contributions are very appreciated!

> ConcurrentModificationException while running MultiTable.xml node in Random Walk
> --------------------------------------------------------------------------------
>
>                 Key: ACCUMULO-4469
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4469
>             Project: Accumulo
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 1.7.2
>            Reporter: Dima Spivak
>            Assignee: Dima Spivak
>             Fix For: 1.7.3, 1.8.1, 2.0.0
>
>         Attachments: ACCUMULO-4469_1.7_v1.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> After the resolution of ACCUMULO-4467, I got back to playing with Random Walk
> and had a failure caused by a {{ConcurrentModificationException}}:
> {code}
> 23 01:03:04,316 [randomwalk.Framework] ERROR: Error during random walk
> java.lang.Exception: Error running node MultiTable.xml
>         at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
>         at org.apache.accumulo.test.randomwalk.Framework.run(Framework.java:59)
>         at org.apache.accumulo.test.randomwalk.Framework.main(Framework.java:119)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.accumulo.start.Main$2.run(Main.java:157)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.ConcurrentModificationException
>         at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:859)
>         at java.util.ArrayList$Itr.next(ArrayList.java:831)
>         at org.apache.accumulo.test.randomwalk.multitable.MultiTableFixture.tearDown(MultiTableFixture.java:64)
>         at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:365)
>         at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:283)
>         at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:278)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>         ... 1 more
> {code}
> [This section of code|https://github.com/apache/accumulo/blob/master/test/src/main/java/org/apache/accumulo/test/randomwalk/multitable/MultiTableFixture.java#L61-L71]
> seems to be at fault. In particular, it looks like we're getting the table
> list, but then instead of doing a deep copy to a new {{ArrayList}} from which
> we choose tables to delete, we're looping through and deleting tables while
> referring to the changing list, which has the effect of modifying it and
> making Java unhappy. Am I missing something more complex, or can I fix this
> one myself by just doing the aforementioned deep copy of the table list? Or
> is a better way to use the {{TableOperations.list()}} method and iterate
> through the {{SortedSet}} it provides?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
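The copy-before-iterating fix described in the report can be sketched as below. The `tableList` field and `delete()` method here are hypothetical stand-ins for the randomwalk State's table list and the table-deletion call, not Accumulo's actual API; the point is only that iterating a fresh copy avoids the fail-fast iterator's ConcurrentModificationException:

```java
import java.util.ArrayList;
import java.util.List;

public class TearDownSketch {
    // Hypothetical stand-in for the shared table list held in randomwalk State.
    static List<String> tableList =
        new ArrayList<>(List.of("ctt_1", "ctt_2", "ctt_3"));

    // Stand-in for table deletion; it mutates the shared list, which is what
    // trips ArrayList's fail-fast iterator when looping over the list itself.
    static void delete(String table) {
        tableList.remove(table);
    }

    public static void main(String[] args) {
        // Iterate over a copy so delete() can shrink the live list safely.
        for (String table : new ArrayList<>(tableList)) {
            delete(table);
        }
        System.out.println(tableList.isEmpty()); // true
    }
}
```

Iterating `tableList` directly in the for-each loop would throw ConcurrentModificationException on the second iteration; the one-line copy is the "deep copy of the table list" the reporter proposes.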
[jira] [Commented] (ACCUMULO-4469) ConcurrentModificationException while running MultiTable.xml node in Random Walk
[ https://issues.apache.org/jira/browse/ACCUMULO-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586377#comment-15586377 ]

Dima Spivak commented on ACCUMULO-4469:
---------------------------------------

Sure. I work at Cloudera; my Apache username is dimaspivak.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (ACCUMULO-4469) ConcurrentModificationException while running MultiTable.xml node in Random Walk
[ https://issues.apache.org/jira/browse/ACCUMULO-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15586370#comment-15586370 ]

Sean Busbey commented on ACCUMULO-4469:
---------------------------------------

qq [~dimaspivak], would you like to be listed on our [page of contributors|http://accumulo.apache.org/people]? If so, do you want your employer listed? Timezone?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Resolved] (ACCUMULO-3891) Add link to user guide from monitor page
[ https://issues.apache.org/jira/browse/ACCUMULO-3891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Tubbs resolved ACCUMULO-3891.
-----------------------------------------
    Resolution: Fixed

> Add link to user guide from monitor page
> ----------------------------------------
>
>                 Key: ACCUMULO-3891
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3891
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: docs, monitor
>            Reporter: Mike Drob
>            Assignee: Luis Tavarez
>            Priority: Minor
>              Labels: newbie
>             Fix For: 2.0.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> It would be nice if there was a link directly to the user guide from the
> monitor page. If necessary, we can also add an alternate URL property to
> provide for local mirroring.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
Accumulo-Pull-Requests - Build # 470 - Fixed
The Apache Jenkins build system has built Accumulo-Pull-Requests (build #470)

Status: Fixed

Check console output at https://builds.apache.org/job/Accumulo-Pull-Requests/470/ to view the results.
[jira] [Commented] (ACCUMULO-4501) Add support to RFile to track and store the histogram
[ https://issues.apache.org/jira/browse/ACCUMULO-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585370#comment-15585370 ]

Keith Turner commented on ACCUMULO-4501:
----------------------------------------

[~elserj] as promised on IRC, here is a write-up. This covers what [~rweeks] and I discussed at the Accumulo Summit Hackathon.

Users could configure, per table, an implementation of CompactionSummarizer.

{code:java}
interface Counters {
  void increment(String counter, long amount);

  void increment(ByteSequence counter, long amount);

  // I thought of use cases where I would want to append a prefix to the counter.
  // We could offer this as a primitive so that each user does not have to figure
  // out how to do this efficiently. Simple examples of use cases would be "fam:"
  // and "vis:" prefixes for counting column families and visibilities.
  void increment(String prefix, ByteSequence counter, long amount);
}
{code}

{code:java}
interface CompactionSummarizer {
  void summarize(Key k, Value v, Counters counters);
}
{code}

When a CompactionSummarizer is configured, Accumulo could do the following at compaction time:

* Compute a histogram during compaction by calling the CompactionSummarizer for each key/value added to the RFile
* Limit the histogram to a maximum size
* Store the histogram in the RFile
* Store the name of the summarizer in the RFile
* Store in the RFile whether the histogram exceeded the maximum size

We could modify rfile-info to print this information when it is present in an RFile. We could also offer a user-level API to fetch this information. The API could offer the following:

* Require the user to specify the name of the CompactionSummarizer they want histograms for. This is so that RFiles containing histograms generated by a different CompactionSummarizer can be ignored.
* Allow the user to compute a histogram for a row range.
* Along with the returned histogram, indicate whether histograms were missing from RFiles or exceeded the maximum size.
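The Counters/CompactionSummarizer proposal can be exercised with a minimal String-only sketch. The interfaces below are simplified stand-ins for the proposed ones (no ByteSequence overloads, and plain strings instead of Accumulo's Key/Value), so nothing here reflects a shipped Accumulo API:

```java
import java.util.HashMap;
import java.util.Map;

public class SummarizerSketch {
    // Simplified stand-in for the proposed Counters interface.
    interface Counters {
        void increment(String counter, long amount);
    }

    // Simplified stand-in for summarize(Key, Value, Counters): only the
    // column family and visibility are passed, since that is what we count.
    interface CompactionSummarizer {
        void summarize(String family, String visibility, Counters counters);
    }

    // Run a summarizer over some entries, building the histogram a compaction
    // would store in the RFile (the max-size limiting step is omitted here).
    static Map<String, Long> buildHistogram(String[][] entries, CompactionSummarizer s) {
        Map<String, Long> histogram = new HashMap<>();
        Counters counters = (counter, amt) -> histogram.merge(counter, amt, Long::sum);
        for (String[] e : entries) {
            s.summarize(e[0], e[1], counters);
        }
        return histogram;
    }

    public static void main(String[] args) {
        // Count families and visibilities under the "fam:"/"vis:" prefixes
        // mentioned in the proposal.
        CompactionSummarizer s = (fam, vis, c) -> {
            c.increment("fam:" + fam, 1);
            c.increment("vis:" + vis, 1);
        };
        String[][] entries = {{"meta", "public"}, {"meta", "secret"}, {"data", "public"}};
        Map<String, Long> hist = buildHistogram(entries, s);
        System.out.println(hist.get("fam:meta"));   // 2
        System.out.println(hist.get("vis:public")); // 2
    }
}
```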
We discussed an implementation similar to the BatchScanner in that it would send requests out to tablet servers to fetch info in parallel. Histograms could be combined at the tablet, tablet server, and client. Thinking about this a little more after the summit, I realized this implementation may double-count files that span multiple tablets. Another possible implementation would be to gather the unique set of files in the range, and then farm the work out to the tablet servers, aggregating the histograms. This approach makes it harder to cache the serialized histograms. We also discussed whether the in-memory map should keep a histogram, but came to no conclusion on this.

> Add support to RFile to track and store the histogram
> -----------------------------------------------------
>
>                 Key: ACCUMULO-4501
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4501
>             Project: Accumulo
>          Issue Type: Sub-task
>          Components: client, tserver
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Modify RFile such that it can build the histogram and store it in an RFile.
> Reading the RFile would deserialize the histogram back into memory.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
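The "gather the unique set of files, then aggregate" alternative discussed in the ACCUMULO-4501 comment above can be sketched by keying per-file histograms by file path before merging: the map key deduplicates a file that spans multiple tablets, which addresses the double-counting concern. File paths and counter names below are purely illustrative:

```java
import java.util.HashMap;
import java.util.Map;

public class HistogramMerge {
    // Merge per-file histograms into one combined histogram. Because the
    // outer map is keyed by file path, a file reported by two tablets
    // contributes exactly once.
    static Map<String, Long> merge(Map<String, Map<String, Long>> perFile) {
        Map<String, Long> combined = new HashMap<>();
        for (Map<String, Long> hist : perFile.values()) {
            hist.forEach((counter, count) -> combined.merge(counter, count, Long::sum));
        }
        return combined;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> perFile = new HashMap<>();
        // If two tablets report the same file, the second put replaces the
        // first entry instead of double-counting it.
        perFile.put("/accumulo/tables/1/t-001/F0001.rf", Map.of("fam:meta", 5L));
        perFile.put("/accumulo/tables/1/t-001/F0001.rf", Map.of("fam:meta", 5L));
        perFile.put("/accumulo/tables/1/t-002/F0002.rf", Map.of("fam:meta", 3L));
        System.out.println(merge(perFile).get("fam:meta")); // 8
    }
}
```

The same merge step could run at each level (tablet, tablet server, client), since summing counters is associative; the dedup-by-path step is what distinguishes this from the BatchScanner-style approach.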