[jira] [Commented] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement

2016-09-21 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511949#comment-15511949
 ] 

Josh Elser commented on ACCUMULO-4468:
--

Reran Will's original test and can vouch for the results. I also modified the 
benchmark to do a full equality check (Row to Deletion flag) with the same data 
generation:

{noformat}
Benchmark                   Mode  Cnt   Score   Error  Units
MyBenchmark.customVanilla  thrpt  200  81.514 ± 3.989  ops/s
MyBenchmark.customWill     thrpt  200  96.185 ± 1.736  ops/s
{noformat}

I'm not convinced the data generation is actually representative, but I am 
warming up to these changes having a net positive effect.

I commonly think of the following representation. For each row (where rows 
would be relatively close to each other):

* a few column families
* 10 to 15 qualifiers spread across the families
* A few timestamps spread across the keys in one row

This models attributes on some "object" which is stored in one row. There is 
some logical partitioning of the attributes. Most attributes are written once, 
some are updated over time.
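
A rough sketch of a generator for the layout described above might look like the following. It is only an illustration: the class name, the row and column naming, the exact counts (2 to 4 families, roughly one attribute in four updated) and the timestamps are all assumptions, and plain strings stand in for Accumulo keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical generator for the "object per row" layout; all sizes are assumptions. */
final class ObjectRowDataSketch {

  /** One generated key, held as plain strings plus a timestamp. */
  static final class Entry {
    final String row;
    final String family;
    final String qualifier;
    final long timestamp;

    Entry(String row, String family, String qualifier, long timestamp) {
      this.row = row;
      this.family = family;
      this.qualifier = qualifier;
      this.timestamp = timestamp;
    }
  }

  /** Emits entries in sorted order: row, family, qualifier, then newest timestamp first. */
  static List<Entry> generate(int rows, long seed) {
    Random rnd = new Random(seed);
    List<Entry> out = new ArrayList<>();
    for (int r = 0; r < rows; r++) {
      String row = String.format("row%08d", r); // adjacent rows share a long prefix
      int families = 2 + rnd.nextInt(3);        // "a few" column families: 2 to 4
      int qualifiers = 10 + rnd.nextInt(6);     // 10 to 15 qualifiers per row
      for (int q = 0; q < qualifiers; q++) {
        // Spread the qualifiers over the families while keeping sorted order.
        String family = "cf" + (q * families / qualifiers);
        String qualifier = String.format("cq%02d", q);
        // Most attributes are written once; some are updated over time.
        int versions = rnd.nextInt(4) == 0 ? 3 : 1;
        for (int v = versions - 1; v >= 0; v--) {
          out.add(new Entry(row, family, qualifier, 1000L + v));
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println("generated keys: " + generate(1000, 42L).size());
  }
}
```

Feeding adjacent pairs from such a list into the comparison under test would exercise a mix of row, family and qualifier changes rather than a single fixed period.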

> accumulo.core.data.Key.equals(Key, PartialKey) improvement
> --
>
> Key: ACCUMULO-4468
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4468
> Project: Accumulo
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.8.0
>Reporter: Will Murnane
>Priority: Trivial
>  Labels: newbie, performance
> Attachments: benchmark.tar.gz, key_comparison.patch
>
>
> In the Key.equals(Key, PartialKey) overload, the current method compares 
> starting at the beginning of the key, and works its way toward the end. This 
> functions correctly, of course, but one of the typical uses of this method is 
> to compare adjacent rows to break them into larger chunks. For example, 
> accumulo.core.iterators.Combiner repeatedly calls this method with subsequent 
> pairs of keys.
> I have a patch which reverses the comparison order. That is, if the method is 
> called with ROW_COLFAM_COLQUAL_COLVIS, it will compare visibility, cq, cf, 
> and finally row. This (marginally) improves the speed of comparisons in the 
> relatively common case where only the last part is changing, with less 
> complex code.
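
For readers without the attachment, here is a minimal sketch of the idea in the description: compare the most specific requested field first and fall through toward the row. It is not the attached patch. `SimpleKey` and `Depth` are hypothetical stand-ins for Accumulo's `Key` and `PartialKey`, and only four of the comparison depths are modeled.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for the reversed comparison; not Accumulo's Key or the patch. */
final class ReverseEqualsSketch {

  /** Mirrors a subset of the PartialKey depths named in the issue. */
  enum Depth { ROW, ROW_COLFAM, ROW_COLFAM_COLQUAL, ROW_COLFAM_COLQUAL_COLVIS }

  static final class SimpleKey {
    final byte[] row;
    final byte[] colFam;
    final byte[] colQual;
    final byte[] colVis;

    SimpleKey(String row, String cf, String cq, String cv) {
      this.row = row.getBytes(StandardCharsets.UTF_8);
      this.colFam = cf.getBytes(StandardCharsets.UTF_8);
      this.colQual = cq.getBytes(StandardCharsets.UTF_8);
      this.colVis = cv.getBytes(StandardCharsets.UTF_8);
    }

    /** Checks the last requested field first, then falls through toward the row. */
    boolean equals(SimpleKey other, Depth depth) {
      switch (depth) {
        case ROW_COLFAM_COLQUAL_COLVIS:
          if (!Arrays.equals(colVis, other.colVis)) return false;
          // intentional fall-through
        case ROW_COLFAM_COLQUAL:
          if (!Arrays.equals(colQual, other.colQual)) return false;
          // intentional fall-through
        case ROW_COLFAM:
          if (!Arrays.equals(colFam, other.colFam)) return false;
          // intentional fall-through
        case ROW:
          return Arrays.equals(row, other.row);
        default:
          throw new IllegalArgumentException("Unsupported depth: " + depth);
      }
    }
  }

  public static void main(String[] args) {
    SimpleKey a = new SimpleKey("r1", "cf1", "cq1", "");
    SimpleKey b = new SimpleKey("r1", "cf1", "cq2", "");
    System.out.println(a.equals(b, Depth.ROW_COLFAM));
    System.out.println(a.equals(b, Depth.ROW_COLFAM_COLQUAL));
  }
}
```

With sorted input, adjacent keys usually differ in a later field, so the first check in the chain tends to be the one that fails and the longer shared prefix is never compared.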



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ACCUMULO-4467) Random Walk broken because of unmet dependency on commons-math

2016-09-21 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511923#comment-15511923
 ] 

Josh Elser commented on ACCUMULO-4467:
--

+1

> Random Walk broken because of unmet dependency on commons-math
> --
>
> Key: ACCUMULO-4467
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4467
> Project: Accumulo
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.6.6, 1.7.2
>Reporter: Dima Spivak
>Assignee: Sean Busbey
> Fix For: 1.7.3, 1.8.1, 2.0.0
>
> Attachments: ACCUMULO-4467-1.6.v1.patch, ACCUMULO-4467-1.7.v1.patch, 
> ACCUMULO-4467-1.8.v1.patch
>
>
> When trying to run the Random Walk with {{LongEach.xml}} module, I hit a 
> failure once we reach the {{Shard.xml}} step:
> {code}
> 16 19:52:05,146 [randomwalk.Framework] ERROR: Error during random walk
> java.lang.Exception: Error running node Shard.xml
>   at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
>   at org.apache.accumulo.test.randomwalk.Framework.run(Framework.java:59)
>   at org.apache.accumulo.test.randomwalk.Framework.main(Framework.java:119)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.accumulo.start.Main$2.run(Main.java:157)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Error running node shard.BulkInsert
>   at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
>   at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:283)
>   at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:278)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>   ... 1 more
> Caused by: java.lang.Exception: Failed to run map/red verify
>   at org.apache.accumulo.test.randomwalk.shard.BulkInsert.sort(BulkInsert.java:186)
>   at org.apache.accumulo.test.randomwalk.shard.BulkInsert.visit(BulkInsert.java:132)
>   ... 9 more
> {code}
> Digging into YARN to see why the MR job became unhappy, I see the following:
> {code}
> Error: java.lang.ClassNotFoundException: org.apache.commons.math.stat.descriptive.SummaryStatistics
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   at org.apache.accumulo.core.file.rfile.RFile$Writer.<init>(RFile.java:310)
>   at org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:127)
>   at org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:106)
>   at org.apache.accumulo.core.file.DispatchingFileFactory.openWriter(DispatchingFileFactory.java:78)
>   at org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat$1.write(AccumuloFileOutputFormat.java:172)
>   at org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat$1.write(AccumuloFileOutputFormat.java:152)
>   at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558)
>   at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>   at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
>   at org.apache.hadoop.mapreduce.Reducer.reduce(Reducer.java:150)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
>   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>   at 

[jira] [Commented] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement

2016-09-21 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511430#comment-15511430
 ] 

Josh Elser commented on ACCUMULO-4468:
--

Looking at the repo, it's important that the code is not doing a 
{{previous.equals(next)}} but actually {{previous.equals(next, 
PartialKey.ROW_COLFAM)}}.



[jira] [Comment Edited] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement

2016-09-21 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511430#comment-15511430
 ] 

Josh Elser edited comment on ACCUMULO-4468 at 9/21/16 10:44 PM:


--Looking at the repo, it's important that the code is not doing a 
{{previous.equals(next)}} but actually {{previous.equals(next, 
PartialKey.ROW_COLFAM)}}.--

Jk, I see you did mention this and I just didn't read closely.




was (Author: elserj):
Looking at the repo, it's important that the code is not doing a 
{{previous.equals(next)}} but actually {{previous.equals(next, 
PartialKey.ROW_COLFAM)}}.



[jira] [Commented] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement

2016-09-21 Thread Keith Turner (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511418#comment-15511418
 ] 

Keith Turner commented on ACCUMULO-4468:


[~wmurnane] I like the change.  I wrote the comment and made the changes in Key 
that you referenced.  I remember doing the performance testing for that.  When 
I first experimented with the change, I tried comparing the byte arrays in 
reverse order.  That was much slower than comparing forward, so I avoided that 
and found another strategy that worked well.  I think the concept is sound, but 
performance testing is definitely needed to make sure it works as intended.

I also like the switch statement fall-through; I think it's slick.  If one 
doesn't exist, it would be nice to add a unit test that checks for correctness 
for keys that differ by only one field.  Basically, ensure that each of the key 
field comparisons is tested.
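
One possible shape for such a check is sketched below. It is a hypothetical, self-contained harness rather than a test against the real `Key` class: keys are modeled as four byte arrays in the order row, family, qualifier, visibility, and a plain forward comparison stands in for the implementation under test.

```java
import java.util.Arrays;

/** Hypothetical harness; fields are ordered row, family, qualifier, visibility. */
final class OneFieldDiffersCheck {

  /** Implementation under test: are the first {@code depth} fields equal? */
  interface PrefixEquals {
    boolean test(byte[][] a, byte[][] b, int depth);
  }

  /** Straightforward forward comparison, used here as the stand-in implementation. */
  static boolean forwardEquals(byte[][] a, byte[][] b, int depth) {
    for (int i = 0; i < depth; i++) {
      if (!Arrays.equals(a[i], b[i])) return false;
    }
    return true;
  }

  /** Returns the number of failed expectations; 0 means every field comparison behaved. */
  static int check(PrefixEquals impl) {
    byte[][] base = {{'r'}, {'f'}, {'q'}, {'v'}};
    int failures = 0;
    for (int changed = 0; changed < base.length; changed++) {
      byte[][] other = new byte[base.length][];
      for (int i = 0; i < base.length; i++) {
        other[i] = base[i].clone();
      }
      other[changed][0]++; // the two keys now differ in exactly one field
      for (int depth = 1; depth <= base.length; depth++) {
        // Equal only when the changed field lies outside the compared prefix.
        boolean expected = changed >= depth;
        if (impl.test(base, other, depth) != expected) failures++;
      }
    }
    return failures;
  }

  public static void main(String[] args) {
    System.out.println("failures: " + check(OneFieldDiffersCheck::forwardEquals));
  }
}
```

Because every field is changed in isolation and checked at every depth, an implementation that skips or misorders one of the field comparisons would produce a non-zero failure count.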



[jira] [Commented] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement

2016-09-21 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511398#comment-15511398
 ] 

Josh Elser commented on ACCUMULO-4468:
--

Great stuff, [~wmurnane]! A few thoughts:

bq. the original equals() copied to the new class is called customVanilla, and 
the original equals() in the original Key class is called standardEquals. The 
numbers to compare are really the two custom* ones; the standardEquals value is 
given just to show that it's in the same ballpark. 

I'm not grokking the difference between customVanilla and standardEquals. Why 
the variance between these two? Shouldn't they be essentially equivalent?

bq. As a user of the API, I'd rather not have to think about equalsForward() 
versus equalsBackward().

I concur with you here.

bq. I agree that any change should start with prejudice against it. However, I 
think the numbers above prove my case: when keys are presented in sorted order, 
which happens often in Accumulo, the proposed method of comparing is slightly 
but noticeably faster. The degree of improvement depends on the data, but it 
doesn't perform worse than the current solution in any case that I tested.

Great, I hope you didn't take my initial prejudice badly :). This is a great 
start. I'm curious to try tweaking a couple of other things. Sharing your 
project was super useful.

Ultimately, if these numbers are as they appear (better in some cases, no worse 
in others), this is a great improvement. Optimizing for large contiguous blocks 
of keys where row or row+cf change very infrequently makes sense. It 
appears that {{compareTo(Object)}} is also using a separate code path, so I 
don't think this would have a big effect on things like creating RFiles for 
bulk imports. I need to search through usages though.



[jira] [Comment Edited] (ACCUMULO-4467) Random Walk broken because of unmet dependency on commons-math

2016-09-21 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1552#comment-1552
 ] 

Sean Busbey edited comment on ACCUMULO-4467 at 9/21/16 9:12 PM:


patch for branch 1.8, with a merge {{sours}} from 1.7 on the first commit. 
(which I think means just look at the second commit)


was (Author: busbey):
patch for branch 1.8, with a merge {{sours} from 1.7 on the first commit. 
(which I think means just look at the second commit)


[jira] [Updated] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement

2016-09-21 Thread Will Murnane (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Murnane updated ACCUMULO-4468:
---
Attachment: benchmark.tar.gz

mvn clean install
java -jar target/benchmark.jar -help



[jira] [Commented] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement

2016-09-21 Thread Will Murnane (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511154#comment-15511154
 ] 

Will Murnane commented on ACCUMULO-4468:


Responding to [~ctubbsii]'s points first:

# Most things in Accumulo are dealt with in sorted order, so I think it's 
reasonable to optimize for the case where things are sorted. Whether this is 
true in the general case is a function of how one calls the function, but it's 
not a change to the functionality of the code, just its performance. All the 
places I can see it used in Accumulo, the keys that are being compared are 
coming from SortedKeyValueIterator, so it's reasonable that we're comparing 
keys which are sorted, and the new behavior will be an improvement.
# I ran some tests with JMH to see what the difference is like. I can post the 
whole project, but here are some selected results. There are three functions 
being compared. I implemented a subclass of Key which has new methods: my 
proposed equals() function is called customWill, the original equals() copied 
to the new class is called customVanilla, and the original equals() in the 
original Key class is called standardEquals. The numbers to compare are really 
the two custom* ones; the standardEquals value is given just to show that it's 
in the same ballpark. \\ The test data set is generated before the benchmark 
begins, and contains 1m keys to go through, calling previous.equals(next, 
ROW_COLFAM). This is a worst-case scenario for sorted input, because the most 
that it can manage to avoid doing (as compared to the current implementation) 
is comparing the row values.
## I generated keys whose row IDs are all equal, and the column family changes 
every key.
{noformat}
Benchmark                    Mode  Cnt   Score   Error  Units
MyBenchmark.customVanilla   thrpt   30  46.320 ± 3.803  ops/s
MyBenchmark.customWill      thrpt   30  88.349 ± 2.723  ops/s
MyBenchmark.standardEquals  thrpt   30  36.736 ± 0.883  ops/s
{noformat}
## I generated keys whose row IDs are all equal, and the column family changes 
every 3 keys.
{noformat}
Benchmark                    Mode  Cnt   Score   Error  Units
MyBenchmark.customVanilla   thrpt   30  30.684 ± 1.258  ops/s
MyBenchmark.customWill      thrpt   30  34.292 ± 1.339  ops/s
MyBenchmark.standardEquals  thrpt   30  27.277 ± 0.984  ops/s
{noformat}
## I generated keys whose rowID are all equal, and the column family changes 
every 5 keys. 
{noformat}
Benchmark                    Mode  Cnt   Score   Error  Units
MyBenchmark.customVanilla   thrpt   30  27.195 ± 0.895  ops/s
MyBenchmark.customWill      thrpt   30  30.048 ± 0.838  ops/s
MyBenchmark.standardEquals  thrpt   30  25.044 ± 0.731  ops/s
{noformat}
## Finally, I generated keys whose rowID are all equal, and the column family 
changes every 1000 keys.
{noformat}
Benchmark                    Mode  Cnt   Score   Error  Units
MyBenchmark.customVanilla   thrpt   30  23.447 ± 1.010  ops/s
MyBenchmark.customWill      thrpt   30  23.427 ± 0.371  ops/s
MyBenchmark.standardEquals  thrpt   30  22.356 ± 0.192  ops/s
{noformat}
# As a user of the API, I'd rather not have to think about equalsForward() 
versus equalsBackward(). Maybe add an optional flag to specify direction of 
comparison, for 
# I agree in general, but I would argue that the code of the current equals() 
method is messier and harder to read. It repeats the same code quite a bit. I 
wrote a comment in the patch remarking on the fact that there's fallthrough and 
it's used intentionally, to try to prevent future confusion.

[~elserj] I agree that any change should start with prejudice against it. 
However, I think the numbers above prove my case: when keys are presented in 
sorted order, which happens often in Accumulo, the proposed method of comparing 
is slightly but noticeably faster. The degree of improvement depends on the 
data, but it doesn't perform worse than the current solution in any case that I 
tested.
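
To make the input shape of these runs concrete, here is a small sketch of keys that share one row while the column family changes every N keys, together with the adjacent-pair loop. It is not the attached JMH project and it measures nothing; the class name, the key encoding and the counts are assumptions for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical re-creation of the benchmark's input shape; not the attached JMH project. */
final class SortedRunSketch {

  /** Builds n keys as {row, family}; the family changes every {@code period} keys. */
  static byte[][][] generate(int n, int period) {
    byte[][][] keys = new byte[n][2][];
    for (int i = 0; i < n; i++) {
      keys[i][0] = "row0000001".getBytes(StandardCharsets.UTF_8); // same row ID for every key
      keys[i][1] = String.format("cf%09d", i / period).getBytes(StandardCharsets.UTF_8);
    }
    return keys;
  }

  /** Counts adjacent pairs equal on row and family, checking the family first. */
  static int countEqualPairs(byte[][][] keys) {
    int equal = 0;
    for (int i = 1; i < keys.length; i++) {
      byte[][] previous = keys[i - 1];
      byte[][] next = keys[i];
      // The row only needs to be compared when the families already match.
      if (Arrays.equals(previous[1], next[1]) && Arrays.equals(previous[0], next[0])) {
        equal++;
      }
    }
    return equal;
  }

  public static void main(String[] args) {
    for (int period : new int[] {1, 3, 5, 1000}) {
      int equal = countEqualPairs(generate(100_000, period));
      System.out.println("period " + period + ": " + equal + " equal adjacent pairs");
    }
  }
}
```

The shorter the period, the more often the family check fails and the row comparison is skipped, which is consistent with the gap between customWill and customVanilla narrowing as the period grows.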


[jira] [Commented] (ACCUMULO-4467) Random Walk broken because of unmet dependency on commons-math

2016-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511131#comment-15511131
 ] 

Hadoop QA commented on ACCUMULO-4467:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 3s {color} 
| {color:red} ACCUMULO-4467 does not apply to 1.8. Rebase required? Wrong 
Branch? See http://accumulo.apache.org/git.html#contributors for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12829679/ACCUMULO-4467-1.8.v1.patch
 |
| JIRA Issue | ACCUMULO-4467 |
| Console output | 
https://builds.apache.org/job/PreCommit-ACCUMULO-Build/44/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.




[jira] [Updated] (ACCUMULO-4467) Random Walk broken because of unmet dependency on commons-math

2016-09-21 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated ACCUMULO-4467:
--
Attachment: ACCUMULO-4467-1.8.v1.patch

Patch for branch 1.8, with a merge {{-sours}} from 1.7 on the first commit 
(which I think means you only need to look at the second commit).

> Random Walk broken because of unmet dependency on commons-math
> --
>
> Key: ACCUMULO-4467
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4467
> Project: Accumulo
> Issue Type: Bug
> Components: test
> Affects Versions: 1.6.6, 1.7.2
> Reporter: Dima Spivak
> Assignee: Sean Busbey
> Fix For: 1.7.3, 1.8.1, 2.0.0
>
> Attachments: ACCUMULO-4467-1.6.v1.patch, ACCUMULO-4467-1.7.v1.patch, 
> ACCUMULO-4467-1.8.v1.patch
>
>
> When trying to run the Random Walk with {{LongEach.xml}} module, I hit a 
> failure once we reach the {{Shard.xml}} step:
> {code}
> 16 19:52:05,146 [randomwalk.Framework] ERROR: Error during random walk
> java.lang.Exception: Error running node Shard.xml
>   at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
>   at org.apache.accumulo.test.randomwalk.Framework.run(Framework.java:59)
>   at org.apache.accumulo.test.randomwalk.Framework.main(Framework.java:119)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.accumulo.start.Main$2.run(Main.java:157)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Error running node shard.BulkInsert
>   at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
>   at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:283)
>   at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:278)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>   ... 1 more
> Caused by: java.lang.Exception: Failed to run map/red verify
>   at org.apache.accumulo.test.randomwalk.shard.BulkInsert.sort(BulkInsert.java:186)
>   at org.apache.accumulo.test.randomwalk.shard.BulkInsert.visit(BulkInsert.java:132)
>   ... 9 more
> {code}
> Digging into YARN to see why the MR job became unhappy, I see the following:
> {code}
> Error: java.lang.ClassNotFoundException: org.apache.commons.math.stat.descriptive.SummaryStatistics
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   at org.apache.accumulo.core.file.rfile.RFile$Writer.<init>(RFile.java:310)
>   at org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:127)
>   at org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:106)
>   at org.apache.accumulo.core.file.DispatchingFileFactory.openWriter(DispatchingFileFactory.java:78)
>   at org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat$1.write(AccumuloFileOutputFormat.java:172)
>   at org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat$1.write(AccumuloFileOutputFormat.java:152)
>   at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558)
>   at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>   at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
>   at org.apache.hadoop.mapreduce.Reducer.reduce(Reducer.java:150)
>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
>   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at
>

[jira] [Commented] (ACCUMULO-4467) Random Walk broken because of unmet dependency on commons-math

2016-09-21 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511100#comment-15511100
 ] 

Sean Busbey commented on ACCUMULO-4467:
---

That's a good point. Hopefully conflicts on commons-math3 version compatibility 
are rare.


[jira] [Commented] (ACCUMULO-4467) Random Walk broken because of unmet dependency on commons-math

2016-09-21 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511094#comment-15511094
 ] 

Josh Elser commented on ACCUMULO-4467:
--

bq. would you prefer I expressly include it in our libjars?

I think this would be less error prone. For example, what happens if some user 
happens to be using some weirdo version of YARN that doesn't provide this jar? 
Happy to be told why this isn't accurate, though. I have not dug into this to 
the depth that I assume you and Dima have.
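
To make the "does the runtime actually provide this jar" question concrete, here is a minimal sketch of a classpath probe. The class name {{ClasspathProbe}} and its methods are hypothetical and not part of Accumulo; the only name taken from this issue is the class from the {{ClassNotFoundException}} above. Note the assumption: a probe like this only tells you about the JVM it runs in, so it would have to run inside the MR task (not on the client) to say anything about what YARN provides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: reports which required classes the current JVM cannot load. */
public final class ClasspathProbe {
  private ClasspathProbe() {}

  /** Returns true if the named class can be resolved by this class's class loader. */
  static boolean isLoadable(String className) {
    try {
      // initialize=false: resolve the class without running its static initializers
      Class.forName(className, false, ClasspathProbe.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  /** Returns the subset of classNames that cannot be loaded, in input order. */
  static List<String> missing(List<String> classNames) {
    List<String> result = new ArrayList<>();
    for (String name : classNames) {
      if (!isLoadable(name)) {
        result.add(name);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Class name taken from the stack trace in this issue.
    List<String> required =
        Arrays.asList("org.apache.commons.math.stat.descriptive.SummaryStatistics");
    System.out.println("Missing classes: " + missing(required));
  }
}
```

Failing fast with a list of missing classes would at least turn a mid-job {{ClassNotFoundException}} into an up-front error, but it does not replace shipping the jar explicitly.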


[jira] [Commented] (ACCUMULO-4467) Random Walk broken because of unmet dependency on commons-math

2016-09-21 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511069#comment-15511069
 ] 

Sean Busbey commented on ACCUMULO-4467:
---

{quote}
-1 for >=1.8 on the verbatim patch because we start shipping commons-math3 (the 
patch won't work, the spirit of the change is still +1).
{quote}

On 1.8+ I was going to merge with {{-sours}} and then rely on MR including 
commons-math3 in the classpath. Would you prefer I expressly include it in our 
libjars?


[jira] [Commented] (ACCUMULO-4467) Random Walk broken because of unmet dependency on commons-math

2016-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511021#comment-15511021
 ] 

Hadoop QA commented on ACCUMULO-4467:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 6s {color} | {color:red} ACCUMULO-4467 does not apply to 1.7. Rebase required? Wrong Branch? See http://accumulo.apache.org/git.html#contributors for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12829670/ACCUMULO-4467-1.7.v1.patch |
| JIRA Issue | ACCUMULO-4467 |
| Console output | https://builds.apache.org/job/PreCommit-ACCUMULO-Build/43/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.




[jira] [Commented] (ACCUMULO-4467) Random Walk broken because of unmet dependency on commons-math

2016-09-21 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510994#comment-15510994
 ] 

Josh Elser commented on ACCUMULO-4467:
--

bq. My plan for this was to just fix the immediate problem of RandomWalk not 
including commons-math, since AFAICT we hadn't centralized the needed 
dependencies anywhere

That's fine. My point was that we should consolidate it somewhere (this isn't 
the first time we've had problems like this). NBD, if these are separate tasks.

-0 for 1.6 (let it die)
+1 for 1.7
-1 for >=1.8 on the verbatim patch because we start shipping commons-math3 (the 
patch won't work, the spirit of the change is still +1).
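
As a rough illustration of what "consolidate it somewhere" could look like, the sketch below resolves a single list of dependency classes to the jars they were loaded from and joins them into a comma-separated value of the kind Hadoop's {{-libjars}} option accepts. {{LibJars}}, {{locationOf}} and {{join}} are hypothetical names for this example only; this is not how Accumulo currently assembles its libjars.

```java
import java.net.URL;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: one place that turns dependency classes into a libjars value. */
public final class LibJars {
  private LibJars() {}

  /** Returns the jar or directory a class was loaded from, or null if it is unknown. */
  static String locationOf(Class<?> clazz) {
    CodeSource source = clazz.getProtectionDomain().getCodeSource();
    if (source == null) {
      return null; // typical for classes loaded by the bootstrap loader
    }
    URL location = source.getLocation();
    return location == null ? null : location.getPath();
  }

  /** Joins locations with commas, skipping null or empty entries and duplicates. */
  static String join(Iterable<String> locations) {
    Set<String> unique = new LinkedHashSet<>();
    for (String location : locations) {
      if (location != null && !location.isEmpty()) {
        unique.add(location);
      }
    }
    return String.join(",", unique);
  }

  public static void main(String[] args) {
    // Assumption: in real use this list would name one class per required jar.
    List<Class<?>> dependencies = new ArrayList<>();
    dependencies.add(LibJars.class);

    List<String> locations = new ArrayList<>();
    for (Class<?> dependency : dependencies) {
      locations.add(locationOf(dependency));
    }
    System.out.println("-libjars " + join(locations));
  }
}
```

Keeping the list of classes in one place means a new dependency (such as commons-math here) is added once, rather than in every script or test that launches an MR job.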


[jira] [Updated] (ACCUMULO-4467) Random Walk broken because of unmet dependency on commons-math

2016-09-21 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated ACCUMULO-4467:
--
Attachment: ACCUMULO-4467-1.7.v1.patch

Attaching a patch for 1.6 and 1.7, depending on which branch folks want to test.

> Random Walk broken because of unmet dependency on commons-math
> --
>
> Key: ACCUMULO-4467
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4467
> Project: Accumulo
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.6.6, 1.7.2
>Reporter: Dima Spivak
>Assignee: Sean Busbey
> Fix For: 1.7.3, 1.8.1, 2.0.0
>
> Attachments: ACCUMULO-4467-1.6.v1.patch, ACCUMULO-4467-1.7.v1.patch
>
>
> When trying to run the Random Walk with {{LongEach.xml}} module, I hit a 
> failure once we reach the {{Shard.xml}} step:
> {code}
> 16 19:52:05,146 [randomwalk.Framework] ERROR: Error during random walk
> java.lang.Exception: Error running node Shard.xml
>   at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
>   at org.apache.accumulo.test.randomwalk.Framework.run(Framework.java:59)
>   at 
> org.apache.accumulo.test.randomwalk.Framework.main(Framework.java:119)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.accumulo.start.Main$2.run(Main.java:157)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Error running node shard.BulkInsert
>   at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
>   at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:283)
>   at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:278)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at 
> org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>   ... 1 more
> Caused by: java.lang.Exception: Failed to run map/red verify
>   at 
> org.apache.accumulo.test.randomwalk.shard.BulkInsert.sort(BulkInsert.java:186)
>   at 
> org.apache.accumulo.test.randomwalk.shard.BulkInsert.visit(BulkInsert.java:132)
>   ... 9 more
> {code}
> Digging into YARN to see why the MR job became unhappy, I see the following:
> {code}
> Error: java.lang.ClassNotFoundException: 
> org.apache.commons.math.stat.descriptive.SummaryStatistics at 
> java.net.URLClassLoader$1.run(URLClassLoader.java:366) at 
> java.net.URLClassLoader$1.run(URLClassLoader.java:355) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> java.net.URLClassLoader.findClass(URLClassLoader.java:354) at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:425) at 
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:358) at 
> org.apache.accumulo.core.file.rfile.RFile$Writer.<init>(RFile.java:310) at 
> org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:127)
>  at 
> org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:106)
>  at 
> org.apache.accumulo.core.file.DispatchingFileFactory.openWriter(DispatchingFileFactory.java:78)
>  at 
> org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat$1.write(AccumuloFileOutputFormat.java:172)
>  at 
> org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat$1.write(AccumuloFileOutputFormat.java:152)
>  at 
> org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558)
>  at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>  at 
> org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
>  at org.apache.hadoop.mapreduce.Reducer.reduce(Reducer.java:150) at 
> org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) at 
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) at 
> org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) at 
> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:415) at 
> 

[jira] [Commented] (ACCUMULO-4467) Random Walk broken because of unmet dependency on commons-math

2016-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510974#comment-15510974
 ] 

Hadoop QA commented on ACCUMULO-4467:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} 
| {color:red} ACCUMULO-4467 does not apply to master. Rebase required? Wrong 
Branch? See http://accumulo.apache.org/git.html#contributors for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12829666/ACCUMULO-4467-1.6.v1.patch
 |
| JIRA Issue | ACCUMULO-4467 |
| Console output | 
https://builds.apache.org/job/PreCommit-ACCUMULO-Build/42/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Random Walk broken because of unmet dependency on commons-math
> --
>
> Key: ACCUMULO-4467
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4467
> Project: Accumulo
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.6.6, 1.7.2
>Reporter: Dima Spivak
>Assignee: Sean Busbey
> Fix For: 1.7.3, 1.8.1, 2.0.0
>
> Attachments: ACCUMULO-4467-1.6.v1.patch, ACCUMULO-4467-1.7.v1.patch
>
>
> When trying to run the Random Walk with {{LongEach.xml}} module, I hit a 
> failure once we reach the {{Shard.xml}} step:
> {code}
> 16 19:52:05,146 [randomwalk.Framework] ERROR: Error during random walk
> java.lang.Exception: Error running node Shard.xml
>   at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
>   at org.apache.accumulo.test.randomwalk.Framework.run(Framework.java:59)
>   at 
> org.apache.accumulo.test.randomwalk.Framework.main(Framework.java:119)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.accumulo.start.Main$2.run(Main.java:157)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Error running node shard.BulkInsert
>   at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
>   at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:283)
>   at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:278)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at 
> org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>   ... 1 more
> Caused by: java.lang.Exception: Failed to run map/red verify
>   at 
> org.apache.accumulo.test.randomwalk.shard.BulkInsert.sort(BulkInsert.java:186)
>   at 
> org.apache.accumulo.test.randomwalk.shard.BulkInsert.visit(BulkInsert.java:132)
>   ... 9 more
> {code}
> Digging into YARN to see why the MR job became unhappy, I see the following:
> {code}
> Error: java.lang.ClassNotFoundException: 
> org.apache.commons.math.stat.descriptive.SummaryStatistics at 
> java.net.URLClassLoader$1.run(URLClassLoader.java:366) at 
> java.net.URLClassLoader$1.run(URLClassLoader.java:355) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> java.net.URLClassLoader.findClass(URLClassLoader.java:354) at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:425) at 
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:358) at 
> org.apache.accumulo.core.file.rfile.RFile$Writer.<init>(RFile.java:310) at 
> org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:127)
>  at 
> org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:106)
>  at 
> org.apache.accumulo.core.file.DispatchingFileFactory.openWriter(DispatchingFileFactory.java:78)
>  at 
> org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat$1.write(AccumuloFileOutputFormat.java:172)
>  at 
> org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat$1.write(AccumuloFileOutputFormat.java:152)
>  at 
> org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558)
>  at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>  at 
> 

[jira] [Updated] (ACCUMULO-4467) Random Walk broken because of unmet dependency on commons-math

2016-09-21 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated ACCUMULO-4467:
--
Status: Patch Available  (was: In Progress)

> Random Walk broken because of unmet dependency on commons-math
> --
>
> Key: ACCUMULO-4467
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4467
> Project: Accumulo
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.7.2, 1.6.6
>Reporter: Dima Spivak
>Assignee: Sean Busbey
> Fix For: 1.7.3, 1.8.1, 2.0.0
>
> Attachments: ACCUMULO-4467-1.6.v1.patch
>
>
> When trying to run the Random Walk with {{LongEach.xml}} module, I hit a 
> failure once we reach the {{Shard.xml}} step:
> {code}
> 16 19:52:05,146 [randomwalk.Framework] ERROR: Error during random walk
> java.lang.Exception: Error running node Shard.xml
>   at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
>   at org.apache.accumulo.test.randomwalk.Framework.run(Framework.java:59)
>   at 
> org.apache.accumulo.test.randomwalk.Framework.main(Framework.java:119)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.accumulo.start.Main$2.run(Main.java:157)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Error running node shard.BulkInsert
>   at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
>   at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:283)
>   at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:278)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at 
> org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>   ... 1 more
> Caused by: java.lang.Exception: Failed to run map/red verify
>   at 
> org.apache.accumulo.test.randomwalk.shard.BulkInsert.sort(BulkInsert.java:186)
>   at 
> org.apache.accumulo.test.randomwalk.shard.BulkInsert.visit(BulkInsert.java:132)
>   ... 9 more
> {code}
> Digging into YARN to see why the MR job became unhappy, I see the following:
> {code}
> Error: java.lang.ClassNotFoundException: 
> org.apache.commons.math.stat.descriptive.SummaryStatistics at 
> java.net.URLClassLoader$1.run(URLClassLoader.java:366) at 
> java.net.URLClassLoader$1.run(URLClassLoader.java:355) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> java.net.URLClassLoader.findClass(URLClassLoader.java:354) at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:425) at 
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:358) at 
> org.apache.accumulo.core.file.rfile.RFile$Writer.<init>(RFile.java:310) at 
> org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:127)
>  at 
> org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:106)
>  at 
> org.apache.accumulo.core.file.DispatchingFileFactory.openWriter(DispatchingFileFactory.java:78)
>  at 
> org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat$1.write(AccumuloFileOutputFormat.java:172)
>  at 
> org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat$1.write(AccumuloFileOutputFormat.java:152)
>  at 
> org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558)
>  at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>  at 
> org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
>  at org.apache.hadoop.mapreduce.Reducer.reduce(Reducer.java:150) at 
> org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) at 
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) at 
> org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) at 
> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:415) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 
> 

[jira] [Updated] (ACCUMULO-4467) Random Walk broken because of unmet dependency on commons-math

2016-09-21 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated ACCUMULO-4467:
--
Attachment: ACCUMULO-4467-1.6.v1.patch

> Random Walk broken because of unmet dependency on commons-math
> --
>
> Key: ACCUMULO-4467
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4467
> Project: Accumulo
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.6.6, 1.7.2
>Reporter: Dima Spivak
>Assignee: Sean Busbey
> Fix For: 1.7.3, 1.8.1, 2.0.0
>
> Attachments: ACCUMULO-4467-1.6.v1.patch
>
>
> When trying to run the Random Walk with {{LongEach.xml}} module, I hit a 
> failure once we reach the {{Shard.xml}} step:
> {code}
> 16 19:52:05,146 [randomwalk.Framework] ERROR: Error during random walk
> java.lang.Exception: Error running node Shard.xml
>   at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
>   at org.apache.accumulo.test.randomwalk.Framework.run(Framework.java:59)
>   at 
> org.apache.accumulo.test.randomwalk.Framework.main(Framework.java:119)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.accumulo.start.Main$2.run(Main.java:157)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Error running node shard.BulkInsert
>   at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
>   at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:283)
>   at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:278)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at 
> org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>   ... 1 more
> Caused by: java.lang.Exception: Failed to run map/red verify
>   at 
> org.apache.accumulo.test.randomwalk.shard.BulkInsert.sort(BulkInsert.java:186)
>   at 
> org.apache.accumulo.test.randomwalk.shard.BulkInsert.visit(BulkInsert.java:132)
>   ... 9 more
> {code}
> Digging into YARN to see why the MR job became unhappy, I see the following:
> {code}
> Error: java.lang.ClassNotFoundException: 
> org.apache.commons.math.stat.descriptive.SummaryStatistics at 
> java.net.URLClassLoader$1.run(URLClassLoader.java:366) at 
> java.net.URLClassLoader$1.run(URLClassLoader.java:355) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> java.net.URLClassLoader.findClass(URLClassLoader.java:354) at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:425) at 
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:358) at 
> org.apache.accumulo.core.file.rfile.RFile$Writer.<init>(RFile.java:310) at 
> org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:127)
>  at 
> org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:106)
>  at 
> org.apache.accumulo.core.file.DispatchingFileFactory.openWriter(DispatchingFileFactory.java:78)
>  at 
> org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat$1.write(AccumuloFileOutputFormat.java:172)
>  at 
> org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat$1.write(AccumuloFileOutputFormat.java:152)
>  at 
> org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558)
>  at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>  at 
> org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
>  at org.apache.hadoop.mapreduce.Reducer.reduce(Reducer.java:150) at 
> org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) at 
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) at 
> org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) at 
> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:415) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 
> {code}
> 

[jira] [Commented] (ACCUMULO-4467) Random Walk broken because of unmet dependency on commons-math

2016-09-21 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510957#comment-15510957
 ] 

Sean Busbey commented on ACCUMULO-4467:
---

My plan for this was to just fix the immediate problem of RandomWalk not 
including commons-math, since AFAICT we hadn't centralized the needed 
dependencies anywhere. I'm not sure of the full needed scope of that command 
(e.g. it probably needs to handle classpath for configured iterators for 
offline table scans) so I'd much rather that kind of improvement go in a 
follow-on.

> Random Walk broken because of unmet dependency on commons-math
> --
>
> Key: ACCUMULO-4467
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4467
> Project: Accumulo
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.6.6, 1.7.2
>Reporter: Dima Spivak
>Assignee: Sean Busbey
> Fix For: 1.7.3, 1.8.1, 2.0.0
>
>
> When trying to run the Random Walk with {{LongEach.xml}} module, I hit a 
> failure once we reach the {{Shard.xml}} step:
> {code}
> 16 19:52:05,146 [randomwalk.Framework] ERROR: Error during random walk
> java.lang.Exception: Error running node Shard.xml
>   at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
>   at org.apache.accumulo.test.randomwalk.Framework.run(Framework.java:59)
>   at 
> org.apache.accumulo.test.randomwalk.Framework.main(Framework.java:119)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.accumulo.start.Main$2.run(Main.java:157)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Error running node shard.BulkInsert
>   at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
>   at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:283)
>   at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:278)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at 
> org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>   ... 1 more
> Caused by: java.lang.Exception: Failed to run map/red verify
>   at 
> org.apache.accumulo.test.randomwalk.shard.BulkInsert.sort(BulkInsert.java:186)
>   at 
> org.apache.accumulo.test.randomwalk.shard.BulkInsert.visit(BulkInsert.java:132)
>   ... 9 more
> {code}
> Digging into YARN to see why the MR job became unhappy, I see the following:
> {code}
> Error: java.lang.ClassNotFoundException: 
> org.apache.commons.math.stat.descriptive.SummaryStatistics at 
> java.net.URLClassLoader$1.run(URLClassLoader.java:366) at 
> java.net.URLClassLoader$1.run(URLClassLoader.java:355) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> java.net.URLClassLoader.findClass(URLClassLoader.java:354) at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:425) at 
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:358) at 
> org.apache.accumulo.core.file.rfile.RFile$Writer.<init>(RFile.java:310) at 
> org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:127)
>  at 
> org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:106)
>  at 
> org.apache.accumulo.core.file.DispatchingFileFactory.openWriter(DispatchingFileFactory.java:78)
>  at 
> org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat$1.write(AccumuloFileOutputFormat.java:172)
>  at 
> org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat$1.write(AccumuloFileOutputFormat.java:152)
>  at 
> org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558)
>  at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>  at 
> org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
>  at org.apache.hadoop.mapreduce.Reducer.reduce(Reducer.java:150) at 
> org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) at 
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) at 
> org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) at 
> 

[jira] [Commented] (ACCUMULO-4467) Random Walk broken because of unmet dependency on commons-math

2016-09-21 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510904#comment-15510904
 ] 

Josh Elser commented on ACCUMULO-4467:
--

Ack, I forgot to write this up yesterday. I was musing about how to fix this 
once and for all. I think I stole the idea from {{hbase mapredcp}}. We can 
encapsulate what our runtime dependencies are for mapreduce in one place, and 
replace all other occurrences with a call to {{accumulo mapredcp}}. I would 
guess that you probably had the same thought though, [~busbey] :)

> Random Walk broken because of unmet dependency on commons-math
> --
>
> Key: ACCUMULO-4467
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4467
> Project: Accumulo
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.6.6, 1.7.2
>Reporter: Dima Spivak
>Assignee: Sean Busbey
> Fix For: 1.7.3, 1.8.1, 2.0.0
>
>
> When trying to run the Random Walk with {{LongEach.xml}} module, I hit a 
> failure once we reach the {{Shard.xml}} step:
> {code}
> 16 19:52:05,146 [randomwalk.Framework] ERROR: Error during random walk
> java.lang.Exception: Error running node Shard.xml
>   at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
>   at org.apache.accumulo.test.randomwalk.Framework.run(Framework.java:59)
>   at 
> org.apache.accumulo.test.randomwalk.Framework.main(Framework.java:119)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.accumulo.start.Main$2.run(Main.java:157)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Error running node shard.BulkInsert
>   at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
>   at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:283)
>   at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:278)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at 
> org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>   ... 1 more
> Caused by: java.lang.Exception: Failed to run map/red verify
>   at 
> org.apache.accumulo.test.randomwalk.shard.BulkInsert.sort(BulkInsert.java:186)
>   at 
> org.apache.accumulo.test.randomwalk.shard.BulkInsert.visit(BulkInsert.java:132)
>   ... 9 more
> {code}
> Digging into YARN to see why the MR job became unhappy, I see the following:
> {code}
> Error: java.lang.ClassNotFoundException: 
> org.apache.commons.math.stat.descriptive.SummaryStatistics at 
> java.net.URLClassLoader$1.run(URLClassLoader.java:366) at 
> java.net.URLClassLoader$1.run(URLClassLoader.java:355) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> java.net.URLClassLoader.findClass(URLClassLoader.java:354) at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:425) at 
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:358) at 
> org.apache.accumulo.core.file.rfile.RFile$Writer.<init>(RFile.java:310) at 
> org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:127)
>  at 
> org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:106)
>  at 
> org.apache.accumulo.core.file.DispatchingFileFactory.openWriter(DispatchingFileFactory.java:78)
>  at 
> org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat$1.write(AccumuloFileOutputFormat.java:172)
>  at 
> org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat$1.write(AccumuloFileOutputFormat.java:152)
>  at 
> org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558)
>  at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>  at 
> org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
>  at org.apache.hadoop.mapreduce.Reducer.reduce(Reducer.java:150) at 
> org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) at 
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) at 
> org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) at 
> 

[jira] [Assigned] (ACCUMULO-4467) Random Walk broken because of unmet dependency on commons-math

2016-09-21 Thread Sean Busbey (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey reassigned ACCUMULO-4467:
-

Assignee: Sean Busbey

> Random Walk broken because of unmet dependency on commons-math
> --
>
> Key: ACCUMULO-4467
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4467
> Project: Accumulo
>  Issue Type: Bug
>  Components: test
>Affects Versions: 1.6.6, 1.7.2
>Reporter: Dima Spivak
>Assignee: Sean Busbey
> Fix For: 1.7.3, 1.8.1, 2.0.0
>
>
> When trying to run the Random Walk with {{LongEach.xml}} module, I hit a 
> failure once we reach the {{Shard.xml}} step:
> {code}
> 16 19:52:05,146 [randomwalk.Framework] ERROR: Error during random walk
> java.lang.Exception: Error running node Shard.xml
>   at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
>   at org.apache.accumulo.test.randomwalk.Framework.run(Framework.java:59)
>   at 
> org.apache.accumulo.test.randomwalk.Framework.main(Framework.java:119)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at org.apache.accumulo.start.Main$2.run(Main.java:157)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.Exception: Error running node shard.BulkInsert
>   at org.apache.accumulo.test.randomwalk.Module.visit(Module.java:346)
>   at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:283)
>   at org.apache.accumulo.test.randomwalk.Module$1.call(Module.java:278)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at 
> org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
>   ... 1 more
> Caused by: java.lang.Exception: Failed to run map/red verify
>   at 
> org.apache.accumulo.test.randomwalk.shard.BulkInsert.sort(BulkInsert.java:186)
>   at 
> org.apache.accumulo.test.randomwalk.shard.BulkInsert.visit(BulkInsert.java:132)
>   ... 9 more
> {code}
> Digging into YARN to see why the MR job became unhappy, I see the following:
> {code}
> Error: java.lang.ClassNotFoundException: 
> org.apache.commons.math.stat.descriptive.SummaryStatistics at 
> java.net.URLClassLoader$1.run(URLClassLoader.java:366) at 
> java.net.URLClassLoader$1.run(URLClassLoader.java:355) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> java.net.URLClassLoader.findClass(URLClassLoader.java:354) at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:425) at 
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at 
> java.lang.ClassLoader.loadClass(ClassLoader.java:358) at 
> org.apache.accumulo.core.file.rfile.RFile$Writer.<init>(RFile.java:310) at 
> org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:127)
>  at 
> org.apache.accumulo.core.file.rfile.RFileOperations.openWriter(RFileOperations.java:106)
>  at 
> org.apache.accumulo.core.file.DispatchingFileFactory.openWriter(DispatchingFileFactory.java:78)
>  at 
> org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat$1.write(AccumuloFileOutputFormat.java:172)
>  at 
> org.apache.accumulo.core.client.mapreduce.AccumuloFileOutputFormat$1.write(AccumuloFileOutputFormat.java:152)
>  at 
> org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:558)
>  at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>  at 
> org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
>  at org.apache.hadoop.mapreduce.Reducer.reduce(Reducer.java:150) at 
> org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171) at 
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627) at 
> org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389) at 
> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:415) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) 
> {code}
> It looks like [this 
> 

[jira] [Updated] (ACCUMULO-1266) Automatically determine when a full major compaction would benefit scans

2016-09-21 Thread Michael Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Miller updated ACCUMULO-1266:
-
Assignee: (was: Michael Miller)

> Automatically determine when a full major compaction would benefit scans
> 
>
> Key: ACCUMULO-1266
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1266
> Project: Accumulo
>  Issue Type: New Feature
>Reporter: Keith Turner
>
> For the following situation, there is a tipping point where it becomes 
> beneficial to do a full major compaction.
>  * a tablet is frequently scanned
>  * scan time iterators suppress a lot of data
>  * a full major compaction would also suppress that data 
> Examples of this are tablets with lots of deletes, versions that are 
> suppressed, data that's combined, and data that's filtered.   
> If tablet servers kept track of statistics about scans, could this be used to 
> determine when it's beneficial to automatically compact?  In the following 
> simple example, it seems obvious that a major compaction would be beneficial. 
> In this example scans over the last hour have had to examine and throw away 
> 20 million unneeded keys.  A lot of scan work could have been saved by doing a 
> major compaction.
>  * all scans over tabletA within the last hour have read 30 million keys and 
> returned 10 million keys 
>  * TabletA has 3 million keys
>  * a major compaction would reduce tabletA to 1 million keys and result in 
> future scans returning all keys read.
> One complicating factor is that major compactions may have a different set of 
> iterators configured.  Therefore it's possible that scans may filter a lot of 
> data while major compactions do not.  One option is to keep track of the ratio 
> of data dropped by compactions and the ratio of data dropped by scans.  This 
> could be used when deciding if a major compaction should be done to improve 
> scan performance.
> What other situations can cause unnecessary major compactions and need to be 
> defended against?
> In the case where a compaction of just the data in memory would benefit 
> scans, ACCUMULO-519 may solve the problem that this ticket is looking to 
> solve.
> So what should the formula be?  
> {code:java}
>   // k/v             : key values
>   // recentlyRead    : total number of k/v read by recent scans before applying
>   //                   iterators (recentlyRead - recentlyDropped equals # of k/v
>   //                   returned to users)
>   // majcDropRatio   : ratio of k/v dropped by recent major compactions
>   // totalKeyValues  : total # of k/v in tablet
>   // R               : a user configurable ratio, like the current major
>   //                   compaction ratio that is based on files
>   if (recentlyRead * majcDropRatio > R * totalKeyValues) {
>     doFullMajorCompaction();
>     resetScanStats();
>   }
> {code}
> The example formula above has an issue: it may initiate a major compaction 
> when scans are not reading a part of the tablet that drops data.  The formula 
> below tries to remedy this.
> {code:java}
>   // k/v             : key values
>   // recentlyDropped : number of k/v dropped by recent scans
>   // recentlyRead    : total number of k/v read by recent scans before applying
>   //                   iterators (recentlyRead - recentlyDropped equals # of k/v
>   //                   returned to users)
>   // majcDropRatio   : ratio of k/v dropped by recent major compactions
>   // totalKeyValues  : total # of k/v in tablet
>   // R               : a user configurable ratio, like the current major
>   //                   compaction ratio that is based on files
>   if ((recentlyDropped > R * totalKeyValues)
>       && (recentlyRead * majcDropRatio > R * totalKeyValues)) {
>     doFullMajorCompaction();
>     resetScanStats();
>   }
> {code}
> An issue with the above is that the total # of key values for a tablet may 
> not be accurate because of bulk import and splits.
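
The second formula can be tried out against the tabletA numbers from the description. The following is only a sketch: the class name, the method signature, and the sample values chosen for R are assumptions made for illustration, and nothing here is existing Accumulo code.

```java
/** Illustrative sketch of the second formula; every name here is hypothetical. */
final class ScanCompactionHeuristic {

  /**
   * @param recentlyRead    k/v read by recent scans before iterators were applied
   * @param recentlyDropped k/v dropped by recent scans
   * @param majcDropRatio   fraction of k/v dropped by recent major compactions
   * @param totalKeyValues  estimated number of k/v in the tablet
   * @param r               user configurable ratio (a made-up setting)
   */
  static boolean shouldCompact(long recentlyRead, long recentlyDropped,
      double majcDropRatio, long totalKeyValues, double r) {
    double threshold = r * totalKeyValues;
    // Scans must actually be throwing data away ...
    boolean scansDropEnough = recentlyDropped > threshold;
    // ... and a compaction must be expected to remove a comparable amount.
    boolean compactionWouldDropEnough = recentlyRead * majcDropRatio > threshold;
    return scansDropEnough && compactionWouldDropEnough;
  }

  public static void main(String[] args) {
    // tabletA: 30M keys read, 10M returned, 3M keys in the tablet, and a
    // compaction that would leave 1M keys (so a drop ratio of 2/3).
    long read = 30_000_000L;
    long dropped = read - 10_000_000L;
    double majcDropRatio = 2.0 / 3.0;
    long total = 3_000_000L;

    System.out.println(shouldCompact(read, dropped, majcDropRatio, total, 3.0));
    System.out.println(shouldCompact(read, dropped, majcDropRatio, total, 10.0));
  }
}
```

With R = 3 the threshold is 9 million k/v and both conditions hold; with R = 10 the threshold rises to 30 million and the 20 million dropped k/v no longer qualify. As the description notes, the result is only as good as the totalKeyValues estimate.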



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (ACCUMULO-4415) Tracer requires instance.secret

2016-09-21 Thread Christopher Tubbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Tubbs updated ACCUMULO-4415:

Fix Version/s: (was: 1.8.1)
   2.0.0

> Tracer requires instance.secret
> ---
>
> Key: ACCUMULO-4415
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4415
> Project: Accumulo
>  Issue Type: Bug
>  Components: trace
>Reporter: Christopher Tubbs
> Fix For: 2.0.0
>
>
> Tracer incorrectly uses instance.secret for its /tracers area in ZooKeeper.
> The tracer does not use the Accumulo system credentials, and instead uses a 
> specific tracer username and password. It should also not use the 
> instance.secret (which is for the system credentials).
> A side effect of this bug is that ChangeSecret does not update the /tracers 
> ACLs in ZooKeeper, preventing the tracer from working entirely after the 
> instance.secret is changed.
> The following error will be seen in the monitor after the ChangeSecret tool 
> is run.
> {code}
> Thread 'tracer' died.
>   org.apache.zookeeper.KeeperException$NoAuthException: KeeperErrorCode = NoAuth for /tracers/trace-
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:113)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>   at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>   at org.apache.accumulo.fate.zookeeper.ZooUtil.putEphemeralSequential(ZooUtil.java:464)
>   at org.apache.accumulo.fate.zookeeper.ZooReaderWriter.putEphemeralSequential(ZooReaderWriter.java:99)
>   at org.apache.accumulo.tracer.TraceServer.registerInZooKeeper(TraceServer.java:318)
>   at org.apache.accumulo.tracer.TraceServer.<init>(TraceServer.java:255)
>   at org.apache.accumulo.tracer.TraceServer.main(TraceServer.java:360)
>   at org.apache.accumulo.tracer.TracerExecutable.execute(TracerExecutable.java:33)
>   at org.apache.accumulo.start.Main$1.run(Main.java:120)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> This affects at least the current 1.8 branch (1.8.0-SNAPSHOT), but I haven't 
> checked earlier versions.





[jira] [Commented] (ACCUMULO-4415) Tracer requires instance.secret

2016-09-21 Thread Christopher Tubbs (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510812#comment-15510812
 ] 

Christopher Tubbs commented on ACCUMULO-4415:
-

I think we shouldn't try to fix this before 2.0... it's too complicated. 
Somebody can change the secret using the workaround above... or more simply do 
the following after using ChangeSecret: kill the tracer, delete /tracers, 
restart the tracer.

For 2.0+, we can (maybe) provide a {{tracer.secret}}, which defaults to 
{{instance.secret}} (with a warning), and provide a new {{ChangeTracerSecret}} 
tool.
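
The fallback behaviour proposed for 2.0+ could look roughly like the sketch below. It assumes plain java.util.Properties in place of Accumulo's configuration classes; {{tracer.secret}} is only the property name proposed in this comment, and the class and method names are invented for illustration.

```java
import java.util.Properties;

/** Hypothetical sketch of the proposed lookup; not existing Accumulo code. */
final class TracerSecretLookup {

  // Proposed in this comment; the property does not exist yet.
  static final String TRACER_SECRET = "tracer.secret";
  static final String INSTANCE_SECRET = "instance.secret";

  /** Returns tracer.secret when set, otherwise warns and falls back to instance.secret. */
  static String resolve(Properties conf) {
    String secret = conf.getProperty(TRACER_SECRET);
    if (secret != null) {
      return secret;
    }
    System.err.println("WARN: " + TRACER_SECRET + " is not set; falling back to "
        + INSTANCE_SECRET);
    return conf.getProperty(INSTANCE_SECRET);
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(INSTANCE_SECRET, "example-instance-secret");
    // Only instance.secret is set, so this prints the warning and the fallback value.
    System.out.println(resolve(conf));

    conf.setProperty(TRACER_SECRET, "example-tracer-secret");
    // Now the dedicated tracer secret wins and no warning is printed.
    System.out.println(resolve(conf));
  }
}
```

Keeping the fallback would let existing deployments start unchanged, while the warning nudges operators toward a dedicated secret that a {{ChangeTracerSecret}} tool could rotate independently.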



[jira] [Commented] (ACCUMULO-4415) Tracer requires instance.secret

2016-09-21 Thread Christopher Tubbs (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510790#comment-15510790
 ] 

Christopher Tubbs commented on ACCUMULO-4415:
-

I'm not sure we can do anything about this right now. It's not really safe for 
ChangeSecret to change /tracers (because that could be in use by more than one 
cluster). Maybe we should just document it as a known issue until we decide 
where the tracer service is ultimately going to live.



[jira] [Commented] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement

2016-09-21 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510743#comment-15510743
 ] 

Josh Elser commented on ACCUMULO-4468:
--

{quote}
bq.Do you have other examples of where this might be used in a tight loop?

I think there are lots of other examples in Accumulo itself; for example, in 
iterators.system.DeletingIterator and iterators.user.TransformingIterator, this 
method is called in a loop. In our use case, we're doing a similar thing: 
building a larger object out of multiple rows, by finding groups of rows which 
are equal under ROW_COLFAM. When each object is built from only a few rows, the 
CF equality comparison returns false pretty often (which is to be expected), 
but only after comparing row IDs, which are always the same in practice.
{quote}

Perhaps my concern didn't come across. From my perspective: I am concerned with 
a performance change that makes one case better. We need to understand if the 
case you outlined is "the norm" or "the exception". I was hoping you had 
context on this.

{quote}
bq.  How did you test this? What types of numbers did you see?

I haven't been able to install it on a cluster to test. The test suite does 
pass with this patch applied. I think it's a minor change; in the "rows are 
equal" case the same amount of work is done as with the existing code, although 
the parts are accessed in the opposite order. They're still compared 
mostly-in-order, as isEqual does, but the comment in that function was 
inspiration to try reversing the comparison order.

Aside from performance, the code seems cleaner to me: there's no more 
repetition of e.g. the check of row equality. The bytecode (with Oracle javac 
1.8.0_92) is substantially smaller: 389 bytes versus 167.
{quote}

At risk of being anti-social, I am -1 on any change for performance without 
numbers coming with it. There are many great tools out there that benchmark 
changes and don't necessarily require use of a cluster (it might actually be 
harder to test on a cluster). 
[JMH|http://openjdk.java.net/projects/code-tools/jmh/] and [Google 
Caliper|https://github.com/google/caliper] are two good tools for 
micro-benchmarking.

> accumulo.core.data.Key.equals(Key, PartialKey) improvement
> --
>
> Key: ACCUMULO-4468
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4468
> Project: Accumulo
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.8.0
>Reporter: Will Murnane
>Priority: Trivial
>  Labels: newbie, performance
> Attachments: key_comparison.patch
>
>
> In the Key.equals(Key, PartialKey) overload, the current method compares 
> starting at the beginning of the key, and works its way toward the end. This 
> functions correctly, of course, but one of the typical uses of this method is 
> to compare adjacent rows to break them into larger chunks. For example, 
> accumulo.core.iterators.Combiner repeatedly calls this method with subsequent 
> pairs of keys.
> I have a patch which reverses the comparison order. That is, if the method is 
> called with ROW_COLFAM_COLQUAL_COLVIS, it will compare visibility, cq, cf, 
> and finally row. This (marginally) improves the speed of comparisons in the 
> relatively common case where only the last part is changing, with less 
> complex code.





[jira] [Comment Edited] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement

2016-09-21 Thread Christopher Tubbs (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510736#comment-15510736
 ] 

Christopher Tubbs edited comment on ACCUMULO-4468 at 9/21/16 6:18 PM:
--

Hi [~wmurnane]. Thanks for the patch!

After reviewing, I had a few comments:

# This may speed things up only if the probability of not being equal is 
greater in the lower dimensions of the key than the higher ones. I'm not sure 
this is the case, as I don't have a strong sense of when this method is called, 
or how frequently. What analysis have you done to determine these relative 
probabilities?
# Have you run any experiments to determine the performance differences in 
various use cases?
# If this method is primarily used in user code, would a new method to compare 
in reverse order be better, to account for both cases? Perhaps it'd be better 
to optimize the Combiner code, rather than change the default behavior for all 
cases?
# I think relying on fall-through behavior in switch statements should be 
avoided. It's prone to error, especially as code is refactored over time. I 
think it's better to avoid it than to suppress the warning. This may be a style 
choice, but it's a preference that the java compiler weighs in on (by making 
fallthrough a default compiler warning), and I'd prefer to avoid behavior which 
results in compiler warnings whenever possible.
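
One way to keep the reversed comparison order without relying on switch fall-through is to walk the requested fields with a loop. The sketch below uses toy byte arrays and a made-up {{Part}} enum standing in for PartialKey; it is an assumption-laden illustration, not the attached patch and not the real Key class.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Toy model of a partial key comparison; all names are hypothetical. */
final class PartialEquals {

  /** Stand-in for PartialKey; depth is the number of leading fields compared. */
  enum Part {
    ROW(1), ROW_COLFAM(2), ROW_COLFAM_COLQUAL(3), ROW_COLFAM_COLQUAL_COLVIS(4);

    final int depth;

    Part(int depth) {
      this.depth = depth;
    }
  }

  /** Builds a toy key: [0] = row, [1] = family, [2] = qualifier, [3] = visibility. */
  static byte[][] key(String row, String cf, String cq, String cv) {
    return new byte[][] {bytes(row), bytes(cf), bytes(cq), bytes(cv)};
  }

  private static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  /** Compares the requested prefix, starting at its last field and ending at the row. */
  static boolean equalsReverse(byte[][] a, byte[][] b, Part part) {
    for (int i = part.depth - 1; i >= 0; i--) {
      if (!Arrays.equals(a[i], b[i])) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[][] k1 = key("row1", "attrs", "name", "public");
    byte[][] k2 = key("row1", "attrs", "size", "public");
    // Row and family match, so the ROW_COLFAM comparison succeeds.
    System.out.println(equalsReverse(k1, k2, Part.ROW_COLFAM));
    // The qualifiers differ, so the deeper comparison fails.
    System.out.println(equalsReverse(k1, k2, Part.ROW_COLFAM_COLQUAL));
  }
}
```

The loop raises no fall-through warning and has no per-case code to drift apart during refactoring, at the cost of storing the fields in an indexable form, which the real Key class may not want to do.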



was (Author: ctubbsii):
Hi [~wmurnane]. Thanks for the patch!

After reviewing, I had a few comments:

1. This may speed things up only if the probability of not being equal is 
greater in the lower dimensions of the key than the higher ones. I'm not sure 
this is the case, as I don't have a strong sense of when this method is called, 
or how frequently. What analysis have you done to determine these relative 
probabilities?
2. Have you run any experiments to determine the performance differences in 
various use cases?
3. If this method is primarily used in user code, would a new method to compare 
in reverse order be better, to account for both cases? Perhaps it'd be better 
to optimize the Combiner code, rather than change the default behavior for all 
cases?
4. I think relying on fall-through behavior in switch statements should be 
avoided. It's prone to error, especially as code is refactored over time. I 
think it's better to avoid it than to suppress the warning. This may be a style 
choice, but it's a preference that the java compiler weighs in on (by making 
fallthrough a default compiler warning), and I'd prefer to avoid behavior which 
results in compiler warnings whenever possible.




[jira] [Commented] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement

2016-09-21 Thread Christopher Tubbs (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510736#comment-15510736
 ] 

Christopher Tubbs commented on ACCUMULO-4468:
-

Hi [~wmurnane]. Thanks for the patch!

After reviewing, I had a few comments:

1. This may speed things up only if the probability of not being equal is 
greater in the lower dimensions of the key than the higher ones. I'm not sure 
this is the case, as I don't have a strong sense of when this method is called, 
or how frequently. What analysis have you done to determine these relative 
probabilities?
2. Have you run any experiments to determine the performance differences in 
various use cases?
3. If this method is primarily used in user code, would a new method to compare 
in reverse order be better, to account for both cases? Perhaps it'd be better 
to optimize the Combiner code, rather than change the default behavior for all 
cases?
4. I think relying on fall-through behavior in switch statements should be 
avoided. It's prone to error, especially as code is refactored over time. I 
think it's better to avoid it than to suppress the warning. This may be a style 
choice, but it's a preference that the java compiler weighs in on (by making 
fallthrough a default compiler warning), and I'd prefer to avoid behavior which 
results in compiler warnings whenever possible.




[jira] [Assigned] (ACCUMULO-1280) Add close method to iterators

2016-09-21 Thread Michael Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Miller reassigned ACCUMULO-1280:


Assignee: Michael Miller

> Add close method to iterators
> -
>
> Key: ACCUMULO-1280
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1280
> Project: Accumulo
>  Issue Type: Improvement
>Reporter: Keith Turner
>Assignee: Michael Miller
> Fix For: 2.0.0
>
>
> It would be useful if Accumulo iterators had a close method.  Accumulo would 
> call this when it's finished using the iterator stack.
> How would this work w/ isolation?
> Is it ok to break the iterator API?





[jira] [Commented] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement

2016-09-21 Thread Will Murnane (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510679#comment-15510679
 ] 

Will Murnane commented on ACCUMULO-4468:


bq. Do you have other examples of where this might be used in a tight loop?
I think there are lots of other examples in Accumulo itself; for example, in 
iterators.system.DeletingIterator and iterators.user.TransformingIterator, this 
method is called in a loop. In our use case, we're doing a similar thing: 
building a larger object out of multiple rows, by finding groups of rows which 
are equal under ROW_COLFAM. When each object is built from only a few rows, the 
CF equality comparison returns false pretty often (which is to be expected), 
but only after comparing row IDs, which are always the same in practice.

bq. How did you test this? What types of numbers did you see?
I haven't been able to install it on a cluster to test. The test suite does 
pass with this patch applied. I think it's a minor change; in the "rows are 
equal" case the same amount of work is done as with the existing code, although 
the parts are accessed in the opposite order. They're still compared 
mostly-in-order, as isEqual does, but the comment in that function was 
inspiration to try reversing the comparison order.

Aside from performance, the code seems cleaner to me: there's no more 
repetition of e.g. the check of row equality. The bytecode (with Oracle javac 
1.8.0_92) is substantially smaller: 389 bytes versus 167.

bq. Why not consolidate this to:
That's fine with me. I just wrote all the cases to look the same, instead of 
having a "special case" for the last comparison made. If some special work were 
required in the compare-equal case, it could go before the return statement.



Accumulo-Pull-Requests - Build # 448 - Unstable

2016-09-21 Thread Apache Jenkins Server
The Apache Jenkins build system has built Accumulo-Pull-Requests (build #448)

Status: Unstable

Check console output at 
https://builds.apache.org/job/Accumulo-Pull-Requests/448/ to view the results.

[jira] [Commented] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement

2016-09-21 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510541#comment-15510541
 ] 

Josh Elser commented on ACCUMULO-4468:
--

Hi [~wmurnane]. Thanks for the patch.

bq. This functions correctly, of course, but one of the typical uses of this 
method is to compare adjacent rows to break them into larger chunks. For 
example, accumulo.core.iterators.Combiner repeatedly calls this method with 
subsequent pairs of keys.

Do you have other examples of where this might be used in a tight loop? 

bq. This (marginally) improves the speed of comparisons in the relatively 
common case where only the last part is changing, with less complex code.

How did you test this? What types of numbers did you see?

{code}
+      case ROW:
+        if (!isEqual(row, other.row))
+          return false;
+        break;
       default:
         throw new IllegalArgumentException("Unrecognized partial key specification " + part);
     }
+    return true;
{code}

Why not consolidate this to:

{code}
case ROW:
  return isEqual(row, other.row);
{code}



Accumulo-Pull-Requests - Build # 447 - Still Failing

2016-09-21 Thread Apache Jenkins Server
The Apache Jenkins build system has built Accumulo-Pull-Requests (build #447)

Status: Still Failing

Check console output at 
https://builds.apache.org/job/Accumulo-Pull-Requests/447/ to view the results.

Accumulo-Pull-Requests - Build # 446 - Still Failing

2016-09-21 Thread Apache Jenkins Server
The Apache Jenkins build system has built Accumulo-Pull-Requests (build #446)

Status: Still Failing

Check console output at 
https://builds.apache.org/job/Accumulo-Pull-Requests/446/ to view the results.

Accumulo-Pull-Requests - Build # 445 - Failure

2016-09-21 Thread Apache Jenkins Server
The Apache Jenkins build system has built Accumulo-Pull-Requests (build #445)

Status: Failure

Check console output at 
https://builds.apache.org/job/Accumulo-Pull-Requests/445/ to view the results.

[jira] [Commented] (ACCUMULO-4415) Tracer requires instance.secret

2016-09-21 Thread Michael Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510239#comment-15510239
 ] 

Michael Miller commented on ACCUMULO-4415:
--

I'd like to try and resolve this issue, since I helped bring it to light. Did 
you folks agree on a solution? I can't really decipher the outcome from the 
discussion...



[jira] [Updated] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement

2016-09-21 Thread Will Murnane (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Will Murnane updated ACCUMULO-4468:
---
Attachment: key_comparison.patch



[jira] [Created] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement

2016-09-21 Thread Will Murnane (JIRA)
Will Murnane created ACCUMULO-4468:
--

 Summary: accumulo.core.data.Key.equals(Key, PartialKey) improvement
 Key: ACCUMULO-4468
 URL: https://issues.apache.org/jira/browse/ACCUMULO-4468
 Project: Accumulo
  Issue Type: Improvement
  Components: core
Affects Versions: 1.8.0
Reporter: Will Murnane
Priority: Trivial


In the Key.equals(Key, PartialKey) overload, the current method compares 
starting at the beginning of the key, and works its way toward the end. This 
functions correctly, of course, but one of the typical uses of this method is 
to compare adjacent rows to break them into larger chunks. For example, 
accumulo.core.iterators.Combiner repeatedly calls this method with subsequent 
pairs of keys.

I have a patch which reverses the comparison order. That is, if the method is 
called with ROW_COLFAM_COLQUAL_COLVIS, it will compare visibility, cq, cf, and 
finally row. This (marginally) improves the speed of comparisons in the 
relatively common case where only the last part is changing, with less complex 
code.
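
To make the trade-off concrete, here is a small self-contained sketch that counts how many key fields each comparison order has to examine before it can decide. It models a key as four byte arrays; the class and method names are invented for illustration and this is not the attached patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Counts fields examined by forward and reverse comparison; names are hypothetical. */
final class ComparisonOrderDemo {

  /** Toy key fields in sort order: row, column family, column qualifier, visibility. */
  static byte[][] key(String... fields) {
    byte[][] out = new byte[fields.length][];
    for (int i = 0; i < fields.length; i++) {
      out[i] = fields[i].getBytes(StandardCharsets.UTF_8);
    }
    return out;
  }

  /** Fields examined when comparing the first depth fields from the row onward. */
  static int forwardCost(byte[][] a, byte[][] b, int depth) {
    int examined = 0;
    for (int i = 0; i < depth; i++) {
      examined++;
      if (!Arrays.equals(a[i], b[i])) {
        break;
      }
    }
    return examined;
  }

  /** Fields examined when comparing the same fields from the last one back to the row. */
  static int reverseCost(byte[][] a, byte[][] b, int depth) {
    int examined = 0;
    for (int i = depth - 1; i >= 0; i--) {
      examined++;
      if (!Arrays.equals(a[i], b[i])) {
        break;
      }
    }
    return examined;
  }

  public static void main(String[] args) {
    byte[][] prev = key("row1", "cf", "cq", "vis1");
    byte[][] sameRow = key("row1", "cf", "cq", "vis2");
    byte[][] nextRow = key("row2", "cf", "cq", "vis1");

    // Only the last field differs: reverse order decides after one field.
    System.out.println(forwardCost(prev, sameRow, 4) + " vs " + reverseCost(prev, sameRow, 4));
    // Only the row differs: forward order decides after one field.
    System.out.println(forwardCost(prev, nextRow, 4) + " vs " + reverseCost(prev, nextRow, 4));
  }
}
```

When the keys are fully equal both orders examine all four fields, so the reversed order only pays off if mismatches tend to sit in the trailing fields of the requested prefix.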


