Accumulo-Pull-Requests - Build # 473 - Aborted

2016-10-20 Thread Apache Jenkins Server
The Apache Jenkins build system has built Accumulo-Pull-Requests (build #473)

Status: Aborted

Check console output at 
https://builds.apache.org/job/Accumulo-Pull-Requests/473/ to view the results.

[jira] [Commented] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement

2016-10-20 Thread Keith Turner (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592497#comment-15592497
 ] 

Keith Turner commented on ACCUMULO-4468:


I am in favor of merging this in.

> accumulo.core.data.Key.equals(Key, PartialKey) improvement
> --
>
> Key: ACCUMULO-4468
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4468
> Project: Accumulo
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.8.0
>Reporter: Will Murnane
>Priority: Trivial
>  Labels: newbie, performance
> Attachments: benchmark.tar.gz, key_comparison.patch
>
>
> In the Key.equals(Key, PartialKey) overload, the current method compares 
> starting at the beginning of the key, and works its way toward the end. This 
> functions correctly, of course, but one of the typical uses of this method is 
> to compare adjacent rows to break them into larger chunks. For example, 
> accumulo.core.iterators.Combiner repeatedly calls this method with subsequent 
> pairs of keys.
> I have a patch which reverses the comparison order. That is, if the method is 
> called with ROW_COLFAM_COLQUAL_COLVIS, it will compare visibility, cq, cf, 
> and finally row. This (marginally) improves the speed of comparisons in the 
> relatively common case where only the last part is changing, with less 
> complex code.
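The reversed comparison order described above can be sketched as follows. This is a toy illustration, not the attached patch: {{ToyKey}} and the method name are hypothetical stand-ins for Accumulo's Key and its PartialKey-driven comparison.

```java
import java.util.Arrays;

public class ReversedKeyEquals {
    // Toy key with the four parts Accumulo's Key compares (records are Java 16+).
    record ToyKey(byte[] row, byte[] cf, byte[] cq, byte[] cv) {}

    // Compare back-to-front: visibility, qualifier, family, then row.
    // In the common case where adjacent keys differ only in a trailing part,
    // the mismatch is found on the first check instead of the last.
    static boolean equalsRowFamQualVis(ToyKey a, ToyKey b) {
        return Arrays.equals(a.cv(), b.cv())
            && Arrays.equals(a.cq(), b.cq())
            && Arrays.equals(a.cf(), b.cf())
            && Arrays.equals(a.row(), b.row());
    }

    public static void main(String[] args) {
        ToyKey k1 = new ToyKey("r1".getBytes(), "f".getBytes(), "q1".getBytes(), "v".getBytes());
        ToyKey k2 = new ToyKey("r1".getBytes(), "f".getBytes(), "q2".getBytes(), "v".getBytes());
        System.out.println(equalsRowFamQualVis(k1, k1));
        System.out.println(equalsRowFamQualVis(k1, k2));
    }
}
```

Either order returns the same answer; only the position of the short-circuit changes.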



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ACCUMULO-3944) Tablet attempts to split or major compact after every bulk file import

2016-10-20 Thread Dave Marion (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592281#comment-15592281
 ] 

Dave Marion commented on ACCUMULO-3944:
---

I think we should remove the else block from the code in the description. 
Thoughts? Regarding the system being idle and not compacting, maybe the 
TABLE_MAJC_COMPACTALL_IDLETIME property should be lowered?
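Removing the else block would leave behavior along these lines. This is a toy sketch with a hypothetical stand-in type, not the actual Tablet/TabletServer code:

```java
public class BulkImportHook {
    // Hypothetical stand-in for the tablet, not an Accumulo API.
    static class TabletStub {
        private final boolean needsSplit;
        TabletStub(boolean needsSplit) { this.needsSplit = needsSplit; }
        boolean needsSplit() { return needsSplit; }
    }

    static String afterBulkImport(TabletStub t) {
        if (t.needsSplit()) {
            return "split";   // still split when the tablet clearly needs it
        }
        // else-branch removed: leave major compaction to the tserver's normal
        // ratio/idle-based logic rather than initiating one per bulk import
        return "no-op";
    }

    public static void main(String[] args) {
        System.out.println(afterBulkImport(new TabletStub(true)));
        System.out.println(afterBulkImport(new TabletStub(false)));
    }
}
```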

> Tablet attempts to split or major compact after every bulk file import
> --
>
> Key: ACCUMULO-3944
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3944
> Project: Accumulo
>  Issue Type: Improvement
>  Components: tserver
>Reporter: Eric Newton
>Priority: Blocker
> Fix For: 2.0.0
>
>
> I noticed this bit of code in tablet, after it has bulk imported a file, but 
> before the bulk import is finished:
> {code}
>   if (needsSplit()) {
> getTabletServer().executeSplit(this);
>   } else {
> initiateMajorCompaction(MajorCompactionReason.NORMAL);
>   }
> {code}
> I'm pretty sure we can leave this to the normal tablet server mechanism for 
> deciding when to split or compact.





[jira] [Commented] (ACCUMULO-2353) Test improvements to java.io.InputStream.seek() for possible Hadoop patch

2016-10-20 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592238#comment-15592238
 ] 

Josh Elser commented on ACCUMULO-2353:
--

IMO, close this and, if someone wants to follow through with this in Hadoop or 
some JVM vendor, fantastic.

> Test improvements to java.io.InputStream.seek() for possible Hadoop patch
> 
>
> Key: ACCUMULO-2353
> URL: https://issues.apache.org/jira/browse/ACCUMULO-2353
> Project: Accumulo
>  Issue Type: Task
> Environment: Java 6 update 45 or later
> Hadoop 2.2.0
>Reporter: Dave Marion
>Priority: Minor
>
> At some point (early Java 7, I think, then backported around Java 6 Update 
> 45), the java.io.InputStream.skip() method was changed from reading through a 
> byte[512] buffer to a byte[2048] buffer. The difference can be seen in 
> DeflaterInputStream, which has not been updated:
> {noformat}
> public long skip(long n) throws IOException {
> if (n < 0) {
> throw new IllegalArgumentException("negative skip length");
> }
> ensureOpen();
> // Skip bytes by repeatedly decompressing small blocks
> if (rbuf.length < 512)
> rbuf = new byte[512];
> int total = (int)Math.min(n, Integer.MAX_VALUE);
> long cnt = 0;
> while (total > 0) {
> // Read a small block of uncompressed bytes
> int len = read(rbuf, 0, (total <= rbuf.length ? total : rbuf.length));
> if (len < 0) {
> break;
> }
> cnt += len;
> total -= len;
> }
> return cnt;
> }
> {noformat}
> and java.io.InputStream in Java 6 Update 45:
> {noformat}
> // MAX_SKIP_BUFFER_SIZE is used to determine the maximum buffer skip to
> // use when skipping.
> private static final int MAX_SKIP_BUFFER_SIZE = 2048;
> public long skip(long n) throws IOException {
>   long remaining = n;
>   int nr;
>   if (n <= 0) {
>   return 0;
>   }
>   
>   int size = (int)Math.min(MAX_SKIP_BUFFER_SIZE, remaining);
>   byte[] skipBuffer = new byte[size];
>   while (remaining > 0) {
>   nr = read(skipBuffer, 0, (int)Math.min(size, remaining));
>   
>   if (nr < 0) {
>   break;
>   }
>   remaining -= nr;
>   }
>   
>   return n - remaining;
> }
> {noformat}
> In sample tests I saw about a 20% improvement in skip() when seeking towards 
> the end of a locally cached compressed file. Looking at the 
> DecompressorStream in HDFS, the skip() method is a near copy of the old 
> InputStream.skip() method:
> {noformat}
>   private byte[] skipBytes = new byte[512];
>   @Override
>   public long skip(long n) throws IOException {
> // Sanity checks
> if (n < 0) {
>   throw new IllegalArgumentException("negative skip length");
> }
> checkStream();
> 
> // Read 'n' bytes
> int skipped = 0;
> while (skipped < n) {
>   int len = Math.min(((int)n - skipped), skipBytes.length);
>   len = read(skipBytes, 0, len);
>   if (len == -1) {
> eof = true;
> break;
>   }
>   skipped += len;
> }
> return skipped;
>   }
> {noformat}
> This task is to evaluate the changes to DecompressorStream, with a possible 
> patch to HDFS and a possible bug request to Oracle to port the 
> InputStream.skip() changes to DeflaterInputStream.skip().
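The buffer-size change under discussion can be sketched with a generic skip helper. The helper and the sizes are illustrative only; this is not the proposed HDFS patch:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class SkipDemo {
    // skip() implemented by reading through a scratch buffer of the given
    // size, mirroring the 512 -> 2048 change in java.io.InputStream.
    static long skipWithBuffer(InputStream in, long n, int bufSize) throws IOException {
        if (n <= 0) return 0;
        byte[] buf = new byte[(int) Math.min(bufSize, n)];
        long remaining = n;
        while (remaining > 0) {
            int nr = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (nr < 0) break;        // end of stream
            remaining -= nr;
        }
        return n - remaining;         // bytes actually skipped
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];
        // Same result either way; the larger buffer simply makes fewer read()
        // calls, which is where a win on decompressing streams would come from.
        System.out.println(skipWithBuffer(new ByteArrayInputStream(data), 5_000, 512));
        System.out.println(skipWithBuffer(new ByteArrayInputStream(data), 5_000, 2048));
    }
}
```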





[jira] [Commented] (ACCUMULO-4468) accumulo.core.data.Key.equals(Key, PartialKey) improvement

2016-10-20 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592208#comment-15592208
 ] 

Josh Elser commented on ACCUMULO-4468:
--

[~kturner], what do you think we should do here?

> accumulo.core.data.Key.equals(Key, PartialKey) improvement
> --
>
> Key: ACCUMULO-4468
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4468
> Project: Accumulo
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.8.0
>Reporter: Will Murnane
>Priority: Trivial
>  Labels: newbie, performance
> Attachments: benchmark.tar.gz, key_comparison.patch
>
>
> In the Key.equals(Key, PartialKey) overload, the current method compares 
> starting at the beginning of the key, and works its way toward the end. This 
> functions correctly, of course, but one of the typical uses of this method is 
> to compare adjacent rows to break them into larger chunks. For example, 
> accumulo.core.iterators.Combiner repeatedly calls this method with subsequent 
> pairs of keys.
> I have a patch which reverses the comparison order. That is, if the method is 
> called with ROW_COLFAM_COLQUAL_COLVIS, it will compare visibility, cq, cf, 
> and finally row. This (marginally) improves the speed of comparisons in the 
> relatively common case where only the last part is changing, with less 
> complex code.





[jira] [Commented] (ACCUMULO-4500) Implement visibility histograms as a table feature

2016-10-20 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592188#comment-15592188
 ] 

Josh Elser commented on ACCUMULO-4500:
--

bq. A histogram has particular semantics where different values of a single 
variable are on one axis, and the frequency of those values are shown on the 
other. What we have here is more general than even that, because we're not 
necessarily referring to a single variable, nor do the magnitudes necessarily 
represent frequencies or anything like frequencies. Calling it a histogram 
implies semantics we don't necessarily need to impose.

I really don't want to get into an argument over semantics.

bq. What we have here is more general than even that, because we're not 
necessarily referring to a single variable

I don't know what this means.

{code}
public NamedCounters getCounters(Range range, Function combiner);
{code}

I still don't understand what the {{Function}} is accomplishing. 
Combination of numbers is (typically) addition. Why do we need a function here? 
Also, this doesn't let me fetch just the visibility data. If there are multiple 
types of counters stored in the rfiles, how do I refer to just one? How do I 
know which exist for some table?

bq. Re passing in an instance of a Function, how can we run the function on 
remote JVMs? We could take the class name from the instance and use that to 
instantiate on remote JVMs

This is 110% what I wanted to avoid. What was suggested as a *simple* database 
primitive is now being exploded into a "wholly generalized user-configured 
framework". I have no interest in working on that. If either of you do, I'm 
happy to help sketch out such a system, but I will lose all interest in 
building this simple feature if it's being stipulated that I have to build 
something so largely different to build the proposed *simple* feature.
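For contrast, the "simple database primitive" reading of the feature might fix the combine operation to plain addition over named counter sets. Everything below is a hypothetical illustration of that reading, not a real or proposed Accumulo API:

```java
import java.util.HashMap;
import java.util.Map;

public class VisibilityCounts {
    // Per-file counters for one named counter set (e.g. "visibility"),
    // keyed by label such as a visibility expression. Combination is fixed
    // as addition, so no user-supplied Function crosses JVM boundaries.
    static Map<String, Long> combine(Map<String, Long> a, Map<String, Long> b) {
        Map<String, Long> out = new HashMap<>(a);
        b.forEach((label, count) -> out.merge(label, count, Long::sum));
        return out;
    }

    public static void main(String[] args) {
        Map<String, Long> file1 = Map.of("public", 10L, "secret", 2L);
        Map<String, Long> file2 = Map.of("public", 5L);
        Map<String, Long> merged = combine(file1, file2);
        System.out.println(merged.get("public"));
        System.out.println(merged.get("secret"));
    }
}
```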

> Implement visibility histograms as a table feature
> --
>
> Key: ACCUMULO-4500
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4500
> Project: Accumulo
>  Issue Type: New Feature
>  Components: client, tserver
>Reporter: Josh Elser
>
> Add support to quickly extract a histogram of all of the visibilities stored 
> in an Accumulo table.
> DISCUSS: 
> https://lists.apache.org/thread.html/df5e764362a95277344fd2731a432e9fafc60595e7d30015d9a56b9c@%3Cdev.accumulo.apache.org%3E





[jira] [Assigned] (ACCUMULO-1480) Organize tables on monitor page by namespace

2016-10-20 Thread Luis Tavarez (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luis Tavarez reassigned ACCUMULO-1480:
--

Assignee: Luis Tavarez

> Organize tables on monitor page by namespace
> 
>
> Key: ACCUMULO-1480
> URL: https://issues.apache.org/jira/browse/ACCUMULO-1480
> Project: Accumulo
>  Issue Type: Improvement
>  Components: monitor
>Reporter: Christopher Tubbs
>Assignee: Luis Tavarez
> Fix For: 2.0.0
>
>
> Tables in the monitor page should be organized by their namespace prefix. 
> Either separate tabs per namespace, with a tab for "All", or a filter.





[jira] [Updated] (ACCUMULO-3470) Upgrade to Commons VFS 2.1

2016-10-20 Thread Christopher Tubbs (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-3470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Tubbs updated ACCUMULO-3470:

Fix Version/s: (was: 1.6.6)

> Upgrade to Commons VFS 2.1
> --
>
> Key: ACCUMULO-3470
> URL: https://issues.apache.org/jira/browse/ACCUMULO-3470
> Project: Accumulo
>  Issue Type: Task
>Reporter: Dave Marion
>Assignee: Dave Marion
> Fix For: 1.7.2, 1.8.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Commons VFS 2.1 is nearing release. When released we need to remove the VFS 
> related classes in the start module, update the imports, and update the 
> version in the pom. Will set fixVersions when VFS is released.


