Accumulo-1.7 - Build # 358 - Unstable

2017-06-27 Thread Apache Jenkins Server
The Apache Jenkins build system has built Accumulo-1.7 (build #358)

Status: Unstable

Check console output at https://builds.apache.org/job/Accumulo-1.7/358/ to view 
the results.

[jira] [Resolved] (ACCUMULO-4666) Clean up assertions on construction of KerberosToken

2017-06-27 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved ACCUMULO-4666.
--
Resolution: Fixed

> Clean up assertions on construction of KerberosToken
> 
>
> Key: ACCUMULO-4666
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4666
> Project: Accumulo
>  Issue Type: Sub-task
>  Components: client
>Reporter: Josh Elser
>Assignee: Josh Elser
> Fix For: 1.7.4, 1.8.2, 2.0.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Follow-on from ACCUMULO-4665 per 
> https://github.com/apache/accumulo/pull/273#discussion_r124141590
> We can do a better verification on construction of the KerberosToken and also 
> improve the javadoc to make the implementation details a bit more apparent to 
> the user.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (ACCUMULO-4668) Remove/update 1.6.6 on https://accumulo.apache.org/downloads/

2017-06-27 Thread Josh Elser (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Elser resolved ACCUMULO-4668.
--
Resolution: Fixed

> Remove/update 1.6.6 on https://accumulo.apache.org/downloads/
> -
>
> Key: ACCUMULO-4668
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4668
> Project: Accumulo
>  Issue Type: Task
>  Components: website
>Reporter: Josh Elser
>Assignee: Josh Elser
>Priority: Minor
>
> 1.6.x development is dead, but we still list it in the downloads page. I 
> think we should remove it completely to dissuade people from using it.
> If people don't like this, I'm also happy to add a disclaimer on the 
> downloads page instructing them to not use it "at their own risk"





[jira] [Commented] (ACCUMULO-4667) LocalityGroupIterator very inefficient with large locality groups

2017-06-27 Thread Ivan Bella (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065362#comment-16065362
 ] 

Ivan Bella commented on ACCUMULO-4667:
--

[~kturner] You are correct.  I believe that is what the count is used for in 
the map passed into the seek call.  I will use that to pre-filter the locality 
groups as is currently being done in the seek.

> LocalityGroupIterator very inefficient with large locality groups
> -
>
> Key: ACCUMULO-4667
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4667
> Project: Accumulo
>  Issue Type: Improvement
>  Components: tserver
>Affects Versions: 1.6.6, 1.7.3, 1.8.1, 2.0.0
>Reporter: Ivan Bella
>Assignee: Ivan Bella
> Fix For: 1.8.2, 2.0.0
>
>
> On one of our systems we tracked some scans that were taking an extremely 
> long time to complete (many hours).  As it turns out the scan was relatively 
> simple in that it was scanning a tablet for all keys that had a specific 
> column family.  Note that there was very little data that actually matched 
> this column family.  Upon tracing the code we found that it was spending a 
> large amount of time in the LocalityGroupIterator.  Stack traces continually 
> found the code to be at line 128 or 129 of the LocalityGroupIterator.  Those 
> line numbers are consistent from the 1.6 series all the way to 2.0.0 
> (master).  In this case the column family being searched for was included in 
> one of a dozen or so locality groups on that table, and the locality group 
> itself had 40 or so column families.  We see several things that can be done 
> here:
> 1) The code that checks the group column families against those being 
> searched for can quickly exit once it finds a match
> 2) The code that checks the group column families against those being 
> searched for can look at the relative size of those two groups and invert the 
> logic appropriately for a more efficient loop.
> 3) We could create a cached map of column families to locality groups 
> allowing us to avoid examining each locality group every time we seek.





[jira] [Created] (ACCUMULO-4668) Remove/update 1.6.6 on https://accumulo.apache.org/downloads/

2017-06-27 Thread Josh Elser (JIRA)
Josh Elser created ACCUMULO-4668:


 Summary: Remove/update 1.6.6 on 
https://accumulo.apache.org/downloads/
 Key: ACCUMULO-4668
 URL: https://issues.apache.org/jira/browse/ACCUMULO-4668
 Project: Accumulo
  Issue Type: Task
  Components: website
Reporter: Josh Elser
Assignee: Josh Elser
Priority: Minor


1.6.x development is dead, but we still list it in the downloads page. I think 
we should remove it completely to dissuade people from using it.

If people don't like this, I'm also happy to add a disclaimer on the downloads 
page instructing them to not use it "at their own risk"





[jira] [Commented] (ACCUMULO-4667) LocalityGroupIterator very inefficient with large locality groups

2017-06-27 Thread Keith Turner (JIRA)

[ 
https://issues.apache.org/jira/browse/ACCUMULO-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16065064#comment-16065064
 ] 

Keith Turner commented on ACCUMULO-4667:


A column family can exist in a locality group but not be present in a RFile.  
It seems the code is filtering column families that are not present at seek 
time.  Maybe this filtering could be done once when the LocalityGroup is 
opened.  
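
The filter-once idea above could look roughly like the following. This is only a sketch under stated assumptions, not the actual Accumulo code: the class name OpenedLocalityGroup, its methods, and the familyCounts map (column family to number of entries in the file) are hypothetical, and String stands in for whatever byte-sequence type the real iterator uses.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: drop column families that are configured for a
// locality group but absent from the file once, at open time, so that
// seek() no longer has to repeat that filtering on every call.
final class OpenedLocalityGroup {
    private final Set<String> presentFamilies;

    // familyCounts is an assumed per-file map of column family -> entry count.
    OpenedLocalityGroup(Set<String> configuredFamilies, Map<String, Long> familyCounts) {
        Set<String> present = new HashSet<>();
        for (String cf : configuredFamilies) {
            Long count = familyCounts.get(cf);
            if (count != null && count > 0) {
                present.add(cf); // keep only families with data in this file
            }
        }
        this.presentFamilies = Collections.unmodifiableSet(present);
    }

    // Constant-time check a seek could use instead of re-filtering.
    boolean mayContain(String columnFamily) {
        return presentFamilies.contains(columnFamily);
    }

    Set<String> presentFamilies() {
        return presentFamilies;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("cf1", 10L);
        counts.put("cf2", 0L);
        Set<String> configured = new HashSet<>();
        configured.add("cf1");
        configured.add("cf2");
        configured.add("cf3");
        OpenedLocalityGroup group = new OpenedLocalityGroup(configured, counts);
        System.out.println(group.presentFamilies());
    }
}
```

The cost of the filtering moves from every seek to a single pass when the group is opened, which is the trade suggested in the comment above.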

> LocalityGroupIterator very inefficient with large locality groups
> -
>
> Key: ACCUMULO-4667
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4667
> Project: Accumulo
>  Issue Type: Improvement
>  Components: tserver
>Affects Versions: 1.6.6, 1.7.3, 1.8.1, 2.0.0
>Reporter: Ivan Bella
>Assignee: Ivan Bella
> Fix For: 1.8.2, 2.0.0
>
>
> On one of our systems we tracked some scans that were taking an extremely 
> long time to complete (many hours).  As it turns out the scan was relatively 
> simple in that it was scanning a tablet for all keys that had a specific 
> column family.  Note that there was very little data that actually matched 
> this column family.  Upon tracing the code we found that it was spending a 
> large amount of time in the LocalityGroupIterator.  Stack traces continually 
> found the code to be at line 128 or 129 of the LocalityGroupIterator.  Those 
> line numbers are consistent from the 1.6 series all the way to 2.0.0 
> (master).  In this case the column family being searched for was included in 
> one of a dozen or so locality groups on that table, and the locality group 
> itself had 40 or so column families.  We see several things that can be done 
> here:
> 1) The code that checks the group column families against those being 
> searched for can quickly exit once it finds a match
> 2) The code that checks the group column families against those being 
> searched for can look at the relative size of those two groups and invert the 
> logic appropriately for a more efficient loop.
> 3) We could create a cached map of column families to locality groups 
> allowing us to avoid examining each locality group every time we seek.





[jira] [Updated] (ACCUMULO-4667) LocalityGroupIterator very inefficient with large locality groups

2017-06-27 Thread Ivan Bella (JIRA)

 [ 
https://issues.apache.org/jira/browse/ACCUMULO-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Bella updated ACCUMULO-4667:
-
Description: 
On one of our systems we tracked some scans that were taking an extremely long 
time to complete (many hours).  As it turns out the scan was relatively simple 
in that it was scanning a tablet for all keys that had a specific column 
family.  Note that there was very little data that actually matched this column 
family.  Upon tracing the code we found that it was spending a large amount of 
time in the LocalityGroupIterator.  Stack traces continually found the code to 
be at line 128 or 129 of the LocalityGroupIterator.  Those line numbers are 
consistent from the 1.6 series all the way to 2.0.0 (master).  In this case the 
column family being searched for was included in one of a dozen or so locality 
groups on that table, and the locality group itself had 40 or so column 
families.  We see several things that can be done here:

1) The code that checks the group column families against those being searched 
for can quickly exit once it finds a match
2) The code that checks the group column families against those being searched 
for can look at the relative size of those two groups and invert the logic 
appropriately for a more efficient loop.
3) We could create a cached map of column families to locality groups allowing 
us to avoid examining each locality group every time we seek.

  was:
On one of our systems we tracked some scans that were taking an extremely long 
time to complete (many hours).  As it turns out the scan was relatively simple 
in that it was scanning a tablet for all keys that had a specific column 
family.  Note that there was very little data that actually matched this column 
family.  Upon tracing the code we found that it was spending a large amount of 
time in the LocalityGroupIterator.  Stack traces continually found the code to 
be at list 128 or 129 of the LocalityGroupIterator.  Those line numbers are 
consistent from the 1.6 series all the way to 2.0.0 (master).  In this case the 
column family being searched for was included in one of a dozen or so locality 
groups on that table, and the locality group itself had 40 or so column 
families.  We see several things that can be done here:

1) The code that checks the group column families against those being searched 
for can quickly exit once it finds a match
2) The code that checks the group column families against those being searched 
for can look at the relative size of those two groups and invert the logic 
appropriately for a more efficient loop.
3) We could create a cached map of column families to locality groups allowing 
us to avoid examining each locality group every time we seek.


> LocalityGroupIterator very inefficient with large locality groups
> -
>
> Key: ACCUMULO-4667
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4667
> Project: Accumulo
>  Issue Type: Improvement
>  Components: tserver
>Affects Versions: 1.6.6, 1.7.3, 1.8.1, 2.0.0
>Reporter: Ivan Bella
>Assignee: Ivan Bella
> Fix For: 1.8.2, 2.0.0
>
>
> On one of our systems we tracked some scans that were taking an extremely 
> long time to complete (many hours).  As it turns out the scan was relatively 
> simple in that it was scanning a tablet for all keys that had a specific 
> column family.  Note that there was very little data that actually matched 
> this column family.  Upon tracing the code we found that it was spending a 
> large amount of time in the LocalityGroupIterator.  Stack traces continually 
> found the code to be at line 128 or 129 of the LocalityGroupIterator.  Those 
> line numbers are consistent from the 1.6 series all the way to 2.0.0 
> (master).  In this case the column family being searched for was included in 
> one of a dozen or so locality groups on that table, and the locality group 
> itself had 40 or so column families.  We see several things that can be done 
> here:
> 1) The code that checks the group column families against those being 
> searched for can quickly exit once it finds a match
> 2) The code that checks the group column families against those being 
> searched for can look at the relative size of those two groups and invert the 
> logic appropriately for a more efficient loop.
> 3) We could create a cached map of column families to locality groups 
> allowing us to avoid examining each locality group every time we seek.





[jira] [Created] (ACCUMULO-4667) LocalityGroupIterator very inefficient with large locality groups

2017-06-27 Thread Ivan Bella (JIRA)
Ivan Bella created ACCUMULO-4667:


 Summary: LocalityGroupIterator very inefficient with large 
locality groups
 Key: ACCUMULO-4667
 URL: https://issues.apache.org/jira/browse/ACCUMULO-4667
 Project: Accumulo
  Issue Type: Improvement
  Components: tserver
Affects Versions: 1.8.1, 1.7.3, 1.6.6, 2.0.0
Reporter: Ivan Bella
Assignee: Ivan Bella
 Fix For: 1.8.2, 2.0.0


On one of our systems we tracked some scans that were taking an extremely long 
time to complete (many hours).  As it turns out the scan was relatively simple 
in that it was scanning a tablet for all keys that had a specific column 
family.  Note that there was very little data that actually matched this column 
family.  Upon tracing the code we found that it was spending a large amount of 
time in the LocalityGroupIterator.  Stack traces continually found the code to 
be at line 128 or 129 of the LocalityGroupIterator.  Those line numbers are 
consistent from the 1.6 series all the way to 2.0.0 (master).  In this case the 
column family being searched for was included in one of a dozen or so locality 
groups on that table, and the locality group itself had 40 or so column 
families.  We see several things that can be done here:

1) The code that checks the group column families against those being searched 
for can quickly exit once it finds a match
2) The code that checks the group column families against those being searched 
for can look at the relative size of those two groups and invert the logic 
appropriately for a more efficient loop.
3) We could create a cached map of column families to locality groups allowing 
us to avoid examining each locality group every time we seek.
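
The three suggestions above can be sketched as follows. This is a minimal illustration under stated assumptions, not the LocalityGroupIterator code itself: the class LocalityGroupSketch and the methods intersects and indexByFamily are hypothetical names, and column families are modeled as String rather than the byte sequences the real code works with.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the three ideas; names and types are assumptions.
final class LocalityGroupSketch {

    // Suggestions 1 and 2: loop over the smaller set, probe the larger one,
    // and return as soon as a single match is found.
    static boolean intersects(Set<String> groupFamilies, Set<String> wanted) {
        Set<String> small = groupFamilies.size() <= wanted.size() ? groupFamilies : wanted;
        Set<String> large = small == groupFamilies ? wanted : groupFamilies;
        for (String cf : small) {
            if (large.contains(cf)) {
                return true; // early exit on the first match
            }
        }
        return false;
    }

    // Suggestion 3: build a column family -> locality group index once, so a
    // seek can look up the relevant groups instead of scanning every group.
    static Map<String, List<Integer>> indexByFamily(List<Set<String>> groups) {
        Map<String, List<Integer>> index = new HashMap<>();
        for (int i = 0; i < groups.size(); i++) {
            for (String cf : groups.get(i)) {
                index.computeIfAbsent(cf, k -> new ArrayList<>()).add(i);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Set<String> group0 = new HashSet<>(Arrays.asList("cf1", "cf2", "cf3"));
        Set<String> group1 = new HashSet<>(Arrays.asList("cf9"));
        Set<String> wanted = new HashSet<>(Arrays.asList("cf2"));

        System.out.println(intersects(group0, wanted));
        System.out.println(indexByFamily(Arrays.asList(group0, group1)).get("cf9"));
    }
}
```

With hash-based sets, the check in intersects costs time proportional to the smaller of the two sets, and the index replaces a scan over every locality group with one map lookup per requested column family.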





[jira] [Created] (ACCUMULO-4666) Clean up assertions on construction of KerberosToken

2017-06-27 Thread Josh Elser (JIRA)
Josh Elser created ACCUMULO-4666:


 Summary: Clean up assertions on construction of KerberosToken
 Key: ACCUMULO-4666
 URL: https://issues.apache.org/jira/browse/ACCUMULO-4666
 Project: Accumulo
  Issue Type: Sub-task
  Components: client
Reporter: Josh Elser
Assignee: Josh Elser
 Fix For: 1.7.4, 1.8.2, 2.0.0


Follow-on from ACCUMULO-4665 per 
https://github.com/apache/accumulo/pull/273#discussion_r124141590

We can do a better verification on construction of the KerberosToken and also 
improve the javadoc to make the implementation details a bit more apparent to 
the user.


