[ https://issues.apache.org/jira/browse/ACCUMULO-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ivan Bella resolved ACCUMULO-4667.
--
Resolution: Fixed
> LocalityGroupIterator very inefficient with large locality groups
> -
>
> Key: ACCUMULO-4667
> URL: https://issues.apache.org/jira/browse/ACCUMULO-4667
> Project: Accumulo
> Issue Type: Improvement
> Components: tserver
> Affects Versions: 1.6.6, 1.7.3, 1.8.1, 2.0.0
> Reporter: Ivan Bella
> Assignee: Ivan Bella
> Fix For: 1.8.2, 2.0.0
>
> Time Spent: 7.5h
> Remaining Estimate: 0h
>
> On one of our systems we tracked some scans that were taking an extremely
> long time to complete (many hours). As it turns out, the scan was relatively
> simple in that it was scanning a tablet for all keys that had a specific
> column family. Note that there was very little data that actually matched
> this column family. Upon tracing the code we found that it was spending a
> large amount of time in the LocalityGroupIterator. Stack traces continually
> found the code to be at line 128 or 129 of the LocalityGroupIterator. Those
> line numbers are consistent from the 1.6 series all the way to 2.0.0
> (master). In this case the column family being searched for was included in
> one of a dozen or so locality groups on that table, and the locality group
> itself had 40 or so column families. We see several things that can be done
> here:
> 1) The code that checks the group column families against those being
> searched for can exit as soon as it finds a match.
> 2) The code that checks the group column families against those being
> searched for can look at the relative size of those two groups and invert the
> logic appropriately for a more efficient loop.
> 3) We could create a cached map of column families to locality groups
> allowing us to avoid examining each locality group every time we seek.
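
The three suggestions above can be sketched as follows. This is a minimal illustration and not the patch that resolved the issue: the class and method names are hypothetical, plain String sets stand in for Accumulo's column family byte sequences, and the sketch assumes each column family belongs to at most one named locality group (the default group is ignored).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; not part of the Accumulo API.
final class LocalityGroupLookupSketch {

    // Suggestions 1 and 2: loop over the smaller set, probe the larger one,
    // and return as soon as one common column family is found.
    static boolean intersects(Set<String> groupFamilies, Set<String> soughtFamilies) {
        Set<String> smaller =
            groupFamilies.size() <= soughtFamilies.size() ? groupFamilies : soughtFamilies;
        Set<String> larger = (smaller == groupFamilies) ? soughtFamilies : groupFamilies;
        for (String cf : smaller) {
            if (larger.contains(cf)) {
                return true; // early exit on the first match
            }
        }
        return false;
    }

    // Suggestion 3: build a column family -> locality group index once,
    // so a seek does not need to examine every locality group.
    static Map<String, String> indexFamilies(Map<String, Set<String>> groups) {
        Map<String, String> index = new HashMap<>();
        for (Map.Entry<String, Set<String>> group : groups.entrySet()) {
            for (String cf : group.getValue()) {
                index.put(cf, group.getKey());
            }
        }
        return index;
    }

    // Resolve the sought column families to the names of the groups holding them.
    // Families missing from the index are skipped.
    static Set<String> groupsFor(Map<String, String> index, Set<String> soughtFamilies) {
        Set<String> result = new TreeSet<>();
        for (String cf : soughtFamilies) {
            String groupName = index.get(cf);
            if (groupName != null) {
                result.add(groupName);
            }
        }
        return result;
    }
}
```

With a dozen groups of roughly 40 families each and a single sought family, intersects does one hash probe per group instead of walking every family in the group, and groupsFor does one probe in total once the index has been built.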