Couldn't we do this in the 1.6 line as an optimization when we meet the
constraints on scanners?

That would let us avoid exposing TabletLocator and get something out sooner.

-- 
Sean
On Feb 16, 2015 2:48 PM, "Josh Elser" <[email protected]> wrote:

> Eugene,
>
> First off, thanks so much for writing this up. This is definitely a "hot
> topic" that comes up for users and appears to have a lot of relevance to
> people right now.
>
> I think the first thing that needs to happen is that we "lift"
> TabletLocator (or some class which serves the purpose that TabletLocator
> currently fulfills) into the public API. TabletLocator is currently treated
> as "internal implementation", meaning that you don't have any guarantees
> on its use.
>
> I think step 1 would be to add a TabletLocator class into the public API
> (and hide the implementation in a TabletLocatorImpl). We could only do this
> for 1.7.0 given our adoption of semver. You are more than welcome to look
> at this and try to work on a PR.
>
> Feel free to open an issue on JIRA as well (I can make sure it gets
> assigned to you after you do), and we can work with you to get a good
> design in place.
>
> - Josh
>
> Eugene Cheipesh wrote:
>
>> Hello,
>>
>> This is more of a use-case report and a request for comment.
>>
>> I am using Accumulo as a source for Spark RDDs through
>> AccumuloInputFormat. My index is based on a z-order space-filling curve.
>> When I decompose a bounding box into index ranges I can end up with a
>> large number of Ranges; 3k+ is not too unusual. Getting a fast response
>> from Accumulo is not at all an issue. It would be possible to generate
>> approximate ranges and use a Filter to refine them on the server side,
>> but that only delays the problem.
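[Editor's aside: the range blow-up comes from how a z-order (Morton) index linearizes 2-D space: even a small, contiguous bounding box maps to scattered index values, so its decomposition needs many disjoint ranges. A minimal, self-contained sketch of the encoding; this is illustrative Java, not Accumulo or GeoTrellis code, and all names are made up:]

```java
public class ZOrder {
    // Spread the low 16 bits of v so there is a zero bit between each
    // original bit: ...b2 b1 b0 -> ...b2 0 b1 0 b0.
    static long spread(int v) {
        long x = v & 0xFFFFL;
        x = (x | (x << 8)) & 0x00FF00FFL;
        x = (x | (x << 4)) & 0x0F0F0F0FL;
        x = (x | (x << 2)) & 0x33333333L;
        x = (x | (x << 1)) & 0x55555555L;
        return x;
    }

    // Morton (z-order) index: interleave the bits of x and y.
    static long morton(int x, int y) {
        return spread(x) | (spread(y) << 1);
    }

    public static void main(String[] args) {
        // The 2x2 box with x, y in [1, 2] maps to the scattered indices
        // 3, 6, 9, 12: a contiguous box is NOT a contiguous run of keys,
        // which is why a bounding box decomposes into many Ranges.
        for (int y = 1; y <= 2; y++)
            for (int x = 1; x <= 2; x++)
                System.out.println(morton(x, y));
    }
}
```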
>>
>> The ideal scenario is for Spark executors to be co-located with Accumulo
>> tservers and number of splits per server to be roughly equal to the
>> number of cores on the machine.
>>
>> However, AccumuloInputFormat maps each range to a Split and Spark maps
>> every split to a Task. It is the nature of the z-order curve that some
>> of these ranges contain only a few tiles while others contain a pretty
>> big chunk.
>> Having significantly more splits than cores prevents good concurrency on
>> fetches. This is a problem that BatchScanner is designed to fix, but it’s
>> not used in AccumuloInputFormat.
>>
>> I noticed that TabletLocator, which is used by AccumuloInputFormat,
>> returns a structure that looks like it breaks down ranges by host and
>> then by tablet. I hacked together an InputFormat that generates a split
>> per tablet and a Reader that uses a BatchScanner. The performance for
>> the Spark use case was orders of magnitude better. I end up with about 50
>> splits for the same dataset. I can’t give exact numbers because I gave
>> up timing the original source. This seems like a pretty good compromise
>> since the number of splits can be dynamically controlled to tune the
>> distribution and granularity of calculation batches.
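[Editor's aside: the grouping step described above can be sketched roughly as follows. Plain strings stand in for KeyExtent and Range, and the nested map mirrors the host-then-tablet binning a tablet locator produces; `SplitPerTablet` and `Split` are hypothetical names, not Accumulo API:]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SplitPerTablet {
    // Illustrative stand-in for an input split: all ranges that fall in one
    // tablet become a single split, served from that tablet's host.
    record Split(String host, String tablet, List<String> ranges) {}

    // binned: host -> tablet -> ranges, as a tablet locator would bin them.
    // Emits one split per tablet rather than one split per range.
    static List<Split> splits(Map<String, Map<String, List<String>>> binned) {
        List<Split> out = new ArrayList<>();
        for (var host : binned.entrySet())
            for (var tablet : host.getValue().entrySet())
                out.add(new Split(host.getKey(), tablet.getKey(),
                                  tablet.getValue()));
        return out;
    }

    public static void main(String[] args) {
        Map<String, Map<String, List<String>>> binned = Map.of(
            "tserver1", Map.of("tabletA", List.of("r1", "r2", "r3"),
                               "tabletB", List.of("r4")),
            "tserver2", Map.of("tabletC", List.of("r5", "r6")));
        // Six ranges collapse to three splits, one per tablet; each split's
        // range list can then feed a BatchScanner in the record reader.
        System.out.println(splits(binned).size());
    }
}
```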
>>
>> A drawback is that most modes cannot support this operation directly:
>> client side, offline, and isolated scans require a single range
>> iterator. So some additional code would be required for juggling them.
>>
>> What are your thoughts on this use case and its requirements? Is this a
>> legitimate use of TabletLocator?
>>
>> It would be nice if AccumuloInputFormat was able to use BatchScanner,
>> perhaps as an option. Accumulo is designed to crunch through large
>> numbers of ranges, so I would guess this is a common issue. I’d be
>> willing to take a stab at a PR if there is agreement on that.
>>
>> Thanks,
>> --
>> Eugene Cheipesh
>>
>
