Hello,

This is more of a use-case report and a request for comment.

I am using Accumulo as a source for Spark RDDs through AccumuloInputFormat. My 
index is based on a z-order space filing curve. When I decompose  a bounding 
box into index ranges I can end up with a large number of Ranges, 3k+ is not 
too unusual. Getting a fast response from Accumulo is not at all an issue. It 
would be possible to generate approximate ranges and use a Filter to refine 
them on server side but that only delays the problem.

The ideal scenario is for Spark executors to be co-located with Accumulo 
tservers and number of splits per server to be roughly equal to the number of 
cores on the machine. 

However, AccumuloInputFormat maps each range to a Split and Spark maps every 
split to a Task. It is nature of z-order curve that some of these ranges 
contain only a few tiles while others contain a pretty big chunk. Having 
significantly more splits than cores prevents good concurrency on fetches. This 
is a problem that BatchScanner is designed to fix but it’s not used in 
AccumuloInputFormat.

I noticed that TabletLocator which is used by AccumuloInputFormat returns a 
structure that looks like it breaks down ranges by host and then by tablet. I 
hacked together an InputFormat that generates a split per tablet and a Reader 
that uses a BatchScanner. The performance for spark use-case was orders of 
magnitude better. I end up with about 50 splits for the same dataset.  I can’t 
give exact numbers because I gave up timing the original source. This seems is 
a pretty good compromise since the number of splits can be dynamically 
controlled to tune the distribution and granularity of calculation batches.

A drawback is that most modes can not support this operation directly: client 
side, offline, and isolated scans require a single range iterator. So some 
additional code would be required for juggling them.

What are your thoughts on this use case and its requirements? Is this a 
legitimate use of TabletLocator? 

It would be nice if AccumuloInputFormat was able to use BatchScanner, perhaps 
as an option. Accumulo is designed to crunch through large number of ranges so 
I would guess this to be a common issue. I’d be willing to take a stab at a PR 
if there is agreement on that.

Thanks,
-- 
Eugene Cheipesh

Reply via email to