Hello, This is more of a use-case report and a request for comment.
I am using Accumulo as a source for Spark RDDs through AccumuloInputFormat. My index is based on a z-order space filing curve. When I decompose a bounding box into index ranges I can end up with a large number of Ranges, 3k+ is not too unusual. Getting a fast response from Accumulo is not at all an issue. It would be possible to generate approximate ranges and use a Filter to refine them on server side but that only delays the problem. The ideal scenario is for Spark executors to be co-located with Accumulo tservers and number of splits per server to be roughly equal to the number of cores on the machine. However, AccumuloInputFormat maps each range to a Split and Spark maps every split to a Task. It is nature of z-order curve that some of these ranges contain only a few tiles while others contain a pretty big chunk. Having significantly more splits than cores prevents good concurrency on fetches. This is a problem that BatchScanner is designed to fix but it’s not used in AccumuloInputFormat. I noticed that TabletLocator which is used by AccumuloInputFormat returns a structure that looks like it breaks down ranges by host and then by tablet. I hacked together an InputFormat that generates a split per tablet and a Reader that uses a BatchScanner. The performance for spark use-case was orders of magnitude better. I end up with about 50 splits for the same dataset. I can’t give exact numbers because I gave up timing the original source. This seems is a pretty good compromise since the number of splits can be dynamically controlled to tune the distribution and granularity of calculation batches. A drawback is that most modes can not support this operation directly: client side, offline, and isolated scans require a single range iterator. So some additional code would be required for juggling them. What are your thoughts on this use case and its requirements? Is this a legitimate use of TabletLocator? It would be nice if AccumuloInputFormat was able to use BatchScanner, perhaps as an option. Accumulo is designed to crunch through large number of ranges so I would guess this to be a common issue. I’d be willing to take a stab at a PR if there is agreement on that. Thanks, -- Eugene Cheipesh
