You could use the Accumulo MapReduce input format and enable scanning an offline table. This will read the table's rfiles directly, excluding any data falling outside the tablet boundaries. Since this is a Hadoop input format, it should work easily with Spark. I can point to examples of this if interested.
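A rough sketch of what that could look like with the 1.x MapReduce API; the instance name, ZooKeeper hosts, credentials, and table name below are all placeholders:

```java
// Sketch: scanning an OFFLINE Accumulo table from Spark through the
// MapReduce input format. Assumes Accumulo 1.8 and Spark on the classpath;
// "myinstance", "zk1:2181", "user", "pass", and "mytable" are placeholders.
import org.apache.accumulo.core.client.ClientConfiguration;
import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class OfflineTableScan {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance();
    AccumuloInputFormat.setConnectorInfo(job, "user", new PasswordToken("pass"));
    AccumuloInputFormat.setZooKeeperInstance(job, ClientConfiguration.loadDefault()
        .withInstance("myinstance").withZkHosts("zk1:2181"));
    AccumuloInputFormat.setInputTableName(job, "mytable");
    // The key piece: read the table's rfiles directly instead of going
    // through tablet servers. The table must be taken offline first.
    AccumuloInputFormat.setOfflineTableScan(job, true);

    JavaSparkContext sc =
        new JavaSparkContext(new SparkConf().setAppName("offline-scan"));
    // Each Hadoop input split becomes a Spark partition.
    JavaPairRDD<Key, Value> rdd = sc.newAPIHadoopRDD(job.getConfiguration(),
        AccumuloInputFormat.class, Key.class, Value.class);
    System.out.println("entries: " + rdd.count());
    sc.stop();
  }
}
```

Take the table offline (e.g. `offline -t mytable` in the shell, or clone it and offline the clone) before running, otherwise the input format will fail to plan splits.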
Another option is using the RFile class (added in 1.8) in the public API to directly read individual rfiles; this is useful when tables and tablets are not a concern. I have not used this with Spark, but I think it would work easily by partitioning a list of files into tasks and having each task read a set of rfiles directly.

On Mon, Aug 3, 2020 at 4:46 PM Bulldog20630405 <[email protected]> wrote:
>
> we would like to read rfiles directly outside an active accumulo instance
> using spark. is there an example to do this?
>
> note: i know there is a utility to print rfiles and i could start there and
> build my own; but was hoping to leverage something already there.
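For the RFile route, a minimal sketch of reading one file directly, assuming Accumulo 1.8+; the file path is a placeholder:

```java
// Sketch: reading an rfile directly with the public RFile API (1.8+),
// no tablet servers involved. The path below is a placeholder.
import java.util.Map.Entry;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.rfile.RFile;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ReadRFile {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // The builder can take multiple file paths; here just one.
    try (Scanner scanner = RFile.newScanner()
        .from("/accumulo/tables/1/default_tablet/F0000000.rf")
        .withFileSystem(fs)
        .build()) {
      for (Entry<Key, Value> entry : scanner) {
        System.out.println(entry.getKey() + " -> " + entry.getValue());
      }
    }
  }
}
```

For Spark, you could parallelize the list of rfile paths (e.g. `sc.parallelize(paths, numPartitions).flatMap(...)`) and have each task open its files with the same builder; note this reads raw file contents, so data outside tablet boundaries (from splits/merges) is not filtered the way the offline table scan does it.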
