yes; that is more what i want to do; i wish there was an AccumuloFileInputFormat, but there isn't... maybe i need to create one... thanx... i will look into the RFile class (i am using 1.9, so we should be good)
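
for reference, a minimal, untested sketch of reading a single rfile with the public RFile API (the path below is just a placeholder, and auths/config would need to match your setup):

  import java.util.Map.Entry;
  import org.apache.accumulo.core.client.Scanner;
  import org.apache.accumulo.core.client.rfile.RFile;
  import org.apache.accumulo.core.data.Key;
  import org.apache.accumulo.core.data.Value;
  import org.apache.accumulo.core.security.Authorizations;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;

  public class ReadRFile {
    public static void main(String[] args) throws Exception {
      // placeholder path to an rfile in HDFS
      String rfile = "hdfs:///accumulo/tables/1/default_tablet/F0000abc.rf";

      FileSystem fs = FileSystem.get(new Configuration());

      // RFile.newScanner() is in the public API since 1.8
      try (Scanner scanner = RFile.newScanner()
          .from(rfile)
          .withFileSystem(fs)
          .withAuthorizations(Authorizations.EMPTY)
          .build()) {
        for (Entry<Key,Value> e : scanner) {
          System.out.println(e.getKey() + " -> " + e.getValue());
        }
      }
    }
  }

the same pattern should parallelize in spark by distributing a list of rfile paths and running this per partition, as Keith suggests below.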
On Tue, Aug 4, 2020 at 12:20 PM Keith Turner <[email protected]> wrote:
> You could use the Accumulo Map Reduce input format and enable scanning an
> offline table. This will read the table's rfiles directly, excluding any
> data falling outside of tablet boundaries. Since this is a Hadoop
> input format, it should work easily with Spark. I can point to
> examples of this if interested.
>
> Another option is using the RFile class (added in 1.8) in the public
> API to directly read individual rfiles; this is useful when tables and
> tablets are not a concern. I have not used this with Spark, but I
> think it would work easily by partitioning a list of files into tasks
> and having each task read a set of rfiles directly.
>
> On Mon, Aug 3, 2020 at 4:46 PM Bulldog20630405
> <[email protected]> wrote:
> >
> > we would like to read rfiles directly outside an active accumulo
> > instance using spark. is there an example to do this?
> >
> > note: i know there is a utility to print rfiles and i could start there
> > and build my own; but was hoping to leverage something already there.
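
A rough, untested sketch of the offline-table input format approach with Spark, for anyone finding this thread later (instance name, zookeepers, credentials, and table name are placeholders, and the table must be taken offline first):

  import org.apache.accumulo.core.client.ClientConfiguration;
  import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
  import org.apache.accumulo.core.client.security.tokens.PasswordToken;
  import org.apache.accumulo.core.data.Key;
  import org.apache.accumulo.core.data.Value;
  import org.apache.accumulo.core.security.Authorizations;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.spark.SparkConf;
  import org.apache.spark.api.java.JavaPairRDD;
  import org.apache.spark.api.java.JavaSparkContext;

  public class OfflineScanSpark {
    public static void main(String[] args) throws Exception {
      // placeholder connection info
      ClientConfiguration clientConf = ClientConfiguration.create()
          .withInstance("myInstance")
          .withZkHosts("zk1:2181,zk2:2181");

      // the Job object is only used to hold the input format configuration
      Job job = Job.getInstance();
      AccumuloInputFormat.setZooKeeperInstance(job, clientConf);
      AccumuloInputFormat.setConnectorInfo(job, "user", new PasswordToken("pass"));
      AccumuloInputFormat.setInputTableName(job, "mytable");
      AccumuloInputFormat.setScanAuthorizations(job, Authorizations.EMPTY);
      // read the table's rfiles directly; the table must be offline
      AccumuloInputFormat.setOfflineTableScan(job, true);

      JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("offline-scan"));
      JavaPairRDD<Key,Value> rdd = sc.newAPIHadoopRDD(job.getConfiguration(),
          AccumuloInputFormat.class, Key.class, Value.class);

      System.out.println("entries: " + rdd.count());
      sc.stop();
    }
  }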
