i looked at the RFile class (there are two of them in core ... one is
core.client.rfile.RFile and the other is core.file.rfile.RFile)

in both cases most of the capability is private or package protected and
you cannot access the functionality. am i missing something?

On Tue, Aug 4, 2020 at 1:07 PM Bulldog20630405 <[email protected]> wrote:

> yes; that is more what i want to do; i wish there was an
> AccumuloFileInputFormat; but there isn't... maybe i need to create
> one...thanx... i will look into the rfile class (i am using 1.9 so we
> should be good)
>
> On Tue, Aug 4, 2020 at 12:20 PM Keith Turner <[email protected]> wrote:
>
>> Could use the Accumulo Map Reduce input format and enable scanning an
>> offline table. This will read the table's rfiles directly, excluding any
>> data falling outside of tablet boundaries. Since this is a Hadoop
>> input format, it should work easily with Spark. I can point to
>> examples of this if interested.
>>
>> Another option is using the RFile class (added in 1.8) in the public
>> API to directly read individual RFiles. This is useful when tables and
>> tablets are not a concern. I have not used this with Spark, but I
>> think it would work easily by partitioning a list of files into tasks
>> and having each task read a set of rfiles directly.
>>
>> On Mon, Aug 3, 2020 at 4:46 PM Bulldog20630405
>> <[email protected]> wrote:
>> >
>> > we would like to read rfiles directly outside an active accumulo
>> > instance using spark. is there an example to do this?
>> >
>> > note: i know there is a utility to print rfiles and i could start
>> > there and build my own; but was hoping to leverage something already
>> > there.
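To address the confusion above: core.file.rfile.RFile is internal, but core.client.rfile.RFile is the public entry point Keith mentioned, and its static builder methods (newScanner, newWriter) are what expose the functionality. A minimal sketch of reading an rfile with that builder might look like the following; the HDFS path is hypothetical, and this assumes the Accumulo core and Hadoop client jars are on the classpath:

```java
import java.util.Map.Entry;

import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.rfile.RFile;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class ReadRFileSketch {
  public static void main(String[] args) throws Exception {
    // FileSystem resolved from the default Hadoop configuration;
    // point this at the cluster holding the rfiles.
    FileSystem fs = FileSystem.get(new Configuration());

    // Build a Scanner directly over one or more rfiles (no live
    // Accumulo instance needed). The path below is a placeholder.
    try (Scanner scanner = RFile.newScanner()
        .from("hdfs://namenode/accumulo/tables/1/default_tablet/F0000000.rf")
        .withFileSystem(fs)
        .build()) {

      // Iterate key/value pairs just like a normal table scanner.
      for (Entry<Key,Value> entry : scanner) {
        System.out.println(entry.getKey() + " -> " + entry.getValue());
      }
    }
  }
}
```

For the Spark use case, each executor task could run this same pattern over its assigned subset of file paths (e.g. via mapPartitions over a list of paths), which is the partitioning approach Keith describes. Note that, unlike the offline-table input format, reading files this way will not exclude data outside tablet boundaries.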
