The Arrow APIs are batch-based, so if you want to go record-by-record you would need to develop an interface on top of the arrow::RecordBatch data structure.
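As one illustration (not a supported record API), a row-by-row loop can be layered on TableBatchReader by setting its chunk size to 1, so each emitted RecordBatch holds a single record. The sketch below assumes the Status-returning C++ API from the 0.13-era libraries (newer releases return arrow::Result instead), a hypothetical file "data.orc", and that the first column happens to be int64:

  #include <iostream>
  #include <memory>
  #include <string>

  #include <arrow/api.h>
  #include <arrow/io/file.h>
  #include <arrow/adapters/orc/adapter.h>

  // Read an ORC file into an arrow::Table, then walk it one record at
  // a time by forcing TableBatchReader to emit single-row batches.
  arrow::Status ReadRecordByRecord(const std::string& path) {
    // Open the ORC file and load it as a Table via the ORC adapter.
    std::shared_ptr<arrow::io::ReadableFile> file;
    ARROW_RETURN_NOT_OK(arrow::io::ReadableFile::Open(path, &file));

    std::unique_ptr<arrow::adapters::orc::ORCFileReader> orc_reader;
    ARROW_RETURN_NOT_OK(arrow::adapters::orc::ORCFileReader::Open(
        file, arrow::default_memory_pool(), &orc_reader));

    std::shared_ptr<arrow::Table> table;
    ARROW_RETURN_NOT_OK(orc_reader->Read(&table));

    // A chunksize of 1 makes each RecordBatch a single logical record.
    arrow::TableBatchReader batch_reader(*table);
    batch_reader.set_chunksize(1);

    std::shared_ptr<arrow::RecordBatch> batch;
    while (true) {
      ARROW_RETURN_NOT_OK(batch_reader.ReadNext(&batch));
      if (batch == nullptr) break;  // end of table

      // Each column of the one-row batch is an Array of length 1;
      // cast to the concrete type to pull out scalar values. This
      // assumes column 0 is int64 -- a real reader would dispatch
      // on batch->schema()->field(i)->type().
      auto col0 =
          std::static_pointer_cast<arrow::Int64Array>(batch->column(0));
      std::cout << "first column value: " << col0->Value(0) << std::endl;
    }
    return arrow::Status::OK();
  }

Note that a chunk size of 1 is convenient but slow for large tables; a common compromise is to read reasonably sized batches and index into them with RecordBatch::Slice(i, 1).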
On Wed, Mar 27, 2019 at 2:06 AM Nirmala S <[email protected]> wrote:
>
> Now I see there is an ORC adapter for Arrow which can read an ORC file
> as a table. With this in place, I intend to use TableBatchReader to
> read it.
>
> How do I get a single record from TableBatchReader?
>
>
> > On 22-Mar-2019, at 12:18 AM, Wes McKinney <[email protected]> wrote:
> >
> > hi Nirmala,
> >
> > There aren't any tools in the libraries to help you "out of the box",
> > so you'll probably have to devise your own metadata storage and state
> > management scheme for such a system.
> >
> > best
> > Wes
> >
> > On Thu, Mar 21, 2019 at 9:53 AM Nirmala S <[email protected]>
> > wrote:
> >>
> >> Hi,
> >>
> >> I am trying to build a caching layer using Arrow on top of ORC
> >> files. The application will ask the cache for a column (which can
> >> be of any data type, fixed or variable length); the cache needs to
> >> check its metadata to see if the column is already present. If yes,
> >> it can return the data to the application. If not, the data needs
> >> to be fetched from the ORC files, cached, and then returned to the
> >> application. The application is multi-threaded, written in C++, and
> >> has a read-only workload.
> >>
> >> This being the case, what is the best method to maintain the
> >> metadata and the data in Arrow? Is there any good practice?
> >>
> >> If the cache size is smaller than the ORC file size, should I add
> >> logic to swap data out using an algorithm like LRU, or is this
> >> already present in Arrow?
> >>
> >>
> >> Thanks in advance
> >> Nirmala
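On the caching question above: as noted earlier in the thread, Arrow ships neither column-cache metadata management nor LRU eviction, so both have to live in the application. Below is a minimal sketch of one possible scheme; ColumnCache is a hypothetical class, and capacity is counted in columns rather than bytes (a production cache would more likely budget memory using the buffer sizes of the cached arrays).

  #include <list>
  #include <memory>
  #include <mutex>
  #include <string>
  #include <unordered_map>
  #include <utility>

  #include <arrow/api.h>

  // Hypothetical LRU cache of Arrow columns keyed by column name.
  // Arrow does not provide a cache like this; the eviction policy and
  // locking are entirely application-level choices.
  class ColumnCache {
   public:
    explicit ColumnCache(size_t capacity) : capacity_(capacity) {}

    // Returns the cached column, or nullptr on a miss (the caller then
    // loads the column from ORC and calls Put).
    std::shared_ptr<arrow::ChunkedArray> Get(const std::string& name) {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = index_.find(name);
      if (it == index_.end()) return nullptr;
      // Move the entry to the front of the recency list.
      lru_.splice(lru_.begin(), lru_, it->second);
      return it->second->second;
    }

    void Put(const std::string& name,
             std::shared_ptr<arrow::ChunkedArray> column) {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = index_.find(name);
      if (it != index_.end()) {
        // Refresh an existing entry.
        lru_.splice(lru_.begin(), lru_, it->second);
        it->second->second = std::move(column);
        return;
      }
      if (index_.size() >= capacity_) {
        // Evict the least recently used column.
        index_.erase(lru_.back().first);
        lru_.pop_back();
      }
      lru_.emplace_front(name, std::move(column));
      index_[name] = lru_.begin();
    }

   private:
    using Entry =
        std::pair<std::string, std::shared_ptr<arrow::ChunkedArray>>;
    size_t capacity_;
    std::list<Entry> lru_;                 // front = most recently used
    std::unordered_map<std::string,
                       std::list<Entry>::iterator> index_;
    std::mutex mutex_;
  };

For a read-only, multi-threaded workload a single mutex around the map is usually enough to start with, and handing out shared_ptr means a reader can keep using a column safely even after it has been evicted.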
