The Arrow APIs are batch-based, so if you want to go record-by-record you would need to develop an interface on top of the arrow::RecordBatch data structure.
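As one illustration (not a supported record API), a row-by-row loop can be layered on TableBatchReader by setting its chunk size to 1, so each emitted RecordBatch holds a single record. The sketch below assumes the Status-returning C++ API from the 0.13-era libraries (newer releases return arrow::Result instead), a hypothetical file "data.orc", and that the first column happens to be int64:

  #include <iostream>
  #include <memory>
  #include <string>

  #include <arrow/api.h>
  #include <arrow/io/file.h>
  #include <arrow/adapters/orc/adapter.h>

  // Read an ORC file into an arrow::Table, then walk it one record at
  // a time by forcing TableBatchReader to emit single-row batches.
  arrow::Status ReadRecordByRecord(const std::string& path) {
    // Open the ORC file and load it as a Table via the ORC adapter.
    std::shared_ptr<arrow::io::ReadableFile> file;
    ARROW_RETURN_NOT_OK(arrow::io::ReadableFile::Open(path, &file));

    std::unique_ptr<arrow::adapters::orc::ORCFileReader> orc_reader;
    ARROW_RETURN_NOT_OK(arrow::adapters::orc::ORCFileReader::Open(
        file, arrow::default_memory_pool(), &orc_reader));

    std::shared_ptr<arrow::Table> table;
    ARROW_RETURN_NOT_OK(orc_reader->Read(&table));

    // A chunksize of 1 makes each RecordBatch a single logical record.
    arrow::TableBatchReader batch_reader(*table);
    batch_reader.set_chunksize(1);

    std::shared_ptr<arrow::RecordBatch> batch;
    while (true) {
      ARROW_RETURN_NOT_OK(batch_reader.ReadNext(&batch));
      if (batch == nullptr) break;  // end of table

      // Each column of the one-row batch is an Array of length 1;
      // cast to the concrete type to pull out scalar values. This
      // assumes column 0 is int64 -- a real reader would dispatch
      // on batch->schema()->field(i)->type().
      auto col0 =
          std::static_pointer_cast<arrow::Int64Array>(batch->column(0));
      std::cout << "first column value: " << col0->Value(0) << std::endl;
    }
    return arrow::Status::OK();
  }

Note that a chunk size of 1 is convenient but slow for large tables; a common compromise is to read reasonably sized batches and index into them with RecordBatch::Slice(i, 1).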
On Wed, Mar 27, 2019 at 2:06 AM Nirmala S <[email protected]> wrote:
>
> Now I see there is an ORC adapter for Arrow which can read an ORC file
> as a table. With this in place, I intend to use TableBatchReader to
> read it.
>
> How do I get a single record from TableBatchReader?
>
>
> > On 22-Mar-2019, at 12:18 AM, Wes McKinney <[email protected]> wrote:
> >
> > hi Nirmala,
> >
> > There aren't any tools in the libraries to help you "out of the box",
> > so you'll probably have to devise your own metadata storage and state
> > management scheme for such a system.
> >
> > best
> > Wes
> >
> > On Thu, Mar 21, 2019 at 9:53 AM Nirmala S <[email protected]>
> > wrote:
> >>
> >> Hi,
> >>
> >> I am trying to build a caching layer using Arrow on top of ORC
> >> files. The application will ask the cache for a column (which can
> >> be of any data type, fixed or variable length); the cache needs to
> >> check its metadata to see if the column is already present. If yes,
> >> it can return the data to the application. If not, the data needs
> >> to be fetched from the ORC files, cached, and then returned to the
> >> application. The application is multi-threaded, written in C++, and
> >> has a read-only workload.
> >>
> >> This being the case, what is the best method to maintain the
> >> metadata and the data in Arrow? Is there any good practice?
> >>
> >> If the cache size is smaller than the ORC file size, should I add
> >> logic to swap data out using an algorithm like LRU, or is this
> >> already present in Arrow?
> >>
> >>
> >> Thanks in advance
> >> Nirmala
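On the caching question above: as noted earlier in the thread, Arrow ships neither column-cache metadata management nor LRU eviction, so both have to live in the application. Below is a minimal sketch of one possible scheme; ColumnCache is a hypothetical class, and capacity is counted in columns rather than bytes (a production cache would more likely budget memory using the buffer sizes of the cached arrays).

  #include <list>
  #include <memory>
  #include <mutex>
  #include <string>
  #include <unordered_map>
  #include <utility>

  #include <arrow/api.h>

  // Hypothetical LRU cache of Arrow columns keyed by column name.
  // Arrow does not provide a cache like this; the eviction policy and
  // locking are entirely application-level choices.
  class ColumnCache {
   public:
    explicit ColumnCache(size_t capacity) : capacity_(capacity) {}

    // Returns the cached column, or nullptr on a miss (the caller then
    // loads the column from ORC and calls Put).
    std::shared_ptr<arrow::ChunkedArray> Get(const std::string& name) {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = index_.find(name);
      if (it == index_.end()) return nullptr;
      // Move the entry to the front of the recency list.
      lru_.splice(lru_.begin(), lru_, it->second);
      return it->second->second;
    }

    void Put(const std::string& name,
             std::shared_ptr<arrow::ChunkedArray> column) {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = index_.find(name);
      if (it != index_.end()) {
        // Refresh an existing entry.
        lru_.splice(lru_.begin(), lru_, it->second);
        it->second->second = std::move(column);
        return;
      }
      if (index_.size() >= capacity_) {
        // Evict the least recently used column.
        index_.erase(lru_.back().first);
        lru_.pop_back();
      }
      lru_.emplace_front(name, std::move(column));
      index_[name] = lru_.begin();
    }

   private:
    using Entry =
        std::pair<std::string, std::shared_ptr<arrow::ChunkedArray>>;
    size_t capacity_;
    std::list<Entry> lru_;                 // front = most recently used
    std::unordered_map<std::string,
                       std::list<Entry>::iterator> index_;
    std::mutex mutex_;
  };

For a read-only, multi-threaded workload a single mutex around the map is usually enough to start with, and handing out shared_ptr means a reader can keep using a column safely even after it has been evicted.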
