hi,

On Fri, Mar 29, 2019 at 9:49 AM Nirmala S wrote:
>
> Thanks Wes. I do have a couple more questions,
> - When a table is read using the ORC adaptor, it gets read into a memory
> pool (in my case default_memory_pool). How do I free this area once the
> file is processed?

With the default memory pool, the memory is freed automatically when the
RecordBatch (or Table) data structures that reference it are destroyed.
Arrow buffers are reference-counted, so this happens once the last
shared_ptr to them is released.

> - Is there any way to read the ORC file metadata from the adaptor?

Doesn't look like it yet. This would be a nice contribution to the library.

>
> > On 29-Mar-2019, at 7:18 AM, Wes McKinney wrote:
> >
> > The Arrow APIs are batch-based, so if you want to go record-by-record
> > you would need to develop an interface on top of the
> > arrow::RecordBatch data structure.
> >
> > On Wed, Mar 27, 2019 at 2:06 AM Nirmala S wrote:
> >>
> >> Now I see there is an ORC adaptor for Arrow which can read an ORC file
> >> as a table. With this in place, I intend to use TableBatchReader to
> >> read it.
> >>
> >> How do I get a single record from TableBatchReader?
> >>
> >>> On 22-Mar-2019, at 12:18 AM, Wes McKinney wrote:
> >>>
> >>> hi Nirmala,
> >>>
> >>> There aren't any tools in the libraries to help you "out of the box",
> >>> so you'll probably have to devise your own metadata storage and state
> >>> management scheme for such a system.
> >>>
> >>> best
> >>> Wes
> >>>
> >>> On Thu, Mar 21, 2019 at 9:53 AM Nirmala S wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I am trying to build a caching layer using Arrow on top of ORC
> >>>> files. The application will ask the cache for a column of data
> >>>> (which can be of any data type, fixed or variable length), and the
> >>>> cache needs to check its metadata to see if the column is already
> >>>> present. If yes, it can return the data to the application. If not,
> >>>> the data needs to be fetched from the ORC files, cached and then
> >>>> returned to the application. The application is multi-threaded and
> >>>> written in C++. It has a read-only workload.
> >>>>
> >>>> This being the case, what is the best method to maintain the
> >>>> metadata and the data in Arrow? Is there any good practice?
> >>>>
> >>>> If the cache size is smaller than the ORC file size, should I be
> >>>> putting in logic to swap the data using an algorithm like LRU, or is
> >>>> this already present in Arrow?
> >>>>
> >>>> Thanks in advance
> >>>> Nirmala
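Since Arrow does not ship such a cache, here is a minimal sketch of the kind of metadata and LRU eviction scheme discussed in this thread, assuming C++17. `Column`, `ColumnCache` and the loader callback are hypothetical names: `Column` is a plain std::vector stand-in for what would more likely be a `std::shared_ptr<arrow::ChunkedArray>`, and the loader is where a call into the ORC adapter would go. The "metadata" is just a hash map from column name to a position in an LRU list, guarded by a mutex for the multi-threaded, read-only workload.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <list>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one cached column; an Arrow-based cache would
// more likely store std::shared_ptr<arrow::ChunkedArray>.
struct Column {
  std::vector<int64_t> values;
  std::size_t ByteSize() const { return values.size() * sizeof(int64_t); }
};

// Thread-safe LRU cache of columns keyed by name, bounded by total bytes.
class ColumnCache {
 public:
  // Hypothetical loader: fetches a column (e.g. from an ORC file) on a miss.
  using Loader =
      std::function<std::shared_ptr<const Column>(const std::string&)>;

  ColumnCache(std::size_t capacity_bytes, Loader loader)
      : capacity_bytes_(capacity_bytes), loader_(std::move(loader)) {}

  // Returns the column, loading and caching it first if it is not present.
  std::shared_ptr<const Column> Get(const std::string& name) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = index_.find(name);
    if (it != index_.end()) {
      // Hit: move the entry to the front (most recently used).
      lru_.splice(lru_.begin(), lru_, it->second);
      return it->second->second;
    }
    // Miss: load under the lock (simple, but serializes concurrent misses).
    std::shared_ptr<const Column> col = loader_(name);
    lru_.emplace_front(name, col);
    index_[name] = lru_.begin();
    used_bytes_ += col->ByteSize();
    EvictIfNeeded();
    return col;
  }

  bool Contains(const std::string& name) const {
    std::lock_guard<std::mutex> lock(mu_);
    return index_.count(name) != 0;
  }

  std::size_t used_bytes() const {
    std::lock_guard<std::mutex> lock(mu_);
    return used_bytes_;
  }

 private:
  using Entry = std::pair<std::string, std::shared_ptr<const Column>>;

  // Drops least recently used entries until under budget, always keeping
  // the newest entry. Only the cache's own reference is released; callers
  // still holding a shared_ptr keep that column alive.
  void EvictIfNeeded() {
    while (used_bytes_ > capacity_bytes_ && lru_.size() > 1) {
      const Entry& victim = lru_.back();
      used_bytes_ -= victim.second->ByteSize();
      index_.erase(victim.first);
      lru_.pop_back();
    }
  }

  std::size_t capacity_bytes_;
  Loader loader_;
  mutable std::mutex mu_;
  std::list<Entry> lru_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
  std::size_t used_bytes_ = 0;
};
```

Because columns are handed out as shared_ptr, evicting one only drops the cache's reference; a reader still using it keeps the data alive, and the memory is released when the last reference goes away, much like Arrow buffers being returned to their memory pool on destruction. Loading while holding the lock keeps the sketch short but serializes misses; a real cache would probably load outside the lock.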

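Along the lines of the suggestion to build a record-by-record interface on top of arrow::RecordBatch, here is a minimal sketch of a cursor that walks a sequence of batches and materializes one row at a time. It assumes plain std::vector columns instead of Arrow arrays; `Batch`, `Record` and `RecordCursor` are hypothetical names. With real Arrow types, the batches would come from repeated `TableBatchReader::ReadNext` calls and each field would be read through its typed array, for example `std::static_pointer_cast<arrow::Int64Array>(batch->column(0))->Value(row)`; null handling (`IsNull(row)`) is left out here.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for arrow::RecordBatch: every column is
// a contiguous vector and all columns have the same length.
struct Batch {
  std::vector<int64_t> id;
  std::vector<std::string> name;
  std::size_t num_rows() const { return id.size(); }
};

// One materialized record, assembled from the columns at a given row index.
struct Record {
  int64_t id;
  std::string name;
};

// Walks a sequence of batches and yields one record at a time.
// The caller must keep `batches` alive while the cursor is in use.
class RecordCursor {
 public:
  explicit RecordCursor(const std::vector<Batch>& batches)
      : batches_(batches) {}

  // Writes the next record into *out; returns false once all rows are read.
  bool Next(Record* out) {
    while (batch_idx_ < batches_.size()) {
      const Batch& b = batches_[batch_idx_];
      if (row_ < b.num_rows()) {
        out->id = b.id[row_];
        out->name = b.name[row_];
        ++row_;
        return true;
      }
      // Current batch exhausted (or empty): advance to the next one.
      ++batch_idx_;
      row_ = 0;
    }
    return false;
  }

 private:
  const std::vector<Batch>& batches_;
  std::size_t batch_idx_ = 0;
  std::size_t row_ = 0;
};
```

Materializing rows gives up the columnar layout Arrow is built around, so this kind of cursor is best kept to the code paths that genuinely need one record at a time.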