I see there is now an ORC adapter for Arrow that can read an ORC file as a table.
With this in place, I intend to use TableBatchReader to read it. 

How do I get a single record from TableBatchReader?


> On 22-Mar-2019, at 12:18 AM, Wes McKinney <[email protected]> wrote:
> 
> hi Nirmala,
> 
> There aren't any tools in the libraries to help you "out of the box",
> so you'll probably have to devise your own metadata storage and state
> management scheme for such a system.
> 
> best
> Wes
> 
> On Thu, Mar 21, 2019 at 9:53 AM Nirmala S <[email protected]> wrote:
>> 
>> Hi,
>> 
>>        I am trying to build a caching layer using Arrow on top of ORC files.
>> The application will ask the cache for a column of data (of any data type,
>> fixed or variable length), and the cache needs to check its
>> metadata to see whether the column is already present. If so, it can return the
>> data to the application. If not, the data needs to be fetched from the ORC files,
>> cached, and then returned to the application. The application is multi-threaded
>> and written in C++. It has a read-only workload.
>> 
>>        This being the case, what is the best way to maintain the metadata
>> and the data in Arrow? Is there any good practice?
>> 
>>        If the cache size is smaller than the ORC file size, should I add
>> logic to swap the data out using an algorithm like LRU, or is this already
>> present in Arrow?
>> 
>> 
>> Thanks in advance
>> Nirmala
>> 
>> 
>> 
>> 
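
For the metadata and eviction questions in the original message above, here is a minimal sketch of what a hand-rolled column cache with LRU eviction might look like. Everything in it is an assumption made for illustration: `ColumnCache`, `Get`, `Put`, and `used_bytes` are hypothetical names and not part of Arrow, the payload is a plain byte vector standing in for whatever Arrow column object would actually be cached, capacity is counted in bytes, and a single mutex covers the multi-threaded access mentioned in the thread.

```cpp
#include <cstddef>
#include <list>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a cached column. A real cache would more likely
// hold a shared_ptr to an Arrow column or array here. Must not be null.
using ColumnData = std::shared_ptr<const std::vector<char>>;

// Hypothetical column cache: a name -> data index (the "metadata") plus a
// recency list used for LRU eviction. Not an Arrow API.
class ColumnCache {
 public:
  explicit ColumnCache(std::size_t capacity_bytes) : capacity_(capacity_bytes) {}

  // Returns the cached column, or nullptr on a miss. A hit marks the column
  // as most recently used.
  ColumnData Get(const std::string& name) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = index_.find(name);
    if (it == index_.end()) return nullptr;
    lru_.splice(lru_.begin(), lru_, it->second);  // move entry to the front
    return it->second->second;
  }

  // Inserts or replaces a column, then evicts least recently used columns
  // until the total size fits within the capacity. A column larger than the
  // whole capacity is therefore evicted immediately.
  void Put(const std::string& name, ColumnData data) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = index_.find(name);
    if (it != index_.end()) {
      used_ -= it->second->second->size();
      lru_.erase(it->second);
      index_.erase(it);
    }
    used_ += data->size();
    lru_.emplace_front(name, std::move(data));
    index_[name] = lru_.begin();
    while (used_ > capacity_ && !lru_.empty()) {
      const auto& victim = lru_.back();  // least recently used entry
      used_ -= victim.second->size();
      index_.erase(victim.first);
      lru_.pop_back();
    }
  }

  std::size_t used_bytes() const {
    std::lock_guard<std::mutex> lock(mu_);
    return used_;
  }

 private:
  using Entry = std::pair<std::string, ColumnData>;

  mutable std::mutex mu_;
  std::size_t capacity_;
  std::size_t used_ = 0;
  std::list<Entry> lru_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

On a miss, the caller would read the column from the ORC file and call `Put` before returning it. Because entries are handed out as `shared_ptr`, a column evicted while another thread is still reading it stays alive until that reader releases it.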
