Hi,

        I am trying to build a caching layer using Arrow on top of ORC files. 
The application will ask the cache for a column of data (which can be of any 
data type, fixed or variable length), and the cache needs to check its metadata 
to see whether the column is already present. If it is, the cache can return 
the data to the application. If not, the data needs to be fetched from the ORC 
files, cached, and then returned to the application. The application is 
multi-threaded and written in C++, and its workload is read-only.
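
To make the lookup path concrete, here is a minimal sketch of the read-through 
flow described above. It is an illustration only, not a proposed design: 
ColumnCache, ColumnLoader and Get are hypothetical names, and a plain byte 
vector stands in for the Arrow column so that the sketch compiles without Arrow.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for a cached column (assumption). In the real cache this would
// presumably be an Arrow object such as std::shared_ptr<arrow::ChunkedArray>.
using ColumnData = std::shared_ptr<const std::vector<std::uint8_t>>;

// Hypothetical loader that reads one column from the ORC file(s).
using ColumnLoader = std::function<ColumnData(const std::string&)>;

class ColumnCache {
 public:
  explicit ColumnCache(ColumnLoader loader) : loader_(std::move(loader)) {}

  // Returns the cached column, loading and caching it on a miss.
  ColumnData Get(const std::string& column) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = columns_.find(column);
      if (it != columns_.end()) return it->second;  // hit: metadata has it
    }
    // Miss: read from ORC without holding the lock, so other threads
    // can still be served from the cache in the meantime.
    ColumnData loaded = loader_(column);

    std::lock_guard<std::mutex> lock(mutex_);
    // If another thread cached the same column first, emplace leaves the
    // existing entry in place and we return that copy.
    auto result = columns_.emplace(column, std::move(loaded));
    return result.first->second;
  }

  // Number of columns currently cached.
  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return columns_.size();
  }

 private:
  ColumnLoader loader_;
  mutable std::mutex mutex_;
  std::unordered_map<std::string, ColumnData> columns_;  // the "metadata"
};
```

Because the workload is read-only, the cached data is handed out as a shared 
pointer to const data, so threads can share one copy without further locking. 
Two threads that miss on the same column at the same time may both read it 
from ORC in this sketch; only one copy is kept.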
        
        Given this, what is the best way to maintain the metadata and the 
data in Arrow? Is there an established good practice?

        If the cache size is smaller than the ORC file size, should I 
implement my own logic to swap data in and out using an algorithm such as LRU, 
or is this already present in Arrow?
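
To be concrete about the kind of swapping logic meant here, the following is a 
minimal sketch of an LRU store with a byte budget. It is an illustration under 
stated assumptions, not something taken from Arrow: LruColumnStore, Find and 
Insert are hypothetical names, a byte vector stands in for the Arrow column, 
and the sketch is not thread-safe on its own (a lock would be needed around 
each call).

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for a cached column (assumption); see the note above.
using ColumnData = std::shared_ptr<const std::vector<std::uint8_t>>;

class LruColumnStore {
 public:
  explicit LruColumnStore(std::size_t capacity_bytes)
      : capacity_bytes_(capacity_bytes) {}

  // Returns the column and marks it most recently used, or nullptr on a miss.
  ColumnData Find(const std::string& column) {
    auto it = index_.find(column);
    if (it == index_.end()) return nullptr;
    // Move the entry to the front; list iterators stay valid after splice.
    entries_.splice(entries_.begin(), entries_, it->second);
    return it->second->second;
  }

  // Inserts (or replaces) a column, then evicts least recently used
  // columns until the byte budget is respected again.
  void Insert(const std::string& column, ColumnData data) {
    auto it = index_.find(column);
    if (it != index_.end()) {
      used_bytes_ -= it->second->second->size();
      entries_.erase(it->second);
      index_.erase(it);
    }
    used_bytes_ += data->size();
    entries_.emplace_front(column, std::move(data));
    index_[column] = entries_.begin();

    // Never evict the entry that was just inserted, even if it alone
    // exceeds the budget.
    while (used_bytes_ > capacity_bytes_ && entries_.size() > 1) {
      const Entry& victim = entries_.back();
      used_bytes_ -= victim.second->size();
      index_.erase(victim.first);
      entries_.pop_back();
    }
  }

  bool Contains(const std::string& column) const {
    return index_.count(column) != 0;
  }
  std::size_t used_bytes() const { return used_bytes_; }

 private:
  using Entry = std::pair<std::string, ColumnData>;
  std::size_t capacity_bytes_;
  std::size_t used_bytes_ = 0;
  std::list<Entry> entries_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};
```

Since columns are held through shared pointers, evicting an entry only drops 
the cache's reference; a thread that is still reading the column keeps it 
alive until it is done.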


Thanks in advance
Nirmala



