Hi Fernando,

Keeping the data in memory as `RecordBatch`es sounds like the way to go if you want it all to be in memory.
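For reference, that approach looks roughly like the sketch below, using the parquet crate's Arrow reader (the `data.parquet` path and the batch size of 1024 are placeholders; exact method names can differ a bit between releases):

```rust
use std::fs::File;
use std::sync::Arc;

use arrow::record_batch::RecordBatch;
use parquet::arrow::{ArrowReader, ParquetFileArrowReader};
use parquet::file::reader::SerializedFileReader;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open the parquet file and wrap it in the Arrow-aware reader.
    let file = File::open("data.parquet")?;
    let file_reader = Arc::new(SerializedFileReader::new(file)?);
    let mut arrow_reader = ParquetFileArrowReader::new(file_reader);

    // Stream the file as RecordBatches (1024 rows per batch here)
    // and buffer them all in a Vec for later aggregations.
    let batch_reader = arrow_reader.get_record_reader(1024)?;
    let batches: Vec<RecordBatch> = batch_reader.collect::<Result<Vec<_>, _>>()?;

    println!("read {} batches", batches.len());
    Ok(())
}
```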
Another way to work in Rust with data from parquet files is to use the `DataFusion` library. Depending on your needs it might save you some time building up your analytics (e.g. it has aggregations, filtering and sorting built in). Here are some examples of how to use DataFusion with a parquet file (with the DataFrame and the SQL APIs):

https://github.com/apache/arrow/blob/master/rust/datafusion/examples/dataframe.rs
https://github.com/apache/arrow/blob/master/rust/datafusion/examples/parquet_sql.rs

If you already have RecordBatches you can register an in-memory table as well. Rough sketches of both follow below your quoted message.

Hope that helps,
Andrew

On Sat, Jan 23, 2021 at 7:33 AM Fernando Herrera <[email protected]> wrote:

> Hi all,
>
> A quick question regarding reading a parquet file. What is the best way to
> read a parquet file and keep it in memory to do data analysis?
>
> What I'm doing now is using the record reader from the
> ParquetFileArrowReader and then I read all the record batches from the
> file. I keep the batches in memory in a vector of record batches. This way
> I have access to them to do some aggregations I need from the file.
>
> Is there another way to do this?
>
> Thanks,
> Fernando
>
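Here is the SQL API sketch mentioned above, along the lines of the linked parquet_sql.rs example (this assumes the current DataFusion release; you need tokio since execution is async, and `data.parquet`, the table name `example` and the column `some_key` are placeholders to adapt to your schema):

```rust
use datafusion::error::Result;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    let mut ctx = ExecutionContext::new();

    // Expose the parquet file as a SQL table named "example".
    ctx.register_parquet("example", "data.parquet")?;

    // Aggregation, filtering and sorting are all handled by DataFusion.
    let df = ctx.sql("SELECT some_key, COUNT(*) FROM example GROUP BY some_key")?;
    let results = df.collect().await?;

    arrow::util::pretty::print_batches(&results)?;
    Ok(())
}
```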

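And registering RecordBatches you already have as an in-memory table might look something like this (a sketch only; it assumes all batches share one schema, and the exact `register_table` signature can vary between DataFusion versions):

```rust
use arrow::record_batch::RecordBatch;
use datafusion::datasource::MemTable;
use datafusion::error::Result;
use datafusion::prelude::*;

// Register already-loaded batches so they can be queried with SQL.
// `batches` is assumed to come from e.g. the parquet reader sketch above,
// and to contain at least one batch.
async fn query_batches(batches: Vec<RecordBatch>) -> Result<Vec<RecordBatch>> {
    let schema = batches[0].schema();

    // A MemTable takes one Vec<RecordBatch> per partition.
    let provider = MemTable::try_new(schema, vec![batches])?;

    let mut ctx = ExecutionContext::new();
    ctx.register_table("in_mem", Box::new(provider));

    let df = ctx.sql("SELECT COUNT(*) FROM in_mem")?;
    df.collect().await
}
```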