Thanks Andrew. I did read the examples you mentioned, but I don't think they will help with what I want to do. I need to build two hash maps from the parquet file and then compare those maps. In both cases I need to create a set of unique n-grams from strings stored in the parquet file.
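Below is one way to build such a set, sketched with the standard library only. The thread doesn't say whether the n-grams are character-based or word-based, so character n-grams are an assumption here. The input is any iterator of `Option<&str>`, which is what iterating an arrow `StringArray` yields in current arrow-rs releases, so null entries are simply skipped. `char_ngrams` and `ngram_set` are hypothetical helper names, not library functions.

```rust
use std::collections::HashSet;

/// Hypothetical helper: all character n-grams of `s`.
/// Works on `char`s rather than bytes, so multi-byte UTF-8 is never split.
/// Returns an empty set if `n` is 0 or `s` has fewer than `n` characters.
fn char_ngrams(s: &str, n: usize) -> HashSet<String> {
    let chars: Vec<char> = s.chars().collect();
    if n == 0 || chars.len() < n {
        return HashSet::new();
    }
    chars
        .windows(n)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Hypothetical helper: the set of unique n-grams over a whole string column.
/// `values` yields `Option<&str>`; null entries (`None`) are skipped.
fn ngram_set<'a, I>(values: I, n: usize) -> HashSet<String>
where
    I: IntoIterator<Item = Option<&'a str>>,
{
    values
        .into_iter()
        .flatten()
        .flat_map(|s| char_ngrams(s, n))
        .collect()
}
```

With the arrow crate, something like `ngram_set(string_array.iter(), 3)` should then work, assuming your version's `StringArray` iterator yields `Option<&str>`. Two such sets can then be compared with `HashSet::intersection` or `HashSet::difference`.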
By the way, would it make sense to create a `Table` struct, similar to the one in pyarrow, to collect several record batches? Also, how do you downcast an object that implements `Array` (a `dyn Array`) to a concrete array type? Right now I do it with `as_any()` followed by `downcast_ref` to the type I want. But that means I have to write the concrete type in the code, and I'd like to find a way for it to be done automatically.

Thanks,
Fernando

On Sun, 24 Jan 2021, 12:01 Andrew Lamb, <[email protected]> wrote:

> Hi Fernando,
>
> Keeping the data in memory as `RecordBatch`es sounds like the way to go if
> you want it all to be in memory.
>
> Another way to work in Rust with data from parquet files is to use the
> `DataFusion` library. Depending on your needs, it might save you some time
> building up your analytics (e.g. it has aggregations, filtering and sorting
> built in).
>
> Here are some examples of how to use DataFusion with a parquet file (with
> the DataFrame and the SQL API):
>
> https://github.com/apache/arrow/blob/master/rust/datafusion/examples/dataframe.rs
>
> https://github.com/apache/arrow/blob/master/rust/datafusion/examples/parquet_sql.rs
>
> If you already have RecordBatches, you can register an in-memory table as
> well.
>
> Hope that helps,
> Andrew
>
> On Sat, Jan 23, 2021 at 7:33 AM Fernando Herrera <
> [email protected]> wrote:
>
>> Hi all,
>>
>> A quick question regarding reading a parquet file. What is the best way
>> to read a parquet file and keep it in memory to do data analysis?
>>
>> What I'm doing now is using the record reader from the
>> ParquetFileArrowReader and then I read all the record batches from the
>> file. I keep the batches in memory in a vector of record batches. This way
>> I have access to them to do some aggregations I need from the file.
>>
>> Is there another way to do this?
>>
>> Thanks,
>> Fernando
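On the downcasting question: `downcast_ref` needs the concrete type named somewhere in the code, so Rust can't pick it fully automatically. A common pattern is to match on the array's runtime data type in one helper and downcast there. Callers then only ever pass `&dyn Array`. The sketch below uses small std-only stand-ins for arrow's `DataType`, `Array`, `Int32Array`, `StringArray` and `RecordBatch` so that it runs without the arrow crate. These are hypothetical mirrors of the shape of the real types, not the crate's API. The `Table` struct is likewise a hypothetical illustration of the pyarrow-like idea. A real version would presumably also hold the shared schema and check each batch against it.

```rust
use std::any::Any;
use std::sync::Arc;

// Hypothetical stand-ins mirroring the shape of arrow's `DataType`, `Array`,
// `Int32Array`, `StringArray` and `RecordBatch`; NOT the arrow crate's types.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Utf8,
}

pub trait Array {
    fn as_any(&self) -> &dyn Any;
    fn data_type(&self) -> &DataType;
}

pub struct Int32Array {
    pub values: Vec<i32>,
}

pub struct StringArray {
    pub values: Vec<String>,
}

impl Array for Int32Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn data_type(&self) -> &DataType {
        &DataType::Int32
    }
}

impl Array for StringArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn data_type(&self) -> &DataType {
        &DataType::Utf8
    }
}

/// The only place that names concrete array types: dispatch on the runtime
/// data type, then downcast once. Callers just pass `&dyn Array`.
pub fn column_to_strings(array: &dyn Array) -> Vec<String> {
    match array.data_type() {
        DataType::Int32 => array
            .as_any()
            .downcast_ref::<Int32Array>()
            .expect("data type says Int32")
            .values
            .iter()
            .map(|v| v.to_string())
            .collect(),
        DataType::Utf8 => array
            .as_any()
            .downcast_ref::<StringArray>()
            .expect("data type says Utf8")
            .values
            .clone(),
    }
}

/// Hypothetical batch: a list of type-erased columns.
pub struct RecordBatch {
    pub columns: Vec<Arc<dyn Array>>,
}

/// Hypothetical pyarrow-like `Table`: batches that share one column layout.
pub struct Table {
    pub batches: Vec<RecordBatch>,
}

impl Table {
    /// Concatenates column `i` of every batch, rendered as strings.
    pub fn column_strings(&self, i: usize) -> Vec<String> {
        self.batches
            .iter()
            .flat_map(|batch| column_to_strings(batch.columns[i].as_ref()))
            .collect()
    }
}
```

With the real crate, the same `match` would go over the variants of `arrow::datatypes::DataType` that you actually use. Each arm would downcast to the matching arrow array type, and a catch-all arm would handle types you don't support.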
