Hi Fernando,

Keeping the data in memory as `RecordBatch`es sounds like the way to go if you want it all to be in memory.
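For reference, that approach looks roughly like the sketch below, using the parquet crate's Arrow reader (the `data.parquet` path and the batch size of 1024 are placeholders; exact method names can differ a bit between releases):

```rust
use std::fs::File;
use std::sync::Arc;

use arrow::record_batch::RecordBatch;
use parquet::arrow::{ArrowReader, ParquetFileArrowReader};
use parquet::file::reader::SerializedFileReader;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open the parquet file and wrap it in the Arrow-aware reader.
    let file = File::open("data.parquet")?;
    let file_reader = Arc::new(SerializedFileReader::new(file)?);
    let mut arrow_reader = ParquetFileArrowReader::new(file_reader);

    // Stream the file as RecordBatches (1024 rows per batch here)
    // and buffer them all in a Vec for later aggregations.
    let batch_reader = arrow_reader.get_record_reader(1024)?;
    let batches: Vec<RecordBatch> = batch_reader.collect::<Result<Vec<_>, _>>()?;

    println!("read {} batches", batches.len());
    Ok(())
}
```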
Another way to work in Rust with data from parquet files is to use the `DataFusion` library. Depending on your needs it might save you some time building up your analytics (e.g. it has aggregations, filtering and sorting built in). Here are some examples of how to use DataFusion with a parquet file (with the DataFrame and the SQL APIs):

https://github.com/apache/arrow/blob/master/rust/datafusion/examples/dataframe.rs
https://github.com/apache/arrow/blob/master/rust/datafusion/examples/parquet_sql.rs

If you already have RecordBatches you can register an in-memory table as well. Rough sketches of both follow below your quoted message.

Hope that helps,
Andrew

On Sat, Jan 23, 2021 at 7:33 AM Fernando Herrera <[email protected]> wrote:

> Hi all,
>
> A quick question regarding reading a parquet file. What is the best way to
> read a parquet file and keep it in memory to do data analysis?
>
> What I'm doing now is using the record reader from the
> ParquetFileArrowReader and then I read all the record batches from the
> file. I keep the batches in memory in a vector of record batches. This way
> I have access to them to do some aggregations I need from the file.
>
> Is there another way to do this?
>
> Thanks,
> Fernando
>
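Here is the SQL API sketch mentioned above, along the lines of the linked parquet_sql.rs example (this assumes the current DataFusion release; you need tokio since execution is async, and `data.parquet`, the table name `example` and the column `some_key` are placeholders to adapt to your schema):

```rust
use datafusion::error::Result;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    let mut ctx = ExecutionContext::new();

    // Expose the parquet file as a SQL table named "example".
    ctx.register_parquet("example", "data.parquet")?;

    // Aggregation, filtering and sorting are all handled by DataFusion.
    let df = ctx.sql("SELECT some_key, COUNT(*) FROM example GROUP BY some_key")?;
    let results = df.collect().await?;

    arrow::util::pretty::print_batches(&results)?;
    Ok(())
}
```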

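And registering RecordBatches you already have as an in-memory table might look something like this (a sketch only; it assumes all batches share one schema, and the exact `register_table` signature can vary between DataFusion versions):

```rust
use arrow::record_batch::RecordBatch;
use datafusion::datasource::MemTable;
use datafusion::error::Result;
use datafusion::prelude::*;

// Register already-loaded batches so they can be queried with SQL.
// `batches` is assumed to come from e.g. the parquet reader sketch above,
// and to contain at least one batch.
async fn query_batches(batches: Vec<RecordBatch>) -> Result<Vec<RecordBatch>> {
    let schema = batches[0].schema();

    // A MemTable takes one Vec<RecordBatch> per partition.
    let provider = MemTable::try_new(schema, vec![batches])?;

    let mut ctx = ExecutionContext::new();
    ctx.register_table("in_mem", Box::new(provider));

    let df = ctx.sql("SELECT COUNT(*) FROM in_mem")?;
    df.collect().await
}
```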