> Also, how is an object that implements `Array` (a `dyn Array`) downcast to other types of Arrays? I'm doing it now using as_any and then downcast_ref to the type I want, but I have to write the type in the code and I want to find a way for it to be done automatically.
I think this is the standard way in Rust -- because Rust is statically
typed, in order to do anything with the implementations, a cast to a
concrete type is typically needed. Something to look at might be the
various compute kernels in
https://github.com/apache/arrow/tree/master/rust/arrow/src/compute/kernels
which do operate on `ArrayRef`s -- either for operations you could use
directly, or at least as inspiration / examples of how to manipulate the
various array types. (I have appended a couple of small sketches at the
end of this message: one of the downcast pattern and one of reading a
parquet file into a vector of `RecordBatch`es.)

> By the way, would it make sense to create a struct Table similar to the
> one in pyarrow to collect several Record Batches?

I think this could make sense, though most of the operations I see in
pyarrow.Table <https://arrow.apache.org/docs/python/generated/pyarrow.Table.html>
(e.g. filter and select) are already supported in the DataFrame API. Is
there any operation in particular that you would like to use?

Andrew

On Sun, Jan 24, 2021 at 7:41 AM Fernando Herrera <
[email protected]> wrote:

> Thanks Andrew,
>
> I did read the examples that you mentioned and I don't think they will
> help me with what I want to do. I need to create two hash maps from the
> parquet file to do further comparisons on those maps. In both cases I need
> to create a set of unique ngrams from strings stored in the parquet file.
>
> By the way, would it make sense to create a struct Table similar to the
> one in pyarrow to collect several Record Batches?
>
> Also, how is an object that implements `Array` (a `dyn Array`) downcast
> to other types of Arrays? I'm doing it now using as_any and then
> downcast_ref to the type I want, but I have to write the type in the code
> and I want to find a way for it to be done automatically.
>
> Thanks,
> Fernando
>
> On Sun, 24 Jan 2021, 12:01 Andrew Lamb, <[email protected]> wrote:
>
>> Hi Fernando,
>>
>> Keeping the data in memory as `RecordBatch`es sounds like the way to go
>> if you want it all to be in memory.
>>
>> Another way to work in Rust with data from parquet files is to use the
>> `DataFusion` library; depending on your needs it might save you some time
>> building up your analytics (e.g. it has aggregations, filtering and
>> sorting built in).
>>
>> Here are some examples of how to use DataFusion with a parquet file
>> (with the DataFrame and the SQL APIs):
>>
>> https://github.com/apache/arrow/blob/master/rust/datafusion/examples/dataframe.rs
>>
>> https://github.com/apache/arrow/blob/master/rust/datafusion/examples/parquet_sql.rs
>>
>> If you already have RecordBatches, you can register an in-memory table
>> as well.
>>
>> Hope that helps,
>> Andrew
>>
>>
>> On Sat, Jan 23, 2021 at 7:33 AM Fernando Herrera <
>> [email protected]> wrote:
>>
>>> Hi all,
>>>
>>> A quick question regarding reading a parquet file. What is the best way
>>> to read a parquet file and keep it in memory to do data analysis?
>>>
>>> What I'm doing now is using the record reader from the
>>> ParquetFileArrowReader, and then I read all the record batches from the
>>> file. I keep the batches in memory in a vector of record batches. This
>>> way I have access to them to do some aggregations I need from the file.
>>>
>>> Is there another way to do this?
>>>
>>> Thanks,
>>> Fernando
>>>
>>
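A minimal sketch of the downcast pattern (assuming the `arrow` crate as of
early 2021; the column types here are only for illustration). Matching on
the array's `DataType` and then calling `as_any().downcast_ref` is
essentially what the compute kernels do internally, so the concrete type
does have to be named somewhere:

use arrow::array::{Array, ArrayRef, Int64Array, StringArray};
use arrow::datatypes::DataType;

// Print the values of a dynamically typed column by dispatching on its
// DataType and downcasting to the matching concrete array type.
fn print_values(array: &ArrayRef) {
    match array.data_type() {
        DataType::Int64 => {
            let a = array
                .as_any()
                .downcast_ref::<Int64Array>()
                .expect("DataType said Int64");
            for i in 0..a.len() {
                // Real code would also check a.is_null(i) here.
                println!("{}", a.value(i));
            }
        }
        DataType::Utf8 => {
            let a = array
                .as_any()
                .downcast_ref::<StringArray>()
                .expect("DataType said Utf8");
            for i in 0..a.len() {
                println!("{}", a.value(i));
            }
        }
        other => println!("unhandled type: {:?}", other),
    }
}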

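And a sketch of reading a parquet file into a vector of `RecordBatch`es,
roughly the approach described above (again assuming the `parquet` and
`arrow` crate APIs from around this time; the path and batch size are
placeholders you would supply):

use std::fs::File;
use std::sync::Arc;

use arrow::record_batch::RecordBatch;
use parquet::arrow::{ArrowReader, ParquetFileArrowReader};
use parquet::file::reader::SerializedFileReader;

// Read every record batch from a parquet file and keep them all in memory.
fn read_parquet_to_batches(path: &str, batch_size: usize) -> Vec<RecordBatch> {
    let file = File::open(path).expect("open parquet file");
    let file_reader = SerializedFileReader::new(file).expect("create parquet file reader");
    let mut arrow_reader = ParquetFileArrowReader::new(Arc::new(file_reader));
    arrow_reader
        .get_record_reader(batch_size)
        .expect("create record batch reader")
        .map(|batch| batch.expect("read record batch"))
        .collect()
}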