Is DataFusion a good solution for validating and converting large CSV
files (20M+, ~400 columns) stored in S3 buckets to Parquet?
I have run some of the Arrow, ArrowFlight/Java, and Arrow/JS examples and
am now looking at DataFusion because I see that it can work directly
with CSV files. I'm not a Rust programmer, but I like some of the features
described in the DataFusion docs.
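For what it's worth, the CSV-to-Parquet step I have in mind looks roughly like the sketch below. This is only an illustration based on my reading of the DataFusion docs, not tested code: the crate names (`datafusion`, `tokio`), the `id` column, and the exact method signatures are assumptions, and reading directly from S3 would need an object-store registration step that I've left out.

```rust
// Sketch: validate a CSV file and write it back out as Parquet with
// DataFusion. Assumes the `datafusion` and `tokio` crates; method names
// follow recent DataFusion releases and may differ in other versions.
use datafusion::dataframe::DataFrameWriteOptions;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();

    // Read the CSV (local path shown; S3 would need an ObjectStore
    // registered with the context first).
    let df = ctx.read_csv("input.csv", CsvReadOptions::new()).await?;

    // Example validation via the DataFrame API: drop rows where a
    // hypothetical `id` column is null.
    let df = df.filter(col("id").is_not_null())?;

    // Write the result out as a directory of Parquet files.
    df.write_parquet("out_dir/", DataFrameWriteOptions::new(), None)
        .await?;
    Ok(())
}
```

The same pipeline could also be expressed as a SQL query against a registered CSV table, which might matter for how much Rust I'd actually need to learn.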
I will be deploying the transformed Parquet files to S3, where they will
be processed further by Dremio into virtual datasets. For the end users I
will be offering a browser-based visualizer for ad hoc data analysis, and
since ArrowJS does not implement ArrowFlight, I plan to create a gateway
between ArrowJS and Dremio/ArrowFlight.
So... Arrow is in my dev plans, but should I bite the bullet and learn
Rust and DataFusion?
Thanks for any directions,
John