Is DataFusion a good solution for validating and converting large CSV files (20M+, ~400 columns) stored in S3 buckets to Parquet?

I have run some of the Arrow, ArrowFlight/Java, and Arrow/JS examples and am now looking at DataFusion because I see that it can work directly with CSV files. I am not a Rust programmer, but I like some of the features described in the DataFusion docs.

I will be deploying the transformed Parquet files to S3, where they will then be processed further by Dremio into virtual datasets. For the end users I will be offering a browser-based visualizer for ad hoc data analysis, and since ArrowJS does not implement ArrowFlight, I plan to create a gateway between ArrowJS and Dremio/ArrowFlight.

So... Arrow is in my dev plans, but should I bite the bullet and learn Rust and DataFusion?

Thanks for any direction,

John

