Hey all,

I need to ingest a tar file containing ~1GB of data in around 10 CSVs. The
data is fairly connected and needs some cleaning, which I'd like to do with
the Batch Table API + SQL (which I haven't used before). I've got a small
prototype loading the uncompressed CSVs and applying the necessary SQL,
which works well.

I'm wondering about the task of downloading the tar file and extracting it
into the CSVs. Does this sound like something I can/should do in Flink, or
should I set up another process to download, extract, and store the files in a
filesystem to then read with the Flink Batch job? My research is leading me
towards doing it separately, but I'd like to do it all in the same job if
there's a creative way.
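For context, the separate-process route I'm considering is only a few lines of
Python stdlib, roughly like this (the URL and paths are placeholders, and
"r:*" lets tarfile auto-detect whether the archive is compressed):

```python
import tarfile
import urllib.request
from pathlib import Path


def download(url: str, dest: Path) -> Path:
    """Stream the archive to a local file without holding it all in memory."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        while chunk := resp.read(1 << 20):  # 1 MiB chunks
            out.write(chunk)
    return dest


def extract_csvs(archive: Path, out_dir: Path) -> list[Path]:
    """Extract only the CSV members into out_dir, flattening directory paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(archive, mode="r:*") as tar:  # auto-detects gz/bz2/xz
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".csv"):
                target = out_dir / Path(member.name).name
                with tar.extractfile(member) as src, open(target, "wb") as dst:
                    dst.write(src.read())
                extracted.append(target)
    return sorted(extracted)
```

The Flink Batch job would then just point at `out_dir`. What I'm unsure about
is whether there's a reasonable way to fold this step into the job itself.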

Thanks!
Austin
