Re: Decompressing Tar Files for Batch Processing

2020-07-07 Thread Austin Cawley-Edwards
On Tue, Jul 7, 2020 at 10:53 AM Austin Cawley-Edwards <austin.caw...@gmail.com> wrote:
> Hey Xiaolong,
>
> Thanks for the suggestions. Just to make sure I understand, are you saying to run the download and decompression in the Job Manager before executing the job?
>
> I think another way to

Re: Decompressing Tar Files for Batch Processing

2020-07-07 Thread Austin Cawley-Edwards
Hey Chesnay,

Thanks for the advice, and easy enough to do it in a separate process.

Best,
Austin

On Tue, Jul 7, 2020 at 10:29 AM Chesnay Schepler wrote:
> I would probably go with a separate process.
>
> Downloading the file could work with Flink if it is already present in some supported

Re: Decompressing Tar Files for Batch Processing

2020-07-07 Thread Chesnay Schepler
I would probably go with a separate process. Downloading the file could work with Flink if it is already present in some supported filesystem. Decompressing the file is supported for selected formats (deflate, gzip, bz2, xz), but this seems to be an undocumented feature, so I'm not sure how

Decompressing Tar Files for Batch Processing

2020-07-06 Thread Austin Cawley-Edwards
Hey all, I need to ingest a tar file containing ~1 GB of data in around 10 CSVs. The data is fairly connected and needs some cleaning, which I'd like to do with the Batch Table API + SQL (which I have never used before). I've got a small prototype loading the uncompressed CSVs and applying the
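
For reference, the separate-process approach suggested in this thread could look roughly like the sketch below: a small standalone script that unpacks the CSVs from the tar archive before the batch job is started. This is only an illustration using Python's standard library. The function name `extract_csvs` and the choice to flatten members to their base names are assumptions, not something taken from the thread, and nothing here is Flink-specific.

```python
import shutil
import tarfile
from pathlib import Path


def extract_csvs(archive_path, out_dir):
    """Extract the regular .csv members of a tar archive into out_dir.

    Hypothetical helper for illustration; returns the extracted paths, sorted.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    # Mode "r:*" lets tarfile detect gzip/bz2/xz compression on its own.
    with tarfile.open(str(archive_path), "r:*") as tar:
        for member in tar.getmembers():
            # Skip directories, links, and anything that is not a CSV.
            if not member.isfile() or not member.name.endswith(".csv"):
                continue
            # Assumption: flatten to the base name so that a member path
            # such as "../x.csv" cannot be written outside out_dir.
            target = out / Path(member.name).name
            src = tar.extractfile(member)
            with src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            extracted.append(target)
    return sorted(extracted)
```

Calling `extract_csvs("data.tar.gz", "csv_out")` (both paths are placeholders) would leave the plain CSV files in `csv_out`, which the batch job could then read as ordinary uncompressed input. Flattening to base names means two members with the same file name in different directories would overwrite each other, so a real archive with nested duplicates would need a different naming scheme.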