"Transactional" conversion of CSV to Parquet?

MattK Mon, 24 Oct 2016 13:50:09 -0700

I have a cluster that receives log files in a csv format on a per-minute basis, and those files are immediately available to Drill users. For performance I create Parquet files from them in batch using CTAS commands.

I would like to script a process that makes the Parquet files available on creation, perhaps through a UNION view, but that does not serve duplicate data through both an original csv and converted Parquet file at the same time.

Is there a common practice for making data available once converted, in something similar to a transactional batch of "convert then (re)move source csv files"?
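
To make the "convert then (re)move source csv files" idea concrete, here is a rough sketch of the kind of script I have in mind. It is only an illustration under several assumptions: the files sit on a single local or POSIX-like filesystem where a rename is atomic, the view reads only from `csv_dir` and `parquet_dir`, and `convert` is a hypothetical callable standing in for whatever issues the CTAS command. The function name `publish_batch` and all directory names are made up. It is not truly transactional, since there is a short window between the two moves in which the batch is not visible at all, but the same rows should never be served twice.

```python
import os
from pathlib import Path
from typing import Callable, List


def publish_batch(
    csv_dir: Path,
    parquet_dir: Path,
    staging_dir: Path,
    archive_dir: Path,
    batch_name: str,
    convert: Callable[[List[Path], Path], None],
) -> List[Path]:
    """Convert a snapshot of CSV files, then swap them for the Parquet output.

    `convert` is a hypothetical hook (for example, a wrapper that runs a CTAS
    statement); it must write its output under the directory it is given.
    """
    for d in (parquet_dir, staging_dir, archive_dir):
        d.mkdir(parents=True, exist_ok=True)

    # Snapshot the batch so files arriving mid-conversion wait for the next run.
    batch = sorted(csv_dir.glob("*.csv"))
    if not batch:
        return []

    # Convert into a staging location that no view reads from.
    staged = staging_dir / batch_name
    convert(batch, staged)

    # Retire the source CSVs first, so the same rows are never served twice.
    for src in batch:
        os.replace(src, archive_dir / src.name)

    # Publish the Parquet output with a single rename.
    os.replace(staged, parquet_dir / batch_name)
    return batch
```

If the conversion fails, nothing has been moved yet, so the CSV files stay queryable and the run can simply be retried after clearing the staging directory. Reversing the last two steps would trade the brief gap for a brief period of duplicate data instead.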
