Hi,

We need to manage a rolling window of Parquet data within Drill.

Our Parquet files are partitioned by hour. Once HDFS reaches a certain
usage threshold, we want to delete the oldest partition folder.

A simple approach would be to run a cron job that checks HDFS usage and
deletes the oldest partition folder if necessary. Would that cause issues
if the deletion happens while a query is running on those files?
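For illustration, the selection step of such a cron job could look like the
minimal sketch below. It assumes hypothetical hour folders named like
"2016-03-01-13" (zero-padded, so they sort chronologically as strings), and
that the folder sizes and the current usage are measured in the same units
(e.g. both including replication). It only decides what to delete; the
actual removal (e.g. `hdfs dfs -rm -r`, or a Drill DROP TABLE) and the
question of queries still in flight are left open.

```python
import re
from typing import Dict, List

# Hypothetical naming scheme: one folder per hour, e.g. "2016-03-01-13".
HOUR_DIR = re.compile(r"^\d{4}-\d{2}-\d{2}-\d{2}$")


def partitions_to_delete(sizes: Dict[str, int], used: int,
                         capacity: int, threshold: float) -> List[str]:
    """Return the oldest hour folders whose removal brings usage to or
    below threshold * capacity. `sizes` maps folder name -> size and must
    use the same units as `used` and `capacity` (an assumption)."""
    limit = threshold * capacity
    victims: List[str] = []
    # Oldest first; folders that do not match the naming scheme are ignored.
    for name in sorted(n for n in sizes if HOUR_DIR.match(n)):
        if used <= limit:
            break
        victims.append(name)
        used -= sizes[name]  # projected usage after deleting this folder
    return victims


if __name__ == "__main__":
    sizes = {"2016-03-01-00": 40, "2016-03-01-01": 30, "2016-03-01-02": 20}
    print(partitions_to_delete(sizes, used=95, capacity=100, threshold=0.8))
```

Deleting several folders per run (rather than strictly one) keeps the job
correct even if usage jumped by more than one hour's worth since the last run.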

Would you instead recommend writing a script/app that submits a "DROP TABLE"
on the oldest partition folder through the ODBC interface?

Any other ideas are welcome.

Thanks a lot!
François
