Hi,

We need to manage a rolling window of Parquet data within Drill.
Our Parquet files are partitioned by hour. Once HDFS reaches a certain usage threshold, we want to delete the oldest partition folder. A simple approach would be to run a cron job that checks HDFS usage and deletes the oldest partition folder if necessary. Would that cause issues if the deletion occurs while a query is running on those files? Would you recommend instead writing a script/app that submits a "drop table" on the oldest partition folder through the ODBC interface? Any other ideas are welcome.

Thanks a lot!
François
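P.S. For context, here is a minimal sketch of the cron-job approach I had in mind. The data directory, threshold, and partition naming are placeholders; it assumes hourly folder names sort lexicographically so the oldest comes first:

```shell
#!/bin/sh
# Sketch of the cron-job approach. DATA_DIR and THRESHOLD are
# placeholders; adjust to your actual layout.
DATA_DIR=/data/events
THRESHOLD=85

# Percentage of space used, given capacity and used bytes.
usage_pct() {
    echo $(( $2 * 100 / $1 ))
}

# Guarded so the sketch is a no-op on machines without an hdfs client.
if command -v hdfs >/dev/null 2>&1; then
    # "hdfs dfs -df" prints: Filesystem  Size  Used  Available  Use%
    read -r _ size used _ <<EOF
$(hdfs dfs -df "$DATA_DIR" | tail -1)
EOF
    if [ "$(usage_pct "$size" "$used")" -ge "$THRESHOLD" ]; then
        # Hourly folder names sort lexicographically, so the first
        # listing entry is the oldest partition.
        oldest=$(hdfs dfs -ls "$DATA_DIR" | awk 'NR>1 {print $NF}' | sort | head -1)
        hdfs dfs -rm -r -skipTrash "$oldest"
    fi
fi
```

My worry is the window between a query planning against that folder and the deletion actually happening.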