I found this: https://drill.apache.org/docs/drop-table/ :

"Currently, Drill does not have a mechanism in place, such as read locks on files, to address concurrency issues. For example, if one user runs a query that references a table that another user simultaneously issues the DROP TABLE command against, there is no mechanism in place to prevent a collision of the two processes. In such a scenario, Drill may return partial query results or a system error to the user running the query when the table is dropped."

A solution would be to perform a delete operation in HDFS when no queries are in "running" state or schedule a delete outside business hours.

On Tue, Mar 1, 2016 at 11:25 AM, François Méthot <[email protected]> wrote:
> Hi,
>
> We need to manage a rolling window of parquet data within drill.
>
> Our parquet files are partitioned by hour,
> Once hdfs reach a certain usage threshold, we want to delete the oldest
> partition folder.
>
> A simple approach would be to run a cron job that check the hdfs usage and
> delete the oldest partition folder if necessary, would that cause issue if
> this operation occurs while a query is running on those files?
>
> Would you recommend instead writing a script/app that submit a "drop
> table" on the oldest partition folder using odbc interface?
>
> Any other ideas are welcome.
>
> Thanks a lot!
> François

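To make the "delete only when nothing is running" idea concrete, here is a rough Python sketch of what the cron job could look like. Several things in it are assumptions on my side and not from the Drill docs quoted above: that the Drill web endpoint is reachable at localhost:8047 and that its /profiles.json response has a "runningQueries" list, that partition folder names sort chronologically as plain strings (for example "2016-03-01-11"), and that an 80% usage threshold is what you want. All function names are hypothetical.

```python
import json
import subprocess
import urllib.request

# Assumed Drill web endpoint; adjust host and port for your cluster.
DRILL_PROFILES_URL = "http://localhost:8047/profiles.json"
# Hypothetical usage threshold (fraction of HDFS capacity in use).
USAGE_THRESHOLD = 0.80


def count_running_queries(profiles):
    """Count entries in the 'runningQueries' list of a profiles document.

    Assumes the JSON has a top-level 'runningQueries' key; a missing key
    is treated as zero running queries.
    """
    return len(profiles.get("runningQueries", []))


def oldest_partition(folder_names):
    """Return the oldest partition folder, or None if there are none.

    Assumes folder names sort chronologically as strings,
    e.g. '2016-03-01-11' for an hourly partition.
    """
    return min(folder_names) if folder_names else None


def should_delete(used_fraction, running_queries, threshold=USAGE_THRESHOLD):
    """Delete only when usage is at or above the threshold and Drill is idle."""
    return used_fraction >= threshold and running_queries == 0


def fetch_profiles(url=DRILL_PROFILES_URL):
    """Fetch the profiles document from Drill over HTTP."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def delete_partition(path):
    """Remove a partition folder recursively with the HDFS shell."""
    subprocess.run(["hdfs", "dfs", "-rm", "-r", path], check=True)
```

The cron job would get the current usage fraction and the list of partition folders (for example by parsing `hdfs dfs -df` and `hdfs dfs -ls` output), call fetch_profiles(), and run delete_partition() on oldest_partition(...) only when should_delete(...) is true. Note that this narrows the window but does not close it: a query can still start between the check and the delete, so running it outside business hours remains the safer option.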