Hi Team,

I have a question about partition pruning. We are using partition pruning in Apache Drill for our current project and have some questions. The details of the scenario are below.
File type: Parquet, generated from Python
Folder structure in HDFS: /<root_folder>/<dir0>/<dir1>/<dir2>

Query used to select data under <dir2>, written to take advantage of partition pruning:

    select column1, column2, ... from dfs.`tmp`.`<root_folder>` where dir0 = <dir0> and dir1 = <dir1> and dir2 = <dir2> and <filter> = ..;

Observation: Although execution is fast, the planning time is quite high. I did not see a VALUES operator in the physical plan of the query; there was a SCAN operator instead. How can we confirm that the selected data is actually partition pruned here?

As an alternative, to bring down the planning time, I modified the query to include the sub-directories in the root directory path. The modified query is:

    select column1, column2, ... from dfs.`tmp`.`<root_folder>/<dir0>/<dir1>/<dir2>` where <filter> = ..;

Can you please tell me why the planning time is so high for the first query? How can we take advantage of partition pruning with it? Or should we include the sub-directories in the root directory path instead?

Thanks in advance.

*Sreeparna Bhabani*
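
P.S. In case it helps anyone reproduce this, below is a minimal sketch of how the plan for the first query can be inspected, assuming Drill's standard EXPLAIN PLAN FOR statement and the same placeholders as in the query above. The Scan entry in the output should list the files it intends to read, which is one way to tell whether only the data under <dir2> is being scanned.

```sql
-- Sketch only: <root_folder>, <dir0>, <dir1>, <dir2> and <filter> are the
-- same placeholders used in the original query, not real names.
-- EXPLAIN PLAN FOR returns the physical plan instead of executing the query.
explain plan for
select column1, column2, ...
from dfs.`tmp`.`<root_folder>`
where dir0 = <dir0> and dir1 = <dir1> and dir2 = <dir2> and <filter> = ..;
```

Running the same statement against the modified query should make it possible to compare the two Scan entries side by side.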
