[GitHub] drill pull request #519: DRILL-4530: Optimize partition pruning with metadat...
Github user asfgit closed the pull request at: https://github.com/apache/drill/pull/519

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r71059510

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java ---

@@ -143,6 +144,11 @@ public static final RelOptRule getDirFilterOnScan(OptimizerRulesContext optimize
   }

   protected void doOnMatch(RelOptRuleCall call, Filter filterRel, Project projectRel, TableScan scanRel) {
+    if (wasAllPartitionsPruned) {

--- End diff --

Please see commit #ce509af for the changes. Since 'all partitions pruned' is a special case, I had to keep track of this state and propagate it when the new FileSelection is created after partition pruning. Thus, if first-phase directory pruning eliminated all partitions and we re-added one for schema purposes, this directory selection will eventually be seen by the ParquetGroupScan, which examines the 'wasAllPartitionsPruned' flag and decides to retrieve only one file from that directory.
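The design described above — carrying the pruned-state on the selection object instead of on the rule — can be sketched roughly as follows. This is a hypothetical, heavily simplified illustration (class and method names are invented, not Drill's actual FileSelection/ParquetGroupScan API):

```java
// Illustrative sketch only: an all-partitions-pruned flag is carried on the
// selection object so that a downstream scan can honor it.
import java.util.List;

public class SelectionSketch {
    final List<String> files;
    final boolean wasAllPartitionsPruned;

    public SelectionSketch(List<String> files, boolean wasAllPartitionsPruned) {
        this.files = files;
        this.wasAllPartitionsPruned = wasAllPartitionsPruned;
    }

    // When everything was pruned, one file was re-added purely to produce a
    // valid schema; a scan that consults the flag reads at most that one file.
    public List<String> filesToScan() {
        return wasAllPartitionsPruned
            ? files.subList(0, Math.min(1, files.size()))
            : files;
    }
}
```

The point of the sketch is that the flag travels with the selection through planning, so the scan can make the one-file decision without the rule holding any mutable state.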
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r71059388

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/FileSystemPartitionDescriptor.java ---

@@ -212,8 +223,12 @@ public TableScan createTableScan(List newPartitionLocation) t
   }
   if (scanRel instanceof DrillScanRel) {
-      final FileSelection newFileSelection = new FileSelection(null, newFiles, getBaseTableLocation());
-      final FileGroupScan newGroupScan = ((FileGroupScan)((DrillScanRel)scanRel).getGroupScan()).clone(newFileSelection);
+//      final FormatSelection formatSelection = (FormatSelection)((DynamicDrillTable)scanRel.getTable()).getSelection();
+      final FormatSelection formatSelection = (FormatSelection)table.getSelection();
+      final FileSelection newFileSelection = new FileSelection(null, newFiles, getBaseTableLocation(),
+          cacheFileRoot, formatSelection.getSelection().getDirStatus());
+      final FileGroupScan newGroupScan =
+          ((FileGroupScan)((DrillScanRel)scanRel).getGroupScan()).clone(newFileSelection, cacheFileRoot);

--- End diff --

Fixed in commit #7ae8f7a
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r71059395

--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java ---

@@ -211,9 +217,76 @@ public void testNoSupportedError() throws Exception {
       .go();
   }

+  @Test // DRILL-4530
+  public void testDrill4530_1() throws Exception {
+    // create metadata cache
+    test(String.format("refresh table metadata dfs_test.`%s/%s`", getDfsTestTmpSchemaLocation(), tableName2));
+    checkForMetadataFile(tableName2);
+
+    // run query and check correctness
+    String query1 = String.format("select dir0, dir1, o_custkey, o_orderdate from dfs_test.`%s/%s` " +
+        " where dir0=1995 and dir1='Q3'",
+        getDfsTestTmpSchemaLocation(), tableName2);
+    int expectedRowCount = 20;
+    int expectedNumFiles = 2;
+
+    int actualRowCount = testSql(query1);
+    assertEquals(expectedRowCount, actualRowCount);
+    String numFilesPattern = "numFiles=" + expectedNumFiles;
+    String usedMetaPattern = "usedMetadataFile=true";
+    String cacheFileRootPattern = String.format("%s/%s/1995/Q3", getDfsTestTmpSchemaLocation(), tableName2);

--- End diff --

I have changed the test pattern to match the 'cacheFileRoot=' string.
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r71059368

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---

@@ -607,7 +607,14 @@ public long getRowCount() {
     fileSet = Sets.newHashSet(fileNames);
   }

-    if (fileNames.isEmpty()) {
+    List finalFileNames;
+    if (fileSet != null) {
+      finalFileNames = Lists.newArrayList(fileSet);

--- End diff --

Commit #ef37b77 addresses this comment.
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r70706252

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java ---

@@ -143,6 +144,11 @@ public static final RelOptRule getDirFilterOnScan(OptimizerRulesContext optimize
   }

   protected void doOnMatch(RelOptRuleCall call, Filter filterRel, Project projectRel, TableScan scanRel) {
+    if (wasAllPartitionsPruned) {

--- End diff --

Agree that this should not be internal state. I added it to overcome an issue where the prune-scan rule gets re-applied even when there are no qualifying partitions (this happens because we always populate the newPartitions list with at least one entry, to produce a valid schema during execution). I tried removing this flag, but it will need some associated changes to prevent the rule from being reapplied without termination. The old code (without my patch) terminates the pruning because FileSelection.supportDirPruning() returns FALSE when the selection has directories, whereas for this optimization I want to do the pruning on directories first. So let me think of a way to address that.
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r70548891

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---

@@ -607,7 +607,14 @@ public long getRowCount() {
     fileSet = Sets.newHashSet(fileNames);
   }

-    if (fileNames.isEmpty()) {
+    List finalFileNames;
+    if (fileSet != null) {
+      finalFileNames = Lists.newArrayList(fileSet);

--- End diff --

Yes, I can try to keep only fileSet and get rid of fileNames (although I would still need the finalFileNames list, since FileSelection only accepts a List, not a Set).
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r70537956

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---

@@ -607,7 +607,14 @@ public long getRowCount() {
     fileSet = Sets.newHashSet(fileNames);
   }

-    if (fileNames.isEmpty()) {
+    List finalFileNames;
+    if (fileSet != null) {
+      finalFileNames = Lists.newArrayList(fileSet);

--- End diff --

I see the need to remove duplicates, and the confusion comes from the old code. Is it reasonable to change the existing code and keep only fileSet in all cases? Keeping both fileNames and fileSet seems a bit confusing.
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r70536259

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/FileSystemPartitionDescriptor.java ---

@@ -212,8 +223,12 @@ public TableScan createTableScan(List newPartitionLocation) t
   }
   if (scanRel instanceof DrillScanRel) {
-      final FileSelection newFileSelection = new FileSelection(null, newFiles, getBaseTableLocation());
-      final FileGroupScan newGroupScan = ((FileGroupScan)((DrillScanRel)scanRel).getGroupScan()).clone(newFileSelection);
+//      final FormatSelection formatSelection = (FormatSelection)((DynamicDrillTable)scanRel.getTable()).getSelection();
+      final FormatSelection formatSelection = (FormatSelection)table.getSelection();
+      final FileSelection newFileSelection = new FileSelection(null, newFiles, getBaseTableLocation(),
+          cacheFileRoot, formatSelection.getSelection().getDirStatus());
+      final FileGroupScan newGroupScan =
+          ((FileGroupScan)((DrillScanRel)scanRel).getGroupScan()).clone(newFileSelection, cacheFileRoot);

--- End diff --

I agree that the second parameter seems redundant. I don't recall why I needed it at some point, but let me see if I can get rid of it.
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r70532442

--- Diff: exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/TestParquetMetadataCache.java ---

@@ -211,9 +217,76 @@ public void testNoSupportedError() throws Exception {
       .go();
   }

+  @Test // DRILL-4530
+  public void testDrill4530_1() throws Exception {
+    // create metadata cache
+    test(String.format("refresh table metadata dfs_test.`%s/%s`", getDfsTestTmpSchemaLocation(), tableName2));
+    checkForMetadataFile(tableName2);
+
+    // run query and check correctness
+    String query1 = String.format("select dir0, dir1, o_custkey, o_orderdate from dfs_test.`%s/%s` " +
+        " where dir0=1995 and dir1='Q3'",
+        getDfsTestTmpSchemaLocation(), tableName2);
+    int expectedRowCount = 20;
+    int expectedNumFiles = 2;
+
+    int actualRowCount = testSql(query1);
+    assertEquals(expectedRowCount, actualRowCount);
+    String numFilesPattern = "numFiles=" + expectedNumFiles;
+    String usedMetaPattern = "usedMetadataFile=true";
+    String cacheFileRootPattern = String.format("%s/%s/1995/Q3", getDfsTestTmpSchemaLocation(), tableName2);

--- End diff --

The verification of cacheFileRootPattern probably needs to include "cacheFileRoot=" as the prefix. Otherwise, the list of files in the GroupScan will always produce a match for cacheFileRoot, right?
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r70529269

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java ---

@@ -143,6 +144,11 @@ public static final RelOptRule getDirFilterOnScan(OptimizerRulesContext optimize
   }

   protected void doOnMatch(RelOptRuleCall call, Filter filterRel, Project projectRel, TableScan scanRel) {
+    if (wasAllPartitionsPruned) {

--- End diff --

I feel this flag may lead to scenarios where pruning does not happen when it should. The flag is an internal variable of PruneScanRule, and the same rule instance could be applied to prune multiple tables. So, if PruneScanRule turns on this flag for T1, we might skip the pruning logic for T2 later on when the rule fires again.
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r70527515

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/ParquetPartitionDescriptor.java ---

@@ -130,13 +131,13 @@ protected void createPartitionSublists() {
   }

   @Override
-  public TableScan createTableScan(List newPartitionLocation) throws Exception {
+  public TableScan createTableScan(List newPartitionLocation, String cacheFileRoot) throws Exception {

--- End diff --

ParquetPartitionDescriptor does not support the single-partition optimization (supportsSinglePartOptimization). Should it throw an UnsupportedOperationException here and implement only createTableScan(newPartitionLocation)?
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r70524663

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/FileSystemPartitionDescriptor.java ---

@@ -212,8 +223,12 @@ public TableScan createTableScan(List newPartitionLocation) t
   }
   if (scanRel instanceof DrillScanRel) {
-      final FileSelection newFileSelection = new FileSelection(null, newFiles, getBaseTableLocation());
-      final FileGroupScan newGroupScan = ((FileGroupScan)((DrillScanRel)scanRel).getGroupScan()).clone(newFileSelection);
+//      final FormatSelection formatSelection = (FormatSelection)((DynamicDrillTable)scanRel.getTable()).getSelection();
+      final FormatSelection formatSelection = (FormatSelection)table.getSelection();
+      final FileSelection newFileSelection = new FileSelection(null, newFiles, getBaseTableLocation(),
+          cacheFileRoot, formatSelection.getSelection().getDirStatus());
+      final FileGroupScan newGroupScan =
+          ((FileGroupScan)((DrillScanRel)scanRel).getGroupScan()).clone(newFileSelection, cacheFileRoot);

--- End diff --

This newFileSelection already has cacheFileRoot. Do you think clone() needs to take only the FileSelection, since FileSelection already carries the cacheFileRoot info?
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r70524553

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---

@@ -607,7 +607,14 @@ public long getRowCount() {
     fileSet = Sets.newHashSet(fileNames);
   }

-    if (fileNames.isEmpty()) {
+    List finalFileNames;
+    if (fileSet != null) {
+      finalFileNames = Lists.newArrayList(fileSet);

--- End diff --

Actually, fileNames can contain duplicates, whereas fileSet holds a unique list. fileSet was already present in the code and used by removeUnneededRowGroups(). I am leveraging fileSet because, when expanding directories, it is possible to end up with duplicate files, which we don't want. I could perhaps add a comment here.
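The deduplication concern discussed here — directory expansion may yield duplicate file names, while the downstream API accepts only a List — can be sketched as follows. This is an illustrative snippet, not the actual ParquetGroupScan code (the real code uses Guava's Sets/Lists helpers):

```java
// Illustrative sketch: collapse duplicate file names while preserving
// first-seen order, then convert back to a List because the selection
// API accepts a List, not a Set.
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DedupeSketch {
    public static List<String> uniqueFiles(List<String> fileNames) {
        // LinkedHashSet removes duplicates and keeps insertion order
        Set<String> fileSet = new LinkedHashSet<>(fileNames);
        return new ArrayList<>(fileSet);
    }
}
```

Keeping only the set internally and materializing a list at the API boundary is the compromise described in the thread.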
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r70514652

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java ---

@@ -269,13 +283,54 @@ protected void doOnMatch(RelOptRuleCall call, Filter filterRel, Project projectR
   int recordCount = 0;
   int qualifiedCount = 0;

-    // Inner loop: within each batch iterate over the PartitionLocations
-    for(PartitionLocation part: partitions){
-      if(!output.getAccessor().isNull(recordCount) && output.getAccessor().get(recordCount) == 1){
-        newPartitions.add(part);
-        qualifiedCount++;
+    if (checkForSingle &&
+        partitions.get(0).isCompositePartition() /* apply single partition check only for composite partitions */) {
+      // Inner loop: within each batch iterate over the PartitionLocations
+      for (PartitionLocation part : partitions) {

--- End diff --

Yes, it's OK to track the refactoring effort as an enhancement JIRA.
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r70514381

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetGroupScan.java ---

@@ -607,7 +607,14 @@ public long getRowCount() {
     fileSet = Sets.newHashSet(fileNames);
  }

-    if (fileNames.isEmpty()) {
+    List finalFileNames;
+    if (fileSet != null) {
+      finalFileNames = Lists.newArrayList(fileSet);

--- End diff --

Can we use fileNames only? From lines 589 / 607, it seems fileSet is assigned from fileNames; they appear to be the same under the two ELSE branches.
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r70002787

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java ---

@@ -269,13 +283,54 @@ protected void doOnMatch(RelOptRuleCall call, Filter filterRel, Project projectR
   int recordCount = 0;
   int qualifiedCount = 0;

-    // Inner loop: within each batch iterate over the PartitionLocations
-    for(PartitionLocation part: partitions){
-      if(!output.getAccessor().isNull(recordCount) && output.getAccessor().get(recordCount) == 1){
-        newPartitions.add(part);
-        qualifiedCount++;
+    if (checkForSingle &&
+        partitions.get(0).isCompositePartition() /* apply single partition check only for composite partitions */) {
+      // Inner loop: within each batch iterate over the PartitionLocations
+      for (PartitionLocation part : partitions) {

--- End diff --

@jinfengni I started doing the refactoring to move this particular optimization into a partition-descriptor-specific class. However, I was having to propagate several internal states to the doSinglePartOpt() method, such as the partition map, the referenced-dirs bitset, the value vector array, and others. Also, since this optimization occurs inside an outer loop, it is not straightforward without a broader restructuring of the PruneScanRule.doOnMatch() method. Would you be OK with deferring this refactoring to an enhancement JIRA?
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r70002368

--- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java ---

@@ -151,7 +152,7 @@ protected void createPartitionSublists() {
   }

   @Override
-  public TableScan createTableScan(List newPartitions) throws Exception {
+  public TableScan createTableScan(List newPartitions, String cacheFileRoot) throws Exception {

--- End diff --

Refactored this interface method. Updated the PR with a separate commit.
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r70002215

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java ---

@@ -269,13 +283,54 @@ protected void doOnMatch(RelOptRuleCall call, Filter filterRel, Project projectR
   int recordCount = 0;
   int qualifiedCount = 0;

-    // Inner loop: within each batch iterate over the PartitionLocations
-    for(PartitionLocation part: partitions){
-      if(!output.getAccessor().isNull(recordCount) && output.getAccessor().get(recordCount) == 1){
-        newPartitions.add(part);
-        qualifiedCount++;
+    if (checkForSingle &&
+        partitions.get(0).isCompositePartition() /* apply single partition check only for composite partitions */) {
+      // Inner loop: within each batch iterate over the PartitionLocations
+      for (PartitionLocation part : partitions) {
+        assert part.isCompositePartition();
+        if(!output.getAccessor().isNull(recordCount) && output.getAccessor().get(recordCount) == 1) {
+          newPartitions.add(part);
+          if (isSinglePartition) { // only need to do this if we are already single partition
+            // compose the array of partition values for the directories that are referenced by filter:
+            // e.g suppose the dir hierarchy is year/quarter/month and the query is:
+            // SELECT * FROM T WHERE dir0=2015 AND dir1 = 'Q1',
+            // then for 2015/Q1/Feb, this will have ['2015', 'Q1', null]

--- End diff --

I have fixed this case and updated the pull request with a separate commit (not squashed yet, so you can review it separately).
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r68814702

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java ---

@@ -47,16 +47,25 @@
   private List statuses;

   public List files;

+  /**
+   * root path for the selections
+   */
   public final String selectionRoot;

+  /**
+   * root path for the metadata cache file (if any)
+   */
+  public final String cacheFileRoot;

--- End diff --

Thanks for the explanation. It makes sense, and I do see why updating selectionRoot would cause problems for the queries you mentioned.
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r68678168

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java ---

@@ -47,16 +47,25 @@
   private List statuses;

   public List files;

+  /**
+   * root path for the selections
+   */
   public final String selectionRoot;

+  /**
+   * root path for the metadata cache file (if any)
+   */
+  public final String cacheFileRoot;

--- End diff --

That was my initial approach (updating the selectionRoot without keeping a separate cacheFileRoot). However, I ran into a few issues. The main one that I recall is that the dir0, dir1, etc. columns are associated with the selectionRoot. Suppose I run the following query:

SELECT dir0, dir1 FROM dfs.tmp.t2 WHERE dir0=2015 AND dir1='Q1'

If the selectionRoot gets modified to point to '2015/Q1', we have lost the context of the original dir0 and dir1, because everything becomes relative to the new selectionRoot. This produces wrong results. The same problem occurred with a SELECT * query, where the directory columns were not showing up correctly.
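The dependency of the dirN columns on selectionRoot can be illustrated with a small sketch. This is a hypothetical helper, not Drill's actual implementation:

```java
// Illustrative sketch: dirN column values are the path components of a file
// taken relative to the selection root, so changing the root changes them.
import java.util.Arrays;
import java.util.List;

public class DirColumnsSketch {
    // e.g. root=/tmp/t2, file=/tmp/t2/2015/Q1/a.parquet yields [2015, Q1]
    public static List<String> dirColumns(String selectionRoot, String filePath) {
        String relative = filePath.substring(selectionRoot.length() + 1);
        String[] parts = relative.split("/");
        // the last component is the file name, not a directory
        return Arrays.asList(parts).subList(0, parts.length - 1);
    }
}
```

With the original root /tmp/t2, the file above yields dir0=2015 and dir1=Q1; if the root were silently updated to /tmp/t2/2015/Q1, the same file would yield no dir columns at all, which is the wrong-results scenario described above.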
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r68673292

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java ---

@@ -269,13 +283,54 @@ protected void doOnMatch(RelOptRuleCall call, Filter filterRel, Project projectR
   int recordCount = 0;
   int qualifiedCount = 0;

-    // Inner loop: within each batch iterate over the PartitionLocations
-    for(PartitionLocation part: partitions){
-      if(!output.getAccessor().isNull(recordCount) && output.getAccessor().get(recordCount) == 1){
-        newPartitions.add(part);
-        qualifiedCount++;
+    if (checkForSingle &&
+        partitions.get(0).isCompositePartition() /* apply single partition check only for composite partitions */) {
+      // Inner loop: within each batch iterate over the PartitionLocations
+      for (PartitionLocation part : partitions) {
+        assert part.isCompositePartition();
+        if(!output.getAccessor().isNull(recordCount) && output.getAccessor().get(recordCount) == 1) {
+          newPartitions.add(part);
+          if (isSinglePartition) { // only need to do this if we are already single partition
+            // compose the array of partition values for the directories that are referenced by filter:
+            // e.g suppose the dir hierarchy is year/quarter/month and the query is:
+            // SELECT * FROM T WHERE dir0=2015 AND dir1 = 'Q1',
+            // then for 2015/Q1/Feb, this will have ['2015', 'Q1', null]

--- End diff --

Good point. The path [2015, null, Jan] should *not* qualify for the single-partition optimization, because in the general case there could be multiple 'Jan' subdirectories under the dir1 directory. We would still use the metadata cache, but at the level of dir0, so ideally one should get cacheFileRoot=/tmp/t2/2015. I have a fix for this and will update the PR.
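The qualification rule discussed here — only a contiguous leading prefix of matched directory levels can pin down a single partition — can be sketched as follows. This is an illustrative helper, not the PruneScanRule code itself:

```java
// Illustrative sketch: spInfo holds the matched partition value per directory
// level, null where the filter did not reference that level.
// ["2015", "Q1", null] can qualify; ["2015", null, "Jan"] must not, because
// multiple 'Jan' subdirectories may exist under the 2015 directory.
public class SinglePartitionCheckSketch {
    public static boolean qualifies(String[] spInfo) {
        boolean nullSeen = false;
        for (String value : spInfo) {
            if (value == null) {
                nullSeen = true;
            } else if (nullSeen) {
                return false; // non-null after a gap: not a single partition
            }
        }
        return spInfo.length > 0 && spInfo[0] != null;
    }
}
```

Under this check, [2015, null, Jan] is rejected while the metadata cache can still be used at the dir0 level, matching the behavior described in the comment.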
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r67716686

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/FileSelection.java ---

@@ -47,16 +47,25 @@
   private List statuses;

   public List files;

+  /**
+   * root path for the selections
+   */
   public final String selectionRoot;

+  /**
+   * root path for the metadata cache file (if any)
+   */
+  public final String cacheFileRoot;

--- End diff --

When singlePartitionOpt is applied, is it possible to update selectionRoot to be cacheFileRoot? That is, we would not maintain cacheFileRoot separately; instead, a FileSelection with an updated selectionRoot would be used when singlePartitionOpt is applied.
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r67711244

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java ---

@@ -269,13 +283,54 @@ protected void doOnMatch(RelOptRuleCall call, Filter filterRel, Project projectR
   int recordCount = 0;
   int qualifiedCount = 0;

-    // Inner loop: within each batch iterate over the PartitionLocations
-    for(PartitionLocation part: partitions){
-      if(!output.getAccessor().isNull(recordCount) && output.getAccessor().get(recordCount) == 1){
-        newPartitions.add(part);
-        qualifiedCount++;
+    if (checkForSingle &&
+        partitions.get(0).isCompositePartition() /* apply single partition check only for composite partitions */) {
+      // Inner loop: within each batch iterate over the PartitionLocations
+      for (PartitionLocation part : partitions) {
+        assert part.isCompositePartition();
+        if(!output.getAccessor().isNull(recordCount) && output.getAccessor().get(recordCount) == 1) {
+          newPartitions.add(part);
+          if (isSinglePartition) { // only need to do this if we are already single partition
+            // compose the array of partition values for the directories that are referenced by filter:
+            // e.g suppose the dir hierarchy is year/quarter/month and the query is:
+            // SELECT * FROM T WHERE dir0=2015 AND dir1 = 'Q1',
+            // then for 2015/Q1/Feb, this will have ['2015', 'Q1', null]

--- End diff --

For the WHERE condition dir0=2015 and dir2 = 'Jan', if the dataset happens to have only one 'Jan' under the '2015' directory, will this qualify for singlePartitionOpt?
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r67709299

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java ---
@@ -320,7 +377,17 @@ protected void doOnMatch(RelOptRuleCall call, Filter filterRel, Project projectR
       condition = condition.accept(reverseVisitor);
       pruneCondition = pruneCondition.accept(reverseVisitor);
-      RelNode inputRel = descriptor.createTableScan(newPartitions);
+      String cacheFileRoot = null;
+      if (checkForSingle && isSinglePartition) {
+        // if metadata cache file could potentially be used, then assign a proper cacheFileRoot
+        String path = "";
+        for (int j = 0; j <= maxIndex; j++) {
+          path += "/" + spInfo[j];
--- End diff --

Related to Line 313: here we do not check spInfo[j] == null?
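The path-composition loop under discussion can be sketched with the null guard the reviewer is asking about. This is a hypothetical illustration, not the PR's actual code; the names (spInfo, maxIndex, selectionRoot) are taken from the discussion above.

```java
public class CacheFileRootSketch {
    // Compose the cache-file root from the single qualifying partition's
    // directory values. Return null if any prefix value is unknown, since a
    // partial prefix (e.g. WHERE dir2 = 'Jan' with no dir0/dir1 condition)
    // cannot identify a single directory.
    static String composeCacheFileRoot(String selectionRoot, String[] spInfo, int maxIndex) {
        StringBuilder path = new StringBuilder(selectionRoot);
        for (int j = 0; j <= maxIndex; j++) {
            if (spInfo[j] == null) {
                return null; // skip the single-partition optimization
            }
            path.append('/').append(spInfo[j]);
        }
        return path.toString();
    }
}
```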
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r67602442

--- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java ---
@@ -151,7 +152,7 @@ protected void createPartitionSublists() {
   }

   @Override
-  public TableScan createTableScan(List newPartitions) throws Exception {
+  public TableScan createTableScan(List newPartitions, String cacheFileRoot) throws Exception {
--- End diff --

I suppose I could add a second interface method for createTableScan that allows the second parameter and keep the old method as-is.
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r67602385

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java ---
@@ -269,13 +283,54 @@ protected void doOnMatch(RelOptRuleCall call, Filter filterRel, Project projectR
       int recordCount = 0;
       int qualifiedCount = 0;
-      // Inner loop: within each batch iterate over the PartitionLocations
-      for(PartitionLocation part: partitions){
-        if(!output.getAccessor().isNull(recordCount) && output.getAccessor().get(recordCount) == 1){
-          newPartitions.add(part);
-          qualifiedCount++;
+      if (checkForSingle &&
+          partitions.get(0).isCompositePartition() /* apply single partition check only for composite partitions */) {
+        // Inner loop: within each batch iterate over the PartitionLocations
+        for (PartitionLocation part : partitions) {
--- End diff --

Yes, the rule has become too complex. I will go ahead and at least refactor this specific optimization.
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r67602350

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java ---
@@ -269,13 +283,54 @@ protected void doOnMatch(RelOptRuleCall call, Filter filterRel, Project projectR
       int recordCount = 0;
       int qualifiedCount = 0;
-      // Inner loop: within each batch iterate over the PartitionLocations
-      for(PartitionLocation part: partitions){
-        if(!output.getAccessor().isNull(recordCount) && output.getAccessor().get(recordCount) == 1){
-          newPartitions.add(part);
-          qualifiedCount++;
+      if (checkForSingle &&
+          partitions.get(0).isCompositePartition() /* apply single partition check only for composite partitions */) {
+        // Inner loop: within each batch iterate over the PartitionLocations
+        for (PartitionLocation part : partitions) {
+          assert part.isCompositePartition();
+          if(!output.getAccessor().isNull(recordCount) && output.getAccessor().get(recordCount) == 1) {
+            newPartitions.add(part);
+            if (isSinglePartition) { // only need to do this if we are already single partition
+              // compose the array of partition values for the directories that are referenced by filter:
+              // e.g suppose the dir hierarchy is year/quarter/month and the query is:
+              // SELECT * FROM T WHERE dir0=2015 AND dir1 = 'Q1',
+              // then for 2015/Q1/Feb, this will have ['2015', 'Q1', null]
+              // Note that we are not using the PartitionLocation here but composing a different list because
+              // we are only interested in the directory columns that are referenced in the filter condition, not
+              // the SELECT list or other parts of the query.
+              Pair p = composePartition(referencedDirsBitSet, partitionMap, vectors, recordCount);
+              String[] parts = p.getLeft();
+              int tmpIndex = p.getRight();
+              if (spInfo == null) {
+                spInfo = parts;
+                maxIndex = tmpIndex;
+              } else if (maxIndex != tmpIndex) {
+                isSinglePartition = false;
+                break;
+              } else {
+                // we only want to compare until the maxIndex inclusive since subsequent values would be null
+                for (int j = 0; j <= maxIndex; j++) {
+                  if (spInfo[j] == null // prefixes should be non-null
--- End diff --

If the query has, for example, WHERE dir2 = 'January' and no condition on dir0 or dir1, then the spInfo array itself will be non-null but will contain null elements: [null, null, 'January'] with maxIndex = 2. In this case, the single-partition optimization should not be applied. I can add a comment here to explain.
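The prefix check explained above can be sketched as a standalone predicate. This is a hypothetical illustration with names borrowed from the discussion (spInfo, maxIndex), not the PR's actual code.

```java
public class SinglePartitionSketch {
    // Decide whether a candidate partition shares the same fully specified
    // directory prefix as the tracked spInfo values. A null anywhere in the
    // prefix, e.g. [null, null, "January"] for WHERE dir2 = 'January',
    // disqualifies the single-partition optimization.
    static boolean sameSinglePartition(String[] spInfo, String[] candidate, int maxIndex) {
        for (int j = 0; j <= maxIndex; j++) {
            if (spInfo[j] == null || !spInfo[j].equals(candidate[j])) {
                return false;
            }
        }
        return true;
    }
}
```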
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r67567993

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java ---
@@ -269,13 +283,54 @@ protected void doOnMatch(RelOptRuleCall call, Filter filterRel, Project projectR
       int recordCount = 0;
       int qualifiedCount = 0;
-      // Inner loop: within each batch iterate over the PartitionLocations
-      for(PartitionLocation part: partitions){
-        if(!output.getAccessor().isNull(recordCount) && output.getAccessor().get(recordCount) == 1){
-          newPartitions.add(part);
-          qualifiedCount++;
+      if (checkForSingle &&
+          partitions.get(0).isCompositePartition() /* apply single partition check only for composite partitions */) {
+        // Inner loop: within each batch iterate over the PartitionLocations
+        for (PartitionLocation part : partitions) {
+          assert part.isCompositePartition();
+          if(!output.getAccessor().isNull(recordCount) && output.getAccessor().get(recordCount) == 1) {
+            newPartitions.add(part);
+            if (isSinglePartition) { // only need to do this if we are already single partition
+              // compose the array of partition values for the directories that are referenced by filter:
+              // e.g suppose the dir hierarchy is year/quarter/month and the query is:
+              // SELECT * FROM T WHERE dir0=2015 AND dir1 = 'Q1',
+              // then for 2015/Q1/Feb, this will have ['2015', 'Q1', null]
+              // Note that we are not using the PartitionLocation here but composing a different list because
+              // we are only interested in the directory columns that are referenced in the filter condition, not
+              // the SELECT list or other parts of the query.
+              Pair p = composePartition(referencedDirsBitSet, partitionMap, vectors, recordCount);
+              String[] parts = p.getLeft();
+              int tmpIndex = p.getRight();
+              if (spInfo == null) {
+                spInfo = parts;
+                maxIndex = tmpIndex;
+              } else if (maxIndex != tmpIndex) {
+                isSinglePartition = false;
+                break;
+              } else {
+                // we only want to compare until the maxIndex inclusive since subsequent values would be null
+                for (int j = 0; j <= maxIndex; j++) {
+                  if (spInfo[j] == null // prefixes should be non-null
--- End diff --

From Lines 305-306, spInfo and maxIndex are in sync. Why would we have spInfo[j] == null when j <= maxIndex? I thought maxIndex is obtained such that the elements in spInfo are non-null.
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r67567703

--- Diff: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/planner/sql/HivePartitionDescriptor.java ---
@@ -151,7 +152,7 @@ protected void createPartitionSublists() {
   }

   @Override
-  public TableScan createTableScan(List newPartitions) throws Exception {
+  public TableScan createTableScan(List newPartitions, String cacheFileRoot) throws Exception {
--- End diff --

From the comment, cacheFileRoot could be null. But is it possible not to add this parameter to HivePartitionDescriptor? After all, cacheFileRoot does not apply to Hive partitions.
Github user jinfengni commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r67567343

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/partition/PruneScanRule.java ---
@@ -269,13 +283,54 @@ protected void doOnMatch(RelOptRuleCall call, Filter filterRel, Project projectR
       int recordCount = 0;
       int qualifiedCount = 0;
-      // Inner loop: within each batch iterate over the PartitionLocations
-      for(PartitionLocation part: partitions){
-        if(!output.getAccessor().isNull(recordCount) && output.getAccessor().get(recordCount) == 1){
-          newPartitions.add(part);
-          qualifiedCount++;
+      if (checkForSingle &&
+          partitions.get(0).isCompositePartition() /* apply single partition check only for composite partitions */) {
+        // Inner loop: within each batch iterate over the PartitionLocations
+        for (PartitionLocation part : partitions) {
--- End diff --

Do you think it's possible to refactor this part of the code such that only the file-system prune scan has this logic? This singlePartition optimization only applies to file-system pruning. Also, the doOnMatch() method is getting bigger. Something like: the descriptor has supportsSinglePartOptimization(), and the prune scan rule has doSinglePartOpt(), which is a no-op by default and has an implementation for the file-system prune rule.
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r67392462

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java ---
@@ -208,8 +209,18 @@ public DrillTable isReadable(DrillFileSystem fs, FileSelection selection,
       FileSystemPlugin fsPlugin, String storageEngineName, String userName) throws IOException {
     // TODO: we only check the first file for directory reading.
-    if(selection.containsDirectories(fs)){
-      if(isDirReadable(fs, selection.getFirstPath(fs))){
+    if(selection.containsDirectories(fs)) {
+      Path dirMetaPath = new Path(selection.getSelectionRoot(), Metadata.METADATA_DIRECTORIES_FILENAME);
+      if (fs.exists(dirMetaPath)) {
+        ParquetTableMetadataDirs mDirs = Metadata.readMetadataDirs(fs, dirMetaPath.toString());
+        if (mDirs.getDirectories().size() > 0) {
+          FileSelection dirSelection = FileSelection.createFromDirectories(mDirs.getDirectories(), selection);
+          dirSelection.setExpandedPartial();
+          return new DynamicDrillTable(fsPlugin, storageEngineName, userName,
--- End diff --

Makes sense. I will add a comment. Thanks for reviewing.
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r67387102

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java ---
@@ -208,8 +209,18 @@ public DrillTable isReadable(DrillFileSystem fs, FileSelection selection,
       FileSystemPlugin fsPlugin, String storageEngineName, String userName) throws IOException {
     // TODO: we only check the first file for directory reading.
-    if(selection.containsDirectories(fs)){
-      if(isDirReadable(fs, selection.getFirstPath(fs))){
+    if(selection.containsDirectories(fs)) {
+      Path dirMetaPath = new Path(selection.getSelectionRoot(), Metadata.METADATA_DIRECTORIES_FILENAME);
+      if (fs.exists(dirMetaPath)) {
+        ParquetTableMetadataDirs mDirs = Metadata.readMetadataDirs(fs, dirMetaPath.toString());
+        if (mDirs.getDirectories().size() > 0) {
+          FileSelection dirSelection = FileSelection.createFromDirectories(mDirs.getDirectories(), selection);
+          dirSelection.setExpandedPartial();
+          return new DynamicDrillTable(fsPlugin, storageEngineName, userName,
--- End diff --

A comment then, perhaps.
Github user amansinha100 commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r67289117

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java ---
@@ -208,8 +209,18 @@ public DrillTable isReadable(DrillFileSystem fs, FileSelection selection,
       FileSystemPlugin fsPlugin, String storageEngineName, String userName) throws IOException {
     // TODO: we only check the first file for directory reading.
-    if(selection.containsDirectories(fs)){
-      if(isDirReadable(fs, selection.getFirstPath(fs))){
+    if(selection.containsDirectories(fs)) {
+      Path dirMetaPath = new Path(selection.getSelectionRoot(), Metadata.METADATA_DIRECTORIES_FILENAME);
+      if (fs.exists(dirMetaPath)) {
+        ParquetTableMetadataDirs mDirs = Metadata.readMetadataDirs(fs, dirMetaPath.toString());
+        if (mDirs.getDirectories().size() > 0) {
+          FileSelection dirSelection = FileSelection.createFromDirectories(mDirs.getDirectories(), selection);
+          dirSelection.setExpandedPartial();
+          return new DynamicDrillTable(fsPlugin, storageEngineName, userName,
--- End diff --

I intentionally don't call isDirReadable() here because that method returns true if a metadata cache file exists, and I am doing a similar check for the directories file here with fs.exists(dirMetaPath). If this check fails, we will fall through to the old code path (line 223), which does check isDirReadable().
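The control flow being explained can be sketched as follows. This is a hypothetical stand-in, not Drill's actual code: the FS interface substitutes for DrillFileSystem, and the ".directories" filename is a placeholder for the real Metadata.METADATA_DIRECTORIES_FILENAME constant.

```java
public class DirCacheCheckSketch {
    // Stand-in for the file-system API used in the real code.
    interface FS { boolean exists(String path); }

    // Placeholder; the real constant lives in the Metadata class.
    static final String DIR_METADATA_FILE = ".directories";

    // Use the directories cache only when the metadata file exists and lists
    // at least one directory; otherwise the caller falls through to the old
    // isDirReadable() path.
    static boolean useDirectoriesCache(FS fs, String selectionRoot, int cachedDirCount) {
        String dirMetaPath = selectionRoot + "/" + DIR_METADATA_FILE;
        return fs.exists(dirMetaPath) && cachedDirCount > 0;
    }
}
```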
Github user parthchandra commented on a diff in the pull request: https://github.com/apache/drill/pull/519#discussion_r67272839

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetFormatPlugin.java ---
@@ -208,8 +209,18 @@ public DrillTable isReadable(DrillFileSystem fs, FileSelection selection,
       FileSystemPlugin fsPlugin, String storageEngineName, String userName) throws IOException {
     // TODO: we only check the first file for directory reading.
-    if(selection.containsDirectories(fs)){
-      if(isDirReadable(fs, selection.getFirstPath(fs))){
+    if(selection.containsDirectories(fs)) {
+      Path dirMetaPath = new Path(selection.getSelectionRoot(), Metadata.METADATA_DIRECTORIES_FILENAME);
+      if (fs.exists(dirMetaPath)) {
+        ParquetTableMetadataDirs mDirs = Metadata.readMetadataDirs(fs, dirMetaPath.toString());
+        if (mDirs.getDirectories().size() > 0) {
+          FileSelection dirSelection = FileSelection.createFromDirectories(mDirs.getDirectories(), selection);
+          dirSelection.setExpandedPartial();
+          return new DynamicDrillTable(fsPlugin, storageEngineName, userName,
--- End diff --

Are you missing the isDirReadable() check here?
GitHub user amansinha100 opened a pull request:

https://github.com/apache/drill/pull/519

DRILL-4530: Optimize partition pruning with metadata caching for the single partition case.

- Enhance PruneScanRule to detect single partitions based on referenced dirs in the filter.
- Keep a new status of EXPANDED_PARTIAL for FileSelection.
- Create separate .directories metadata file to prune directories first before files.
- Introduce cacheFileRoot attribute to keep track of the parent directory of the cache file after partition pruning.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/amansinha100/incubator-drill DRILL-4530-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/519.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #519

commit 9c9687e804fa05c8f4b7b065738c458cb88bf5c4
Author: Aman Sinha
Date: 2016-03-25T19:55:59Z

DRILL-4530: Optimize partition pruning with metadata caching for the single partition case.

- Enhance PruneScanRule to detect single partitions based on referenced dirs in the filter.
- Keep a new status of EXPANDED_PARTIAL for FileSelection.
- Create separate .directories metadata file to prune directories first before files.
- Introduce cacheFileRoot attribute to keep track of the parent directory of the cache file after partition pruning.