Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16723 )
Change subject: IMPALA-10314: Optimize planning time for simple limits ...................................................................... Patch Set 3: (4 comments) Looks good! http://gerrit.cloudera.org:8080/#/c/16723/3/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/16723/3/common/thrift/ImpalaService.thrift@601 PS3, Line 601: (1 row per file) > Yeah, I had initially thought of that but the file metadata as it exists to Done http://gerrit.cloudera.org:8080/#/c/16723/3/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java: http://gerrit.cloudera.org:8080/#/c/16723/3/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@180 PS3, Line 180: order-by > Not sure what you meant here. In the presence of order-by, we cannot push Yes, I agree order-by should not be allowed. Sorry I was not thinking when I type. http://gerrit.cloudera.org:8080/#/c/16723/3/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java: http://gerrit.cloudera.org:8080/#/c/16723/3/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@209 PS3, Line 209: for (FeFsPartition p : partitions) { : numRows += p.getNumFileDescriptors(); : prunedPartitions.add(p); : if (numRows >= analyzer.getSimpleLimitStatus().second) { : // here we only limit the partitions; later in HdfsScanNode we will : // limit the file descriptors within a partition : break; : } > I think on balance it'd be better to make the partition selection determini Yes, I agree and have experienced similar situations before. I recall at the end we had to apply the randomness. http://gerrit.cloudera.org:8080/#/c/16723/3/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@209 PS3, Line 209: for (FeFsPartition p : partitions) { : numRows += p.getNumFileDescriptors(); : prunedPartitions.add(p); : if (numRows >= analyzer.getSimpleLimitStatus().second) { : // here we only limit the partitions; later in HdfsScanNode we will : // limit the file descriptors within a partition : break; : } > Yes, these are the pruned partitions that will be sent over to the runtime. Yes, I think I like the always true hint option. It seems the new code should check that first and then allow the WHERE clause. Customers can forget the hint here and there in the query and end up with incorrect results. Wait and see on randomize the partitions to scan sounds good to me. -- To view, visit http://gerrit.cloudera.org:8080/16723 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 Gerrit-Change-Number: 16723 Gerrit-PatchSet: 3 Gerrit-Owner: Aman Sinha <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Shant Hovsepian <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Comment-Date: Tue, 17 Nov 2020 20:08:56 +0000 Gerrit-HasComments: Yes
