Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
......................................................................


Patch Set 3:

(4 comments)

Looks good!

http://gerrit.cloudera.org:8080/#/c/16723/3/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/16723/3/common/thrift/ImpalaService.thrift@601
PS3, Line 601: (1 row per file)
> Yeah, I had initially thought of that but the file metadata as it exists to
Done


http://gerrit.cloudera.org:8080/#/c/16723/3/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/16723/3/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@180
PS3, Line 180: order-by
> Not sure what you meant here.  In the presence of order-by, we cannot push
Yes, I agree order-by should not be allowed. Sorry I was not thinking when I 
type.


http://gerrit.cloudera.org:8080/#/c/16723/3/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java:

http://gerrit.cloudera.org:8080/#/c/16723/3/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@209
PS3, Line 209: for (FeFsPartition p : partitions) {
             :         numRows += p.getNumFileDescriptors();
             :         prunedPartitions.add(p);
             :         if (numRows >= analyzer.getSimpleLimitStatus().second) {
             :           // here we only limit the partitions; later in 
HdfsScanNode we will
             :           // limit the file descriptors within a partition
             :           break;
             :         }
> I think on balance it'd be better to make the partition selection determini
Yes, I agree and have experienced similar situations before. I recall at the 
end we had to apply the randomness.


http://gerrit.cloudera.org:8080/#/c/16723/3/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@209
PS3, Line 209: for (FeFsPartition p : partitions) {
             :         numRows += p.getNumFileDescriptors();
             :         prunedPartitions.add(p);
             :         if (numRows >= analyzer.getSimpleLimitStatus().second) {
             :           // here we only limit the partitions; later in 
HdfsScanNode we will
             :           // limit the file descriptors within a partition
             :           break;
             :         }
> Yes, these are the pruned partitions that will be sent over to the runtime.
Yes, I think I like the always true hint option. It seems the new code should 
check that first and then allow the WHERE clause. Customers can forget the hint 
here and there in the query and end up with incorrect results.

Wait and see on randomize the partitions to scan sounds good to me.



--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 3
Gerrit-Owner: Aman Sinha <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Shant Hovsepian <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Tue, 17 Nov 2020 20:08:56 +0000
Gerrit-HasComments: Yes

Reply via email to