Qifan Chen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16792 )

Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint
......................................................................


Patch Set 5:

(3 comments)

Looks good to me!

http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@223
PS5, Line 223: if (getTableRefs().size() == 1) return true;
Should we remove this? It seems hasConvertLimitToSampleHint() can return true 
or false depending on whether the hint has been set on the only table ref here. 
The hint might not be set.


http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java:

http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@209
PS2, Line 209: estimatedTotalRows
> I made this change to use a scaled down value of the estimated row count  (
Sounds about right.

I also like the idea of specifying the sample size as a number of rows, which 
will speed up sampling a few rows from a very large table, where 1% 
could be on the order of a million rows. I can file a JIRA on this and work on it 
after the min/max work.


http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java:

http://gerrit.cloudera.org:8080/#/c/16792/5/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@217
PS5, Line 217: partitions.size()/numTotalPartitions
Cool! This will work well when the partitions are about the same size, which is 
mostly true with hash partitions.

For other partition schemes with unequal sizes, such as range partitioning, I 
wonder if using HdfsPartition::numRows_ would work: sample rate = (# rows 
to return) / (# rows in the surviving partitions).
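
To make the two options concrete, here is a rough sketch, not the patch's 
code. The class SampleRateSketch, the Part stand-in, and both method names are 
hypothetical. It also assumes that a negative row count means the partition 
has no stats, in which case it falls back to a rate of 1.0.

```java
import java.util.List;

public class SampleRateSketch {
  // Hypothetical stand-in for a partition; only the row count matters here.
  static final class Part {
    final long numRows;
    Part(long numRows) { this.numRows = numRows; }
  }

  // Ratio of surviving partitions to all partitions. The cast to double
  // avoids integer division in this sketch.
  static double byPartitionCount(int numSurviving, int numTotal) {
    if (numTotal <= 0) return 1.0;
    return Math.min(1.0, (double) numSurviving / numTotal);
  }

  // Rows to return divided by the rows in the surviving partitions,
  // clamped to at most 1.0.
  static double byRowCount(long rowsToReturn, List<Part> surviving) {
    long totalRows = 0;
    for (Part p : surviving) {
      // Assumption: a negative count means stats are missing, so give up
      // on the row-based estimate and do not sample down.
      if (p.numRows < 0) return 1.0;
      totalRows += p.numRows;
    }
    if (totalRows == 0) return 1.0;
    return Math.min(1.0, (double) rowsToReturn / totalRows);
  }
}
```

With two surviving partitions of 9000 and 1000 rows and 1000 rows to return, 
the row-based rate is 0.1 no matter how many partitions were pruned, while 
the partition-count ratio depends only on how many partitions survived.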



--
To view, visit http://gerrit.cloudera.org:8080/16792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b
Gerrit-Change-Number: 16792
Gerrit-PatchSet: 5
Gerrit-Owner: Aman Sinha <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Qifan Chen <[email protected]>
Gerrit-Reviewer: Shant Hovsepian <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Fri, 04 Dec 2020 14:34:48 +0000
Gerrit-HasComments: Yes
