Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16723 )

Change subject: IMPALA-10314: Optimize planning time for simple limits
......................................................................


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/cup/sql-parser.cup@3115
PS5, Line 3115:   KW_WHERE opt_plan_hints:pred_hints expr:e
I guess this is a bit limiting in the it applies only to the whole where 
clause. Should it be part of the expr production below so it can be attached to 
any expression?

I don't think this affects the functionality of this patch, since we're only 
checking the top-level statement anyway, but it seems like itwould me more 
elegant to have the expr hint be associated with the expr in the parser?

If there are complications with that, maybe a comment here explaining the 
limitation would be sufficient.


http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/analysis/Predicate.java
File fe/src/main/java/org/apache/impala/analysis/Predicate.java:

http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/analysis/Predicate.java@30
PS5, Line 30: isAlwaysTrue_
maybe hasAlwaysTrueHint_ just to make it crystal-clear that it's not actually a 
guarantee?


http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@869
PS5, Line 869:             if (fsHasBlocks && fd.getNumFileBlocks() == 0) 
continue;
nit: use braces for multi-line if


http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@870
PS5, Line 870: fd.getFileLength()
> Yes. Totally agree. We probably can live with the 0-row data files through
We already had to deal with a similar issue here
https://impala.apache.org/docs/build/html/topics/impala_optimize_partition_key_scans.html
 ; we should document similarly

Generally it doesn't make any sense to write files with 0 rows and it should be 
rare. Our experience is that some misbehaving tools can generate 0 row files 
(we've seen Spark do it with issues like 
https://issues.apache.org/jira/browse/SPARK-10216).

You're right that the files are non-empty because they have the footer with the 
schema. I don't think there's an upper bound on the size either though, cause 
they could have an arbitrarily complex scheme.



--
To view, visit http://gerrit.cloudera.org:8080/16723
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574
Gerrit-Change-Number: 16723
Gerrit-PatchSet: 5
Gerrit-Owner: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Fri, 20 Nov 2020 01:29:42 +0000
Gerrit-HasComments: Yes

Reply via email to