Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16723 )
Change subject: IMPALA-10314: Optimize planning time for simple limits ...................................................................... Patch Set 5: (2 comments) Looks very good! The empty parquet data file may be a corner case to worry about. http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@870 PS5, Line 870: fd.getFileLength() Is it possible for a Parquet data file with empty # of rows pass this test? Note that due to the meta-data portion, such a file will have some number of bytes. See one case here https://github.com/G-Research/ParquetSharp/issues/110. If we can look at the meta-data of such a file, the number of rows is right there. 871 struct FileMetaData { 872 /** Version of this file **/ 873 1: required i32 version 874 875 /** Parquet schema for this file. This schema contains metadata for all the columns. 876 * The schema is represented as a tree with a single root. The nodes of the tree 877 * are flattened to a list by doing a depth-first traversal. 878 * The column metadata contains the path in the schema for that column which can be 879 * used to map columns to nodes in the schema. 880 * The first element is the root **/ 881 2: required list<SchemaElement> schema; 882 883 /** Number of rows in this file **/ 884 3: required i64 num_rows http://gerrit.cloudera.org:8080/#/c/16723/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@947 PS5, Line 947: if (isSimpleLimit && simpleLimitNumRows == : analyzer.getSimpleLimitStatus().second) { : // for the simple limit case if the estimated rows has already reached the limit : // there's no need to process more partitions : break; : } This is good. -- To view, visit http://gerrit.cloudera.org:8080/16723 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9d6a79263bc092e0f3e9a1d72da5618f3cc35574 Gerrit-Change-Number: 16723 Gerrit-PatchSet: 5 Gerrit-Owner: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com> Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Thu, 19 Nov 2020 14:30:23 +0000 Gerrit-HasComments: Yes