[Impala-ASF-CR] IMPALA-5309: Adds TABLESAMPLE clause for HDFS table refs.

Marcel Kornacker (Code Review) Sat, 13 May 2017 11:36:54 -0700

Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-5309: Adds TABLESAMPLE clause for HDFS table refs.
......................................................................



Patch Set 1:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/6868/1/fe/src/main/java/org/apache/impala/analysis/TableRef.java
File fe/src/main/java/org/apache/impala/analysis/TableRef.java:

Line 355:     if (!(this instanceof BaseTableRef) ||
move || to next line


http://gerrit.cloudera.org:8080/#/c/6868/1/fe/src/main/java/org/apache/impala/analysis/TableSampleParams.java
File fe/src/main/java/org/apache/impala/analysis/TableSampleParams.java:

Line 24:  * Represents a TABLESAMPLE clause.
TableSampleClause as the class name then?


http://gerrit.cloudera.org:8080/#/c/6868/1/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

Line 808:    * Returns a stripped-down clone of this partition with a new list 
of file descriptors.
what does stripped-down mean?


http://gerrit.cloudera.org:8080/#/c/6868/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

Line 1940:    * The 'parcentBytes' parameter must be between 0 and 100.
parcent


http://gerrit.cloudera.org:8080/#/c/6868/1/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java:

Line 158
comment why you're not tracking referenced partitions here


http://gerrit.cloudera.org:8080/#/c/6868/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 92:  * partition pruning and evaluation of other predicates happens after 
sampling.
why should partition  pruning happen after sampling?


Line 127:   private List<HdfsPartition> sample_;
sampled_partitions_? sample_ is a bit sparse


Line 264:    * sample and then prune partitions from the sample.
why is that?

given that semantically we should be sampling rows, not files, it's not clear 
to me that this order is more correct that having partition pruning precede 
file sampling. if you sample before partition pruning, you might end up 
dropping partitions with very few large files.

in my mind, sampling should try to avoid dropping partitions altogether (unless 
you really only have, say, 10 rows in a partition, and you're looking for a 5% 
sample).


Line 715:     // Adjust the cardinality based on table sampling.
separate logically separate block w/ blank line


http://gerrit.cloudera.org:8080/#/c/6868/1/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

Line 339:     // Not applicable to non-HDFS tables.
what about collection refs?


Line 340:     //AnalysisError("select * from functional_kudu.alltypes 
tablesample system (10)",
why commented out?


http://gerrit.cloudera.org:8080/#/c/6868/1/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

Line 617:     ParserError("select * from t tablesample (10) a");
add a negative percentage somewhere


http://gerrit.cloudera.org:8080/#/c/6868/1/testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
File testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test:

Line 30: # that is refected in the cardinality.
reflected


http://gerrit.cloudera.org:8080/#/c/6868/1/tests/query_test/test_tablesample.py
File tests/query_test/test_tablesample.py:

Line 44:   def test_tablesample(self, vector):
why not simply put this into a query .test file?


-- 
To view, visit http://gerrit.cloudera.org:8080/6868
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ief112cfb1e4983c5d94c08696dc83da9ccf43f70
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5309: Adds TABLESAMPLE clause for HDFS table refs.

Reply via email to