Alex Behm has posted comments on this change.

Change subject: IMPALA-2328: Address additional comments
......................................................................

Patch Set 1:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6147/1/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 494: SchemaNode* node = NULL;
nullptr

Line 501: // In this case, we are selecting a column that is not in the file. We would fill
Remove "In this case", that's clear.

Line 502: // this column with NULL during the scan, so any predicate would fail. Return early.
Suggest minor rewording for clarity:

We would set its slot to NULL during the scan,
... so any predicate would evaluate to false.

Line 507: if (pos_field) {
Does the FE guarantee that such predicates are not sent to the BE for min/max filtering? You can try something like:

select pos from functional_parquet.complextypestbl.int_array where pos < 5;

I believe it will hit the DCHECK here.

http://gerrit.cloudera.org:8080/#/c/6147/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 342: if (slot == null) continue;
You can check slot.getDesc().getColumn() to see if the slot corresponds to a real column. If getColumn() returns null, then you can have a 'pos' slot.

http://gerrit.cloudera.org:8080/#/c/6147/1/testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet_stats.test:

Line 234: drop table if exists name_resolve;
no need for this if we are using unique_database

Line 256: ---- QUERY
The above test seems sufficient.

Line 264: select count(*) from functional_parquet.alltypessmall where '0' > cast(tinyint_col as string)
Might be good to have an example here that demonstrates why it's not easy to support explicit casts. Maybe cast(bigint_col as tinyint) < 10

http://gerrit.cloudera.org:8080/#/c/6147/1/tests/query_test/test_parquet_stats.py
File tests/query_test/test_parquet_stats.py:

Line 24: This suite tests support for Parquet statistics using SQL queries.
Tests runtime optimizations based on Parquet statistics?

--
To view, visit http://gerrit.cloudera.org:8080/6147
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I54c205fad7afc4a0b0a7d0f654859de76db29a02
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-HasComments: Yes
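
Illustration for the comment on parquet_stats.test line 264: a minimal Python sketch of why a predicate such as cast(bigint_col as tinyint) < 10 is hard to evaluate against min/max statistics. It assumes the narrowing cast wraps around like two's-complement truncation, which is an assumption made for illustration and not a confirmed statement about Impala's overflow behavior. The helper names (cast_to_tinyint, naive_can_skip, has_matching_row) and the sample values are hypothetical and do not come from the change under review.

```python
def cast_to_tinyint(value: int) -> int:
    """Hypothetical narrowing cast: keep the low 8 bits, read them as signed."""
    return ((value + 128) % 256) - 128


def naive_can_skip(stat_min: int, stat_max: int) -> bool:
    """Naive (incorrect) filter: apply the cast to the min/max statistics only
    and skip the row group if neither casted bound satisfies '< 10'."""
    return not (cast_to_tinyint(stat_min) < 10 or cast_to_tinyint(stat_max) < 10)


def has_matching_row(values) -> bool:
    """Ground truth: does any row satisfy cast(value as tinyint) < 10?"""
    return any(cast_to_tinyint(v) < 10 for v in values)


# Hypothetical bigint column values in a single row group.
row_group = [100, 257, 300]

skip = naive_can_skip(min(row_group), max(row_group))
match = has_matching_row(row_group)
print("naive filter would skip the row group:", skip)
print("row group contains a matching row:", match)
```

Under the wrap-around assumption the cast is not monotonic, so the casted min and max no longer bound the casted values in between. In this sketch the naive check would skip a row group that contains a matching row (257 wraps to 1).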
