Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18141 )
Change subject: IMPALA-10898: Add runtime IN-list filters for ORC tables ...................................................................... Patch Set 18: (10 comments) Thanks! Addressed the comments. http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/exec/hdfs-orc-scanner.h File be/src/exec/hdfs-orc-scanner.h: http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/exec/hdfs-orc-scanner.h@427 PS17, Line 427: Decides > nit. Decides Done http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/exec/hdfs-orc-scanner.h@433 PS17, Line 433: Pushab > nit. IsPushableInListFilter. Done http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/exec/hdfs-orc-scanner.cc File be/src/exec/hdfs-orc-scanner.cc: http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/exec/hdfs-orc-scanner.cc@1486 PS17, Line 1486: if (in_list_filter->ContainsNull()) { : // Add a null literal with type. : > I wonder how the ORC layer recognizes the need to filter null. The logic h It's not a column type. We are using this constructor to create a null literal with type: https://github.com/apache/orc/blob/rel/release-1.7.0/c++/include/orc/sargs/Literal.hh#L72-L75 ORC has stats about nulls. They will be used in predicate pushdown. We have some tests verifying these: https://github.com/apache/impala/blob/873fe2e5241c5714dfd94a186d524edc1cbad0ad/testdata/workloads/functional-query/queries/QueryTest/orc-stats.test#L787-L863 http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/service/query-options.cc File be/src/service/query-options.cc: http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/service/query-options.cc@974 PS17, Line 974: // Parse and verify the enabled runtime filter types. > nit. Parse and verify the enabled runtime filter types. Done http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/util/in-list-filter-ir.cc File be/src/util/in-list-filter-ir.cc: http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/util/in-list-filter-ir.cc@18 PS17, Line 18: #include "common/object-pool.h" : #include "util/in-list-filter.h" : > nit. May arrange them in ascending order. Done http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/util/in-list-filter.cc File be/src/util/in-list-filter.cc: http://gerrit.cloudera.org:8080/#/c/18141/17/be/src/util/in-list-filter.cc@38 PS17, Line 38: switch (col_type.type) { : case TYPE_TINYINT: : v = *reinterpret_cast<const int8_t*>(val); : break; : case TYPE_SMALLINT: : v = *reinterpret_cast<const int16_t*>(val); : break; : case TYPE_INT: : v = *reinterpret_cast<const int32_t*>(val); : break; : case TYPE_BIGINT: : v = *reinterpret_cast<const int64_t*>(val); : break; : case TYPE_DATE: : v = reinterpret_cast<const DateValue*>(val)->Value(); : break; : case TYPE_STRING: : case TYPE_VARCHAR: : s = reinterpret_cast<const StringValue*>(val); : return str_values_.find(string(s->ptr, s->len)) != str_values_.end(); : case TYPE_CHAR: : return str_values_.find(string(reinterpret_cast<const char*>(val), col_type.len)) : != str_values_.end(); : default: : DCHECK(false) << "Not support IN-list filter type: " << TypeToString(type_); : return false; : } > Future improvement. Yeah, planned to do this in IMPALA-11141. http://gerrit.cloudera.org:8080/#/c/18141/17/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: http://gerrit.cloudera.org:8080/#/c/18141/17/common/thrift/ImpalaService.thrift@725 PS17, Line 725: // Maximum number of distinct entries in a runtime in-list filter. > nit. distinct Done http://gerrit.cloudera.org:8080/#/c/18141/17/common/thrift/PlanNodes.thrift File common/thrift/PlanNodes.thrift: http://gerrit.cloudera.org:8080/#/c/18141/17/common/thrift/PlanNodes.thrift@a134 PS17, Line 134: > nit. dropped. These are removed. Do you mean we should keep them? http://gerrit.cloudera.org:8080/#/c/18141/17/testdata/datasets/functional/functional_schema_template.sql File testdata/datasets/functional/functional_schema_template.sql: http://gerrit.cloudera.org:8080/#/c/18141/17/testdata/datasets/functional/functional_schema_template.sql@274 PS17, Line 274: neg > nit. may spell out the entire word: alltypestiny_negative Done http://gerrit.cloudera.org:8080/#/c/18141/17/testdata/datasets/functional/schema_constraints.csv File testdata/datasets/functional/schema_constraints.csv: http://gerrit.cloudera.org:8080/#/c/18141/17/testdata/datasets/functional/schema_constraints.csv@323 PS17, Line 323: alltypestiny_neg > nit. alltypestiny_negative Done -- To view, visit http://gerrit.cloudera.org:8080/18141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501 Gerrit-Change-Number: 18141 Gerrit-PatchSet: 18 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Fri, 25 Feb 2022 00:33:53 +0000 Gerrit-HasComments: Yes
