Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18141 )
Change subject: IMPALA-10898: Add runtime IN-list filters for ORC tables ...................................................................... Patch Set 14: (14 comments) Thank Qifan for the detailed review! http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter-ir.cc File be/src/util/in-list-filter-ir.cc: http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter-ir.cc@26 PS13, Line 26: if (UNLIKELY(val == nullptr)) { > UNLIKELY Done http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter-ir.cc@30 PS13, Line 30: if (UNLIKELY(values_.size() >= entry_limit_ || str_values_.size() >= entry_limit_)) { > UNLIKELY Done http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter-ir.cc@55 PS13, Line 55: if (UNLIKELY(s->ptr == nullptr)) { : contains_null_ = true > nit. should we check null-ness again? See line at 26. The default constructor of StringValue creates a null 'ptr'. I think we'd better check this. https://github.com/apache/impala/blob/fe04c50/be/src/runtime/string-value.h#L51 http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter-ir.cc@58 PS13, Line 58: str_total > nit. Probably should be named as str_total_size_. Done http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.h File be/src/util/in-list-filter.h: http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.h@39 PS13, Line 39: InListFilter(ColumnType type, uint32_t entry_limit, bool contains_null = false); > Include contains_null and column type here. Done http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc File be/src/util/in-list-filter.cc: http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc@63 PS13, Line 63: retur > "return false" here helps with release code. Done http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc@71 PS13, Line 71: if (type.type == TYPE > nit. it is better to supply the type in the cstr. Done http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc@78 PS13, Line 78: : InListFilter* InListFilter::Create(const InListFilterPB& protobuf, ColumnType type, : uint32_t entry_limit, ObjectPool* pool) { > nit. probably should inited in the cstr. Done http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc@100 PS13, Line 100: break; > return null? Done http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc@112 PS13, Line 112: : : void InListFilter::ToProtobuf(const InListFilter* filter, InListFilterPB* protobuf) { : DCHECK(protobuf != nullptr); : i > Other fields that are not copied: type_, contains_null_, str_size_ and ent Oops, we don't need this method. Removed it. http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc@142 PS13, Line 142: if (type_ == TYPE_STRING || type_ == TYPE_VARCHAR || type_ == TYPE_CHAR) { > Same comment for Copy method: missing fields. Done http://gerrit.cloudera.org:8080/#/c/18141/13/be/src/util/in-list-filter.cc@174 PS13, Line 174: > should handle null case. Oops! Done. http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test: http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@787 PS13, Line 787: broadcast > May repeat this test with partition HJ to verify that in-list filters is no Done. I change the test to verify distributed plan in order to show the partitioned join. Otherwise it's getting the single node plan by default. http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test File testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test: http://gerrit.cloudera.org:8080/#/c/18141/13/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test@127 PS13, Line 127: > may add a test on date column type. Sure. The ORC date_tbl is corrupted and need to wait for https://gerrit.cloudera.org/c/18262/ to be merged. -- To view, visit http://gerrit.cloudera.org:8080/18141 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I25080628233799aa0b6be18d5a832f1385414501 Gerrit-Change-Number: 18141 Gerrit-PatchSet: 14 Gerrit-Owner: Quanlong Huang <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Qifan Chen <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Wed, 23 Feb 2022 07:33:09 +0000 Gerrit-HasComments: Yes
