wangning has posted comments on this change. ( http://gerrit.cloudera.org:8080/16674 )
Change subject: KUDU-1644 hash-partition based in-list predicate optimization ...................................................................... Patch Set 10: (17 comments) http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partition.h File src/kudu/common/partition.h: http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partition.h@267 PS8, Line 267: return hash_bucket_schemas_; > nit: probably better to declare this function in the header file and define Done http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partition.h@266 PS8, Line 266: const std::vector<HashBucketSchema>& hash_partition_schemas() const { : return hash_bucket_schemas_; : } : > +1 Done http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partition.h@266 PS8, Line 266: const std::vector<HashBucketSchema>& hash_partition_schemas() const { : return hash_bucket_schemas_; : } : > It'd be better to return bool instead of using as an out parameter. Tried to get index of hash_bucket_schema, and modified it to use int32_t as return. http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partition.h@273 PS8, Line 273: atus GetRang > Shouldn't this check be directly below the call to FindColumn()? What happe Done http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec-test.cc File src/kudu/common/scan_spec-test.cc: http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec-test.cc@518 PS8, Line 518: > nit: "partition", same elsewhere Thanks http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec-test.cc@553 PS8, Line 553: SCOPED_TRACE(spec_p3.ToString(schema)); : EXPECT_EQ("PK >= (int8 a=4, int8 b=10, int8 c=-128) AND " : "PK < (int8 a=8, > Could you also add a case for hash(a), hash(b)? Thx for point out. I didn't thought about it throughly. I found it doesn't work well in this case. I change the method to detect is a key is in a partition and patched more cases. I think it may works well now. http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec-test.cc@563 PS8, Line 563: EXPECT_EQ("PK >= (int8 a=0, int8 b=40, int8 c=-128) AND " : "PK < (int8 a=9, int8 b=71, int8 c=-128) AND " : "a IN (0, 2, 5, 9) AND b IN (40, 60, 70)", : spec_p4.ToString(schema)); : : spec_p5.PruneHashForInlistIfPossible(schema, partitions[4], partition_schema); : spec_p5.OptimizeScan(schema, &arena_, true); : : SCOPED_TRACE(spec_p5.ToString(schema)); : EXPECT_EQ("PK >= (int8 a=0, int8 b=20, int8 c=-128) AND " : "PK < (int8 a=9, int8 b=51, int8 c=-128) AND " : "a IN (0, 2, 5, 9) AND b IN (20, 30, 50)", : spec_p5.ToString(schema)); : : spec_p6.PruneHashForInlistIfPossible(schema, partitions[5], partition_schema); : spec_p6.OptimizeScan(schema, &arena_, true); : : SCOPED_TRACE(spec_p6.ToString(schema)); : EXPECT_EQ("PK >= (int8 a=0, int8 b=10, int8 c=-128) AND " : "PK < (int8 a=9, int8 b=81, int8 c=-128) AND " : "a IN (0, 2, 5, 9) AND b IN (10, 80)", : spec_p6.ToString(schema)); : : spec_p7.PruneHashForInlistIfPossible(schema, partitions[6], partition_schema); : spec_p7.OptimizeScan(schema, &arena_, true); > nit: seems like this code is reused a bunch in these tests. Maybe define a Done http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.h File src/kudu/common/scan_spec.h: http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.h@62 PS8, Line 62: is pre > is empty removed this method and merged into CanShortcuit method http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.h@80 PS8, Line 80: hash(onekey), has > I suppose the comment indicates, "supports hash schema with only one key"? Patched and can you have an more eye on the comment if possible cause I don't know I delivered the idea properly for poor english skill. Thank you! http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc File src/kudu/common/scan_spec.cc: http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@88 PS8, Line 88: return predicate.second.predicate_type() == PredicateType::None > Why shouldn't we put the InList empty check here? Done http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@186 PS8, Line 186: int hash_idx = partition_schema.TryGetSingleColumnHashPartitionIndex(schema, idx); : if (hash_idx == -1) co > nit: can these lines be merged? same elsewhere with using temp variable for Added a another short-circuit detect. BTW status s is only used here, and the coming one is in lambda function. http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@191 PS8, Line 191: [idx, hash_idx, &schema, &partition, &partition_schema](const void* value) { > We should be able to push this to the front of the loop and avoid a FindCol Done http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@193 PS8, Line 193: onst uint > +1. If we pass 'idx' here, we would save an extra call to FindColumn() in ' Done http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@193 PS8, Line 193: onst uint > nit: we've already found the column index -- could we pass 'idx' or the col Done http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@205 PS8, Line 205: d_set<string> missing_col_names; : for (const auto& entry : predicates_) { > nit: can merge these lines Done http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/tserver/tablet_service.cc@2667 PS8, Line 2667: : spec.OptimizeScan(tablet_schema, scanner->arena(), true); : VLOG(3) << "After > If we pushed the emptiness check into CanShortCircuit(), would that avoid t sure, added http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/tserver/tablet_service.cc@2737 PS8, Line 2737: : // It's important to keep the reference to the tablet for the case when the : // tablet replica's shutdown is run concurrently with the code below. : shared_ptr<Tablet> tablet; : R > Hmm, I'm not sure, but could we move this up to right after OptimizeScan()? Add a extra shortcuit detect now as a temp workaround and I'm looking for the possibility with more time on this part of code. -- To view, visit http://gerrit.cloudera.org:8080/16674 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2 Gerrit-Change-Number: 16674 Gerrit-PatchSet: 10 Gerrit-Owner: wangning <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Bankim Bhavsar <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mahesh Reddy <[email protected]> Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: wangning <[email protected]> Gerrit-Comment-Date: Tue, 03 Nov 2020 08:26:26 +0000 Gerrit-HasComments: Yes
