wangning has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16674 )

Change subject: KUDU-1644 hash-partition based in-list predicate optimization
......................................................................


Patch Set 10:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partition.h
File src/kudu/common/partition.h:

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partition.h@267
PS8, Line 267:     return hash_bucket_schemas_;
> nit: probably better to declare this function in the header file and define
Done


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partition.h@266
PS8, Line 266: const std::vector<HashBucketSchema>& hash_partition_schemas() 
const {
             :     return hash_bucket_schemas_;
             :   }
             :
> +1
Done


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partition.h@266
PS8, Line 266: const std::vector<HashBucketSchema>& hash_partition_schemas() 
const {
             :     return hash_bucket_schemas_;
             :   }
             :
> It'd be better to return bool instead of using as an out parameter.
Tried to get index of hash_bucket_schema, and modified it to use int32_t as 
return.


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/partition.h@273
PS8, Line 273: atus GetRang
> Shouldn't this check be directly below the call to FindColumn()? What happe
Done


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec-test.cc
File src/kudu/common/scan_spec-test.cc:

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec-test.cc@518
PS8, Line 518:
> nit: "partition", same elsewhere
Thanks


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec-test.cc@553
PS8, Line 553:  SCOPED_TRACE(spec_p3.ToString(schema));
             :   EXPECT_EQ("PK >= (int8 a=4, int8 b=10, int8 c=-128) AND "
             :             "PK < (int8 a=8,
> Could you also add a case for hash(a), hash(b)?
Thx for point out. I didn't thought about it throughly.
 I found it doesn't work well in this case. I change the method to detect is a 
key is in a partition and patched more cases. I think it may works well now.


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec-test.cc@563
PS8, Line 563:   EXPECT_EQ("PK >= (int8 a=0, int8 b=40, int8 c=-128) AND "
             :             "PK < (int8 a=9, int8 b=71, int8 c=-128) AND "
             :             "a IN (0, 2, 5, 9) AND b IN (40, 60, 70)",
             :             spec_p4.ToString(schema));
             :
             :   spec_p5.PruneHashForInlistIfPossible(schema, partitions[4], 
partition_schema);
             :   spec_p5.OptimizeScan(schema, &arena_, true);
             :
             :   SCOPED_TRACE(spec_p5.ToString(schema));
             :   EXPECT_EQ("PK >= (int8 a=0, int8 b=20, int8 c=-128) AND "
             :             "PK < (int8 a=9, int8 b=51, int8 c=-128) AND "
             :             "a IN (0, 2, 5, 9) AND b IN (20, 30, 50)",
             :             spec_p5.ToString(schema));
             :
             :   spec_p6.PruneHashForInlistIfPossible(schema, partitions[5], 
partition_schema);
             :   spec_p6.OptimizeScan(schema, &arena_, true);
             :
             :   SCOPED_TRACE(spec_p6.ToString(schema));
             :   EXPECT_EQ("PK >= (int8 a=0, int8 b=10, int8 c=-128) AND "
             :             "PK < (int8 a=9, int8 b=81, int8 c=-128) AND "
             :             "a IN (0, 2, 5, 9) AND b IN (10, 80)",
             :             spec_p6.ToString(schema));
             :
             :   spec_p7.PruneHashForInlistIfPossible(schema, partitions[6], 
partition_schema);
             :   spec_p7.OptimizeScan(schema, &arena_, true);
> nit: seems like this code is reused a bunch in these tests. Maybe define a
Done


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.h
File src/kudu/common/scan_spec.h:

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.h@62
PS8, Line 62:  is pre
> is empty
removed this method and merged into CanShortcuit method


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.h@80
PS8, Line 80:     hash(onekey), has
> I suppose the comment indicates, "supports hash schema with only one key"?
Patched and can you have an more eye on the comment if possible cause I don't 
know I delivered the idea properly for poor english skill. Thank you!


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc
File src/kudu/common/scan_spec.cc:

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@88
PS8, Line 88:                   return predicate.second.predicate_type() == 
PredicateType::None
> Why shouldn't we put the InList empty check here?
Done


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@186
PS8, Line 186: int hash_idx = 
partition_schema.TryGetSingleColumnHashPartitionIndex(schema, idx);
             :     if (hash_idx == -1) co
> nit: can these lines be merged? same elsewhere with using temp variable for
Added a another short-circuit detect.
BTW status s is only used here, and the coming one is in lambda function.


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@191
PS8, Line 191:           [idx, hash_idx, &schema, &partition, 
&partition_schema](const void* value) {
> We should be able to push this to the front of the loop and avoid a FindCol
Done


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@193
PS8, Line 193: onst uint
> +1. If we pass 'idx' here, we would save an extra call to FindColumn() in '
Done


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@193
PS8, Line 193: onst uint
> nit: we've already found the column index -- could we pass 'idx' or the col
Done


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/common/scan_spec.cc@205
PS8, Line 205: d_set<string> missing_col_names;
             :   for (const auto& entry : predicates_) {
> nit: can merge these lines
Done


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/tserver/tablet_service.cc@2667
PS8, Line 2667:
              :   spec.OptimizeScan(tablet_schema, scanner->arena(), true);
              :   VLOG(3) << "After
> If we pushed the emptiness check into CanShortCircuit(), would that avoid t
sure, added


http://gerrit.cloudera.org:8080/#/c/16674/8/src/kudu/tserver/tablet_service.cc@2737
PS8, Line 2737:
              :   // It's important to keep the reference to the tablet for the 
case when the
              :   // tablet replica's shutdown is run concurrently with the 
code below.
              :   shared_ptr<Tablet> tablet;
              :   R
> Hmm, I'm not sure, but could we move this up to right after OptimizeScan()?
Add a extra shortcuit detect now as a temp workaround and I'm looking for the 
possibility with more time on this part of code.



--
To view, visit http://gerrit.cloudera.org:8080/16674
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I202001535669a72de7fbb9e766dbc27db48e0aa2
Gerrit-Change-Number: 16674
Gerrit-PatchSet: 10
Gerrit-Owner: wangning <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Bankim Bhavsar <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mahesh Reddy <[email protected]>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: wangning <[email protected]>
Gerrit-Comment-Date: Tue, 03 Nov 2020 08:26:26 +0000
Gerrit-HasComments: Yes

Reply via email to