Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10983 )
Change subject: KUDU-1291. Efficiently support predicates on non-prefix key components ...................................................................... Patch Set 10: (14 comments) I've only been through a few files, but here are some nits so far. Seems there are a few open points from Mike too. http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@59 PS10, Line 59: static const int kPrimaryKeyMaxSize = 16*1024; nit: is this a hard limit or just a target? How is this enforced? http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@339 PS10, Line 339: to store the current value nit: note that the "current value" refers to the value after seeking. http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/cfile/cfile_reader.h@470 PS10, Line 470: Status SetCurrentValue(); : super-nit: in reading the name SetX(), based on other usages in the codebase (grep -r "Set\w" in src/kudu), I kind of expect the contract of some SetX(x) to set some internal variable to x. Mind changing this to something like StoreCurrentValue or CacheCurrentValue? http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/common/encoded_key.h File src/kudu/common/encoded_key.h: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/common/encoded_key.h@63 PS10, Line 63: int num_columns = 0 nit: what does this do? http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/common/partial_row.h File src/kudu/common/partial_row.h: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/common/partial_row.h@52 PS10, Line 52: template<typename KeyTypeWrapper> struct SliceTypeRowOps; // IWYU pragma: keep : template<typename KeyTypeWrapper> struct NumTypeRowOps; // IWYU pragma: keep nit: not your fault, but mind removing the spaces here? http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h File src/kudu/tablet/cfile_set.h: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@207 PS10, Line 207: Arena* arena nit: note that (I think) arena stores the buffer containing with the encoded key http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@207 PS10, Line 207: gscoped_ptr nit: if the change would be contained within cfile_set.cc, mind updating the newly added occurrences of gscoped_ptr to unique_ptr? http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@209 PS10, Line 209: prefix_key nit: Since this isn't a variable name, maybe take out the underscore. http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@210 PS10, Line 210: "num_cols" nit: 'num_prefix_cols' http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.h@210 PS10, Line 210: // "prefix_key" refers to the first "num_cols" columns of the current key : // (current key is the key currently pointed to by validx_iter_). : // 'cache_seeked_value' controls whether we will remember the key seeked to : // in the "current value" of the iterator. nit: can you reword this to clarify whether "current" refers to before or after the seek? http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.cc File src/kudu/tablet/cfile_set.cc: http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.cc@413 PS10, Line 413: !(lower_bound_idx_ == 0 && upper_bound_idx_ == row_count_) nit: IMO slightly easier to read as (lower_bound_idx_ != 0 || upper_bound_idx != row_count_) http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.cc@423 PS10, Line 423: const std::unordered_map<std::string, ColumnPredicate> &predicates = spec.predicates(); : for (auto it=predicates.begin(); it!=predicates.end(); ++it) { Merge these into something like: for (const auto& col_and_pred: predicates) { const string& col_name = col_and_pred.first; const ColumnPredicate& pred = col_and_pred.second; const Schema& schema = base_data_->tablet_schema(); // ... use `col_name`, `pred`, and `schema` } http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.cc@441 PS10, Line 441: skip_scan_predicate_column_id_ = col_id; : : // Store the predicate value. : skip_scan_predicate_value_ = (it->second).raw_lower(); I think it's generally good hygiene that these don't get modified until after we know we're enable skip-scanning, like we do for `use_skip_scan_`. Though maybe it doesn't matter if these aren't going to be used by the iterator anyway? http://gerrit.cloudera.org:8080/#/c/10983/10/src/kudu/tablet/cfile_set.cc@509 PS10, Line 509: TryEnableSkipScan(*spec); nit: I know it doesn't make a difference execution-wise, but I think this makes more sense up by Init() at L394 -- To view, visit http://gerrit.cloudera.org:8080/10983 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I230cd5a288e28ace796b352a603e0d1bcc1e4e0f Gerrit-Change-Number: 10983 Gerrit-PatchSet: 10 Gerrit-Owner: Anupama Gupta <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Anupama Gupta <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Tue, 07 Aug 2018 05:21:18 +0000 Gerrit-HasComments: Yes
