Todd Lipcon has posted comments on this change.

Change subject: tablet: check for row presence in rowset-wise order
......................................................................


Patch Set 8:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/6483/8/src/kudu/tablet/row_op.h
File src/kudu/tablet/row_op.h:

PS8, Line 85: Flag whether this op has already had 'present_in_rowset' filled 
in.
> rephrase here and elsewhere to "present and alive"
hrm... you mean down below for the comment on 'present_in_rowset' right? Will 
do, there.


http://gerrit.cloudera.org:8080/#/c/6483/8/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

PS8, Line 650: mutate->present_in_rowset->MutateRow(
             :         ts,
             :         *mutate->key_probe,
             :         mutate->decoded_op.changelist,
             :         tx_state->op_id(),
             :         stats,
             :         result.get());
> inconsistent wrapping with MutateRow below.
yea. I think I can probably improve this code a bit anyway to only call 
MutateRow once.


PS8, Line 658: rowset
> should refer here and elsewhere to "DiskRowSet" since the MRS is also a row
Well, the existing RowSet might be a DuplicatingRowSet implementation, not 
necessarily a DRS, right?


Line 667:   if (s.ok()) {
> PREDICT_TRUE
Done


PS8, Line 688: // TODO(todd) determine why we sometimes get empty writes!
> is this new or did it happen before this patch series?
not new. it seems to be in the master, we sometimes make a SyncWrite of an 
empty batch.


PS8, Line 696: pair<Slice, int>
> aren't you using a struct for this somewhere else? can you reuse the same s
we are, but it's in the internals of RowSetTree, so I don't think it would make 
it more clear to make someone go dig into a different file.


PS8, Line 703: 
keys_and_indexes.emplace_back(op->key_probe->encoded_key_slice(), i);
> could you determine in this loop whether the keys were already sorted and r
will add a TODO.


PS8, Line 717: &
> nit: pair<Slice, int>&
Done


PS8, Line 740:   const auto& ClearPending = [&]() {
             :     next_key.clear();
             :     if (pending.empty()) return;
             :     DCHECK(std::is_sorted(pending.begin(), pending.end(),
             :                           [&](const pair<RowSet*, int>& a,
             :                               const pair<RowSet*, int>& b) {
             :                             auto s_a = keys[a.second];
             :                             auto s_b = keys[b.second];
             :                             return s_a.compare(s_b) < 0;
             :                           }));
             :     RowSet* rs = pending[0].first;
             :     for (auto it = pending.begin();
             :          it != pending.end();
             :          ++it) {
             :       DCHECK_EQ(it->first, rs);
             :       int op_idx = keys_and_indexes[it->second].second;
             :       RowOp* op = row_ops_base[op_idx];
             :       if (op->present_in_rowset) {
             :         // Already found this op present somewhere.
             :         continue;
             :       }
             : 
             :       // TODO(todd) is CHECK_OK correct? it used to be that 
errors here
             :       // would just be silently ignored, so this seems at least 
an improvement.
             :       bool present = false;
             :       CHECK_OK(rs->CheckRowPresent(*op->key_probe, &present, 
tx_state->mutable_op_stats(op_idx)));
             :       if (present) {
             :         op->present_in_rowset = rs;
             :       }
             :     }
             :     pending.clear();
             :   };
> this is quite the hefty anonymous function. move it somwhere else?
the problem is that it refers to quite a few variables in scope (row_ops_base, 
tx_state, keys_and_indexes, etc). Let me see if adding the docs you requested 
below make it clearer what's going on.


PS8, Line 773:   const TabletComponents* comps = 
DCHECK_NOTNULL(tx_state->tablet_components());
             :   comps->rowsets->ForEachRowSetContainingKeys(
             :       keys,
             :       [&](RowSet* rs, int i) {
             :         if (!pending.empty() && rs != pending.back().first) {
             :           ClearPending();
             :         }
             :         pending.emplace_back(rs, i);
             :       });
             :   ClearPending();
> doc what this is doing
Done


PS8, Line 784:   for (auto& p : keys_and_indexes) {
             :     row_ops_base[p.second]->checked_present = true;
             :   }
> is checked_present used before this? could we just set it in one of the ear
We can't set it in the ClearPending callback because it's possible that some of 
the ops would be completely culled by the IntervalTree (i.e. fall fully outside 
any existing interval).

We can't set it before the std::unique() call because then we'd end up marking 
duplicate rowkeys as having been checked, which is not accurate.

We could weave it into the std::unique loop, but then we'd have to stop using 
std::unique and implement it ourselves. Might be worth it to avoid some cache 
misses. I'll add a TODO.


http://gerrit.cloudera.org:8080/#/c/6483/8/src/kudu/tablet/tablet.h
File src/kudu/tablet/tablet.h:

PS8, Line 403:   // Validate the contents of 'op' and return a bad Status if it 
is
             :   // invalid.
             :   Status ValidateOp(const RowOp& op) const;
             : 
             :   // Validate 'op' as in 'ValidateOp()' above. If it is invalid, 
marks
             :   // the op as failed and returns false. If valid, marks the op 
as validated
             :   // and returns true.
             :   bool ValidateOpOrMarkFailed(RowOp* op) const;
> general wondering since this is not dependent on tablet internal state coul
It does depend on the tablet state a little bit in that it requires the schema. 
Would rather avoid this refactoring for now, though I agree that tablet.cc is a 
bit of a monster at 2k+ lines.


http://gerrit.cloudera.org:8080/#/c/6483/8/src/kudu/tablet/transactions/write_transaction.cc
File src/kudu/tablet/transactions/write_transaction.cc:

PS8, Line 273: 
> nit no need to wrap
Done


http://gerrit.cloudera.org:8080/#/c/6483/8/src/kudu/tablet/transactions/write_transaction.h
File src/kudu/tablet/transactions/write_transaction.h:

PS8, Line 163:   // 'i'.
> nit no need to wrap
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I9a678bb0da130ce94aae3634c82ace2d6898ffa2
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to