Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11395 )
Change subject: KUDU-686 (part 2/2): use DeltaPreparer in DeltaFileIterator ...................................................................... Patch Set 9: (13 comments) http://gerrit.cloudera.org:8080/#/c/11395/7/src/kudu/tablet/delta_key.h File src/kudu/tablet/delta_key.h: http://gerrit.cloudera.org:8080/#/c/11395/7/src/kudu/tablet/delta_key.h@51 PS7, Line 51: aTypeSelector ins > docs Done http://gerrit.cloudera.org:8080/#/c/11395/8/src/kudu/tablet/delta_store.h File src/kudu/tablet/delta_store.h: http://gerrit.cloudera.org:8080/#/c/11395/8/src/kudu/tablet/delta_store.h@222 PS8, Line 222: aPreparer traits > docs Done http://gerrit.cloudera.org:8080/#/c/11395/8/src/kudu/tablet/delta_store.h@271 PS8, Line 271: > ... under a snapshot? Done http://gerrit.cloudera.org:8080/#/c/11395/8/src/kudu/tablet/delta_store.h@290 PS8, Line 290: > hum, idx of what? docs would help As a general rule I prefer to leave simple accessors like this undocumented and to document the state itself. See the comment on last_added_idx_ below. http://gerrit.cloudera.org:8080/#/c/11395/8/src/kudu/tablet/delta_store.h@301 PS8, Line 301: // convert the row's latest deletion state into a saved deletion or > hum this method seems weird and sketchy. when isn't there a new row index a I'll try to clarify the intent further. http://gerrit.cloudera.org:8080/#/c/11395/8/src/kudu/tablet/delta_store.cc File src/kudu/tablet/delta_store.cc: http://gerrit.cloudera.org:8080/#/c/11395/8/src/kudu/tablet/delta_store.cc@238 PS8, Line 238: for (const auto& row_id : deleted_) { : uint32_t idx_in_block = row_id - prev_prepared_idx_; : sel_vec->SetRowUnselected(idx_in_block); : } : : for (const auto& row_id : reinserted_) { : uint32_t idx_in_block = row_id - prev_prepared_idx_; : sel_vec->SetRowSelected(idx_in_block); : } > is this still correct if the same row has both deletes and reinserts? for i That's why UpdateDeletionState and MaybeProcessPreviousRowChange work the way they do: we effectively squash the entire sequence of liveness changes for a row into just the final state. So delete->reinsert->delete would be represented as a single entry in deleted_. http://gerrit.cloudera.org:8080/#/c/11395/8/src/kudu/tablet/delta_store.cc@342 PS8, Line 342: // We can't use RowChangeListDecoder.TwiddleDeleteStatus because: : // 1. Our deletion status includes an additional UNKNOWN state. : // 2. The logical chain of DELETEs and REINSERTs for a given row may extend : // across DeltaPreparer instances. For example, the same row may be deleted : // in one delta file and reinserted in the next. But, because : // DeltaPreparers cannot exchange this information in the context of batch : // preparation, we have to allow any state transition from UNKNOWN. : // : // DELETE+REINSERT pairs are reset back to UNKNOWN: these rows were both : // deleted and reinserted in the same batch, so their states haven't actually > this makes sense but I think this patch needs to come with an extensive dis Sure. I ran it 1000 times in slow mode and all passed. FWIW, I created a lower-level "fuzz" test to cover this and other aspects of this code. I'd appreciate a review on that too. https://gerrit.cloudera.org/c/11140/ http://gerrit.cloudera.org:8080/#/c/11395/8/src/kudu/tablet/delta_store.cc@358 PS8, Line 358: UNKN > maybe UNKNOWN would be a better name? OK. http://gerrit.cloudera.org:8080/#/c/11395/8/src/kudu/tablet/deltafile.cc File src/kudu/tablet/deltafile.cc: http://gerrit.cloudera.org:8080/#/c/11395/8/src/kudu/tablet/deltafile.cc@588 PS8, Line 588: prepared_ = true; > nit move this to the bottom? It's DCHECKed in AddDeltas() though. In the interest of reducing code churn I kept it where it was. http://gerrit.cloudera.org:8080/#/c/11395/8/src/kudu/tablet/deltafile.cc@650 PS8, Line 650: if (VLOG_IS_ON(3)) { : RowChangeList rcl(slice); : DVLOG(3) << "Visited " << DeltaType_Name(DeltaTypeSelector<Type>::kTag) : << " delta for key: " << key.ToString() << " Mut: " : << rcl.ToString(*preparer_.opts().projection) : << " Continue?: " << (!finished_row ? "TRUE" : "FALSE"); : } > kinda weird this commet is here. move it to where finished_row is and maybe I'll move the comment, but finished_row isn't unused; it's referenced on L627. http://gerrit.cloudera.org:8080/#/c/11395/8/src/kudu/tablet/deltamemstore-test.cc File src/kudu/tablet/deltamemstore-test.cc: http://gerrit.cloudera.org:8080/#/c/11395/8/src/kudu/tablet/deltamemstore-test.cc@73 PS8, Line 73: #ifdef NDEBUG : 100, : #else : 1, : #endif > this is not how we normally do this, is it? What specifically are you referring to? I do see prior art using NDEBUG to vary the number of runs of tests. For example: src/kudu/tablet/cbtree-test.cc:#ifdef NDEBUG src/kudu/tablet/cbtree-test.cc- int n_trials = 600; src/kudu/tablet/cbtree-test.cc-#else src/kudu/tablet/cbtree-test.cc- int n_trials = 30; src/kudu/tablet/cbtree-test.cc-#endif src/kudu/tablet/cbtree-test.cc- http://gerrit.cloudera.org:8080/#/c/11395/8/src/kudu/tablet/deltamemstore.cc File src/kudu/tablet/deltamemstore.cc: http://gerrit.cloudera.org:8080/#/c/11395/8/src/kudu/tablet/deltamemstore.cc@264 PS8, Line 264: // off the cost of a seek against the cost of decoding some irrelevant delta : // keys. Experimentation with a microbenchmark revealed that only when ~50 : // deltas were skipped was the seek cheaper than the decoding. : // : // Given that updates are expected to be uncommon and that most scans are : // _not_ historical, the current implementation eschews seeking in favor of : // skipping irrelevant deltas one by one. : RETURN_NOT_OK(preparer_.AddDelta(key, val, &finished_row)); : iter_->Next(); > same comment Done http://gerrit.cloudera.org:8080/#/c/11395/8/src/kudu/tablet/mvcc.h File src/kudu/tablet/mvcc.h: http://gerrit.cloudera.org:8080/#/c/11395/8/src/kudu/tablet/mvcc.h@114 PS8, Line 114: // Returns true if 'other' represents the same set of timestamps as this : // snapshot, false otherwise. : bool Equals(const MvccSnapshot& other) const; > what do you mean time ranges? is other the exact same snapshot or not? e.g. Hmm, "time ranges" was a poor word choice. 'other' is going to be exactly equal. Maybe "set of timestamps" would be better? -- To view, visit http://gerrit.cloudera.org:8080/11395 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I87de52092262c4b42c1bd5107f9139edfc3888b5 Gerrit-Change-Number: 11395 Gerrit-PatchSet: 9 Gerrit-Owner: Adar Dembo <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: David Ribeiro Alves <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Fri, 05 Oct 2018 07:50:07 +0000 Gerrit-HasComments: Yes
