Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11142 )
Change subject: KUDU-2463 pt 1: adjust MVCC when replaying no-ops ...................................................................... Patch Set 7: (32 comments) http://gerrit.cloudera.org:8080/#/c/11142/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/11142/6//COMMIT_MSG@10 PS6, Line 10: r schema > timestamps Done http://gerrit.cloudera.org:8080/#/c/11142/6//COMMIT_MSG@17 PS6, Line 17: cannot necessarily be said for timestamps of no-ops and change configs. > also schema changes (I think) Good callout on schema changes, though I think they fall into the same bucket as write ops. They both use the TransactionDriver and get their timestamps assigned on the prepare thread. http://gerrit.cloudera.org:8080/#/c/11142/6//COMMIT_MSG@37 PS6, Line 37: This alone isn't enough to prevent KUDU-2463, but it reduces the window > The cases where it doesn't prevent the problem are super edge case, right? Right, with Part 2, this is a pretty narrow edge case. http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/consensus/consensus.proto File src/kudu/consensus/consensus.proto: http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/consensus/consensus.proto@235 PS6, Line 235: If > If Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/consensus/consensus.proto@236 PS6, Line 236: to be true. > to be true Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/raft_consensus-itest.cc File src/kudu/integration-tests/raft_consensus-itest.cc: http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/raft_consensus-itest.cc@259 PS6, Line 259: stamp > is this needed for this test to pass? or just for debuggability upon test f Yep, it's needed. Right now, the first no-op is assigned a timestamp of 1, so we end up committing timestamp 1. With it, we update the clean time to 1. Then, we try to replay WAL entries with timestamp 1 and crash because timestamp 1 has already been committed. http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc File src/kudu/integration-tests/timestamp_advancement-itest.cc: PS6: > Did clang-tidy run on this file? Seems Adar found the issue with clang tidy https://gerrit.cloudera.org/c/11457/ http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@78 PS6, Line 78: namespace itest { > nit: hrm, is there a reason you put these outside the namespace? seems easi Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@79 PS6, Line 79: > nit: out of alpha order Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@86 PS6, Line 86: > not used? Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@90 PS6, Line 90: // Set a low batch size so we have finer-grained control over flushing of > not used? Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@94 PS6, Line 94: write.Start(); > not used Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@107 PS6, Line 107: // Flush the current log batch and roll over to get a fresh WAL segment. > nit: not used Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@108 PS6, Line 108: ASSERT_OK(tablet_replica->log()->WaitUntilAllFlushed()); > nit: out of alpha order Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@112 PS6, Line 112: SERT_OK(tablet_replica->ta > nit: TimestampAdvancementITest ? Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@121 PS6, Line 121: > this is the default Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@122 PS6, Line 122: // Get the tablet replica on the tablet server 'ts'. : scoped_refptr<TabletReplica> tablet_replica_on_ts(int ts) const { : vector<scoped_refptr<TabletRep > Why change this stuff? If no compelling reason, let's remove it so we don't Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@135 PS6, Line 135: RETURN_NOT_OK(LogReader::Open(env_, : ts->server()->fs_manager()->GetTabletWalDir(tablet_id), : scoped_refptr<log::LogIndex>(), tablet_id, : scoped_refptr<MetricEntity>(), &reader)); : log::SegmentSequence segs; : RETURN_NOT_OK(reader->GetSegmentsSnapshot(&segs)); : unique_ptr<log::LogEntryPB> entry; : for (const auto& seg : segs) { : log::LogEntryReader reader(seg.get()); : while (true) { : Status s = reader.ReadNextEntry(&entry); : if (s.IsEndOfFile()) break; : > you should be able to replace this with: Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@149 PS6, Line 149: > segment Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@150 PS6, Line 150: repl > nit: to Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@151 PS6, Line 151: urn Status::OK(); > not seeing this Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@181 PS6, Line 181: MiniTabletServer* ts = tserver(kTserver); > nit: maybe add a CHECK_EQ(1, replicas.size()); Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@186 PS6, Line 186: _leader_failu > nit: Rename to CheckForWriteOpsInLog() ? Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@199 PS6, Line 199: y_ms_in_notifica > nit: we don't expect a truncated log file in this test, do we? Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@202 PS6, Line 202: int64_t gcable_size; > do we care about commit messages? if they don't have a matching replicate m Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@223 PS6, Line 223: > s/leave// Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@224 PS6, Line 224: start()); > safe time timestamp Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@225 PS6, Line 225: OrDie(ts_map_, ts->uuid()); > nit: TestNoOpAdvancesMvccSafeTimeOnBootstrap ? Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@226 PS6, Line 226: > ... so that we can fill up the WAL with no-op entries naturally Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/integration-tests/timestamp_advancement-itest.cc@272 PS6, Line 272: > nit: extra blank line Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/tablet/tablet_bootstrap.cc File src/kudu/tablet/tablet_bootstrap.cc: http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/tablet/tablet_bootstrap.cc@1097 PS6, Line 1097: guaranteed to be monotonically : // increasing w > suggestion: guaranteed to be monotonic with respect to the write-ahead log. Done http://gerrit.cloudera.org:8080/#/c/11142/6/src/kudu/tablet/tablet_bootstrap.cc@1113 PS6, Line 1113: } > Hmm. I wonder if we had a bug excluding alter schema here. Do you know whet Chatted briefly about this on Slack: alter schema requests go through the prepare thread, same as writes, so they are serialized. Given that, I left this defaulted to true to match the implementation before this patch. -- To view, visit http://gerrit.cloudera.org:8080/11142 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I26deff32da8c990cb8a2ba220bb81858ddd6d73f Gerrit-Change-Number: 11142 Gerrit-PatchSet: 7 Gerrit-Owner: Andrew Wong <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Comment-Date: Tue, 18 Sep 2018 07:30:30 +0000 Gerrit-HasComments: Yes
