Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/8804 )
Change subject: KUDU-1704: add READ_YOUR_WRITES scan mode ...................................................................... Patch Set 12: (20 comments) http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/common/common.proto File src/kudu/common/common.proto: http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/common/common.proto@236 PS12, Line 236: subject to the propagated timestamp bound This wording is pretty hard to a casual reader (or even me) to understand. I think it would be more clear to say something like: Each tablet server will choose a timestamp to use for a server-local snapshot scan subject to the following criteria: (1) It will be higher than the propagated timestamp, (2) It will attempt to minimize latency caused by waiting for outstanding write transactions to complete. http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/common/common.proto@237 PS12, Line 237: newest timestamp within latest timestamp higher than http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/common/common.proto@242 PS12, Line 242: the timestamp a timestamp http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/common/common.proto@244 PS12, Line 244: should use it Is this something that end-users have to worry about, or can we word this so that it sounds like Kudu will take care of it? If the latter, how about "The Kudu client library will use it as the propagated timestamp for subsequent reads" or something along those lines? However I'm not sure why we are making the distinction here between a normally propagated timestamp and this different thing. http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/common/common.proto@245 PS12, Line 245: wait nit: waiting http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/tserver/tablet_server-test.cc File src/kudu/tserver/tablet_server-test.cc: PS12: We are missing a C++ client test in this patch. Is that excluded on purpose? http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/tserver/tablet_server-test.cc@1924 PS12, Line 1924: TestScanYourWrites > It worries me a bit that most tests of this functionality overwhelmingly te I agree that a deterministic test would be nice and I think it should be possible using two clients. It would also be nice to have a non-deterministic test where you have two clients in RYW mode reading and scanning. Loop them concurrently in separate threads at least a certain number of rounds or amount of time, as well as until both of the following conditions hold, while also triggering leader elections to make it easier to hit anomalies: 1. neither has ever violated RYW locally 2. we have observed anomalies where if we had been using a strict serializable approach then certain writes would have appeared, but they don't because of the hidden channel (the other client). http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/tserver/tablet_server-test.cc@1924 PS12, Line 1924: TEST_F(TabletServerTest, TestScanYourWrites) { nit: add a comment at the top of this test describing what the test does http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/tserver/tablet_server-test.cc@1927 PS12, Line 1927: e nit: missing period http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/tserver/tablet_server-test.cc@1958 PS12, Line 1958: TEST_F(TabletServerTest, TestScanYourWrites_WithoutPropagatedTimestamp) { nit: add test comment http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/tserver/tablet_server-test.cc@1960 PS12, Line 1960: perform a write nit: use capitalization and punctuation for code comments per https://google.github.io/styleguide/cppguide.html#Punctuation,_Spelling_and_Grammar http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/tserver/tablet_server-test.cc@1972 PS12, Line 1972: perform a write punctuation http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/tserver/tablet_service.h File src/kudu/tserver/tablet_service.h: http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/tserver/tablet_service.h@155 PS12, Line 155: // it exceeds the maximum allowed clock synchronization error time. Can we add a note about the context, i.e. why this matters? nit: Also perhaps rename to ValidateSnapshotTimestamp() ? Usually Stamp in Timestamp is not capitalized in the Kudu code, and this seems specific to snapshot timestamps. http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/tserver/tablet_service.h@158 PS12, Line 158: if s/if/that/ http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/tserver/tablet_service.cc@2113 PS12, Line 2113: add: DCHECK(s.ok()) ? http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/tserver/tablet_service.cc@2114 PS12, Line 2114: // Note: if 'max_allowed_ts' is not obtained from clock_->GetGlobalLatest() it's guaranteed : // to be higher than 'tmp_snap_timestamp'. I'm having trouble following this: 1. What does "not obtained" mean? Is it like GetGlobalLatest() returned Status::Corrruption or something like that and so then we assume max_allowed_ts is still default-constructed? 2. Why is it guaranteed to by higher than 'tmp_snap_timestamp'? Can we explain what guarantees that and why? http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/tserver/tablet_service.cc@2133 PS12, Line 2133: that than http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/tserver/tablet_service.cc@2135 PS12, Line 2135: much far http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/tserver/tablet_service.cc@2164 PS12, Line 2164: RETURN_NOT_OK(ValidateTimeStamp(tmp_snap_timestamp)); Hmm, we already updated the clock on line 2137 so AFAICT the only way this could fail is if the propagated timestamp +1 fails this test or the clean timestamp fails this test. Is that the purpose of this? Should we be trying to check this before we update the clock? I'm actually not sure what the right answer here is but intuitively it seems we should try to perform as much validation as possible before making changes to the system. http://gerrit.cloudera.org:8080/#/c/8804/12/src/kudu/tserver/tablet_service.cc@2170 PS12, Line 2170: VerifyNotAncientHistory same here; is there some way to check this before updating the clock, or we don't really care one way or the other? -- To view, visit http://gerrit.cloudera.org:8080/8804 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I84ddb981a1a0f199d4e66f5d5097318f8c785a48 Gerrit-Change-Number: 8804 Gerrit-PatchSet: 12 Gerrit-Owner: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Alexey Serbin <aser...@cloudera.com> Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com> Gerrit-Reviewer: Dan Burkert <d...@cloudera.com> Gerrit-Reviewer: David Ribeiro Alves <davidral...@gmail.com> Gerrit-Reviewer: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <mpe...@apache.org> Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Tue, 13 Feb 2018 23:55:39 +0000 Gerrit-HasComments: Yes