[kudu-CR] KUDU-1097 (patch1): test for replica health reporting
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8642 ) Change subject: KUDU-1097 (patch1): test for replica health reporting .. KUDU-1097 (patch1): test for replica health reporting Added a test to verify that the leader tablet replica reports on the replica health changes. The tests verifies that the health reports are present in the Raft consensus state reported by the leader replica. The test also verifies that the incremental tablet reports contain appropriate information once the replica health status changes. Change-Id: Ie62b49efebad9a123eec51dd302e375e46e0682d Reviewed-on: http://gerrit.cloudera.org:8080/8642 Tested-by: Kudu Jenkins Reviewed-by: Mike Percy--- M src/kudu/integration-tests/ts_tablet_manager-itest.cc 1 file changed, 291 insertions(+), 49 deletions(-) Approvals: Kudu Jenkins: Verified Mike Percy: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/8642 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ie62b49efebad9a123eec51dd302e375e46e0682d Gerrit-Change-Number: 8642 Gerrit-PatchSet: 3 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy
[kudu-CR] KUDU-1097 (patch1): test for replica health reporting
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8642 ) Change subject: KUDU-1097 (patch1): test for replica health reporting .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc File src/kudu/integration-tests/ts_tablet_manager-itest.cc: http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@362 PS1, Line 362: reports > Ah, I see. That's appro I meant that's an interesting approach. Let me put that then for the documentation purposes. -- To view, visit http://gerrit.cloudera.org:8080/8642 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie62b49efebad9a123eec51dd302e375e46e0682d Gerrit-Change-Number: 8642 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Mon, 27 Nov 2017 06:29:41 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1097 (patch1): test for replica health reporting
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/8642 ) Change subject: KUDU-1097 (patch1): test for replica health reporting .. Patch Set 2: I'm ok with this, let's push it -- To view, visit http://gerrit.cloudera.org:8080/8642 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie62b49efebad9a123eec51dd302e375e46e0682d Gerrit-Change-Number: 8642 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Mon, 27 Nov 2017 06:29:17 + Gerrit-HasComments: No
[kudu-CR] KUDU-1097 (patch1): test for replica health reporting
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8642 ) Change subject: KUDU-1097 (patch1): test for replica health reporting .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc File src/kudu/integration-tests/ts_tablet_manager-itest.cc: http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@362 PS1, Line 362: reports > I see default param as nullptr as documentation that the arg is optional bu Ah, I see. That's appro http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@476 PS1, Line 476: // The scenario below assumes the leader replica does not change anymore. : FLAGS_enable_leader_failure_detection = false; > I'm not sure it's truly safe but maybe it's safe enough for this kind of te I meant if the assertions for the status fail, then we need to retry the whole scenario with the state change, in my understanding -- putting the only code below would not help much in that case. That would require to put under ASSERT_EVENTUALLY almost all the scenario, which could hide some bugs. I would prefer to keep it as it is or just rewrite it, disabling leader failure detection in the very beginning of the test. Let's choose the first option for a while, if you accept it's more or less safe for this type of test. http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@489 PS1, Line 489: const HealthReportPB& report(p.health_report()); > I don't think that's true. The copy constructor will get invoked and the in ok, we talked about that offline, and it seems the code below can clarify a bit on this (compile with -O0 to disable optimizations): #include using namespace std; class A { public: A() : a_(0) { cout << "A::A()" << endl; } A(const A& a) { a_ = a.a_; cout << "A::A()" << endl; } private: int a_; }; int main() { A a; const A& b = a; const A& c(a); A& d(a); return 0; } -- To view, visit http://gerrit.cloudera.org:8080/8642 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie62b49efebad9a123eec51dd302e375e46e0682d Gerrit-Change-Number: 8642 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Mon, 27 Nov 2017 06:27:38 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1097 (patch1): test for replica health reporting
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/8642 ) Change subject: KUDU-1097 (patch1): test for replica health reporting .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc File src/kudu/integration-tests/ts_tablet_manager-itest.cc: http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@489 PS1, Line 489: he scenario below assumes the leader replica doe > I don't think that's true. The copy constructor will get invoked and the in OK, apparently you were right :) -- To view, visit http://gerrit.cloudera.org:8080/8642 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie62b49efebad9a123eec51dd302e375e46e0682d Gerrit-Change-Number: 8642 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Mon, 27 Nov 2017 06:15:01 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1097 (patch1): test for replica health reporting
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/8642 ) Change subject: KUDU-1097 (patch1): test for replica health reporting .. Patch Set 2: Code-Review+2 (3 comments) http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc File src/kudu/integration-tests/ts_tablet_manager-itest.cc: http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@362 PS1, Line 362: eports. > This does not make much sense because this closure is never invoked with tw I see default param as nullptr as documentation that the arg is optional but I'm ok with this http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@476 PS1, Line 476: mapreports; : NO_FATALS(get_health_reports()); > I saw that's how it's done in TsTabletManagerITest::TestReportNewLeaderOnLe I'm not sure it's truly safe but maybe it's safe enough for this kind of test... I don't see how ASSERT_EVENTUALLY would be brittle but I can live with it http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@489 PS1, Line 489: he scenario below assumes the leader replica doe > It's a constructor for a reference. There is no difference here -- in both I don't think that's true. The copy constructor will get invoked and the instance will be assigned to a temporary and "report" will act as a reference to that temporary. Actually I guess this part of C++ is a bit magical for me. const string& s = "blah" should act the same way. That said, it's a test and not a very big deal. -- To view, visit http://gerrit.cloudera.org:8080/8642 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie62b49efebad9a123eec51dd302e375e46e0682d Gerrit-Change-Number: 8642 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Mon, 27 Nov 2017 05:58:27 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-1097 (patch1): test for replica health reporting
Hello Mike Percy, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8642 to look at the new patch set (#2). Change subject: KUDU-1097 (patch1): test for replica health reporting .. KUDU-1097 (patch1): test for replica health reporting Added a test to verify that the leader tablet replica reports on the replica health changes. The tests verifies that the health reports are present in the Raft consensus state reported by the leader replica. The test also verifies that the incremental tablet reports contain appropriate information once the replica health status changes. Change-Id: Ie62b49efebad9a123eec51dd302e375e46e0682d --- M src/kudu/integration-tests/ts_tablet_manager-itest.cc 1 file changed, 291 insertions(+), 49 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/42/8642/2 -- To view, visit http://gerrit.cloudera.org:8080/8642 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie62b49efebad9a123eec51dd302e375e46e0682d Gerrit-Change-Number: 8642 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy
[kudu-CR] KUDU-1097 (patch1): test for replica health reporting
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8642 ) Change subject: KUDU-1097 (patch1): test for replica health reporting .. Patch Set 1: (17 comments) http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc File src/kudu/integration-tests/ts_tablet_manager-itest.cc: http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@124 PS1, Line 124: Status PrepareTabletReplicas(MonoDelta timeout, > nit: doc these functions Done http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@362 PS1, Line 362: auto get_health_reports = [&](map* reports, > nit: doc this closure Done http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@362 PS1, Line 362: reports > nit: if this is optional, how about we default it to nullptr This does not make much sense because this closure is never invoked with two parameters set to the default values. http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@366 PS1, Line 366: e > nit: usually when we use "e" when iterating it's for an entry in a map, but Done http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@367 PS1, Line 367: con > nit: mind spelling out "consensus" here? Done http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@369 PS1, Line 369: cs.has_leader_uuid() > nit: no need to check has_leader_uuid(), since if it's not set it will retu Done http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@378 PS1, Line 378: config > nit: perhaps name this committed_config as to avoid any confusion about whi This is just a local shortcut for the next 10 lines, so I think this should not bring a confusion -- it's easy to see what the 'config' is. http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@392 PS1, Line 392: auto get_committed_config_from_reports = [&](const string& leader_replica_uuid, > nit: doc this closure Done http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@409 PS1, Line 409: ; > typo Done http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@410 PS1, Line 410: const TabletReportPB& report = reports[0]; : ASSERT_EQ(1, reports.size()); > the order of these lines should be reversed good catch! http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@432 PS1, Line 432: EXPECT_EQ > EXPECT_EQ will not trigger an ASSERT_EVENTUALLY failure; consider using ASS Good catch! Exactly: EXPECT_XX macros do not generate fatal test failures. It seems I just moved those under the ASSERT_EVENTUALLY, not thinking about that. http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@469 PS1, Line 469: EXPECT_EQ > ASSERT_EQ Done http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@471 PS1, Line 471: EXPECT_EQ > ASSERT_EQ Done http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@476 PS1, Line 476: // The scenario below assumes the leader replica does not change anymore. : FLAGS_enable_leader_failure_detection = false; > This isn't set specified as a runtime flag and i'm not sure it's really saf I saw that's how it's done in TsTabletManagerITest::TestReportNewLeaderOnLeaderChange above -- they call this after starting the cluster already. So, I assumed it was safe enough. I also looked into the code, and it seemed safe to change this in run-time. Probably, that's because it has changed recently: we don't use dedicated thread for that anymore, so toggling this is not error-prone anymore. Wrapping the code below in ASSERT_EVENTUALLY would be brittle, IMO. http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@485 PS1, Line 485: p > nit: mind just spelling out "peer" here? Done http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@489 PS1, Line 489: const HealthReportPB& report(p.health_report()); > nit: why copy constructor? why not just get a ref like: It's a constructor for a reference. There is no difference here -- in both cases it will be a reference, no copy constructor should be invoked. It's just one symbol less. http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@491 PS1, Line 491: EXPECT_EQ > nit: if we wrap this in ASSERT_EVENTUALLY then we will have to change this I think we can get away with changing the
[kudu-CR] KUDU-1097 (patch1): test for replica health reporting
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/8642 ) Change subject: KUDU-1097 (patch1): test for replica health reporting .. Patch Set 1: (17 comments) Looks good. Thanks for writing this test for the patch 1 functionality. I have some nitpicky feedback and that's about it. http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc File src/kudu/integration-tests/ts_tablet_manager-itest.cc: http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@124 PS1, Line 124: Status PrepareTabletReplicas(MonoDelta timeout, nit: doc these functions http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@362 PS1, Line 362: auto get_health_reports = [&](map* reports, nit: doc this closure http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@362 PS1, Line 362: reports nit: if this is optional, how about we default it to nullptr http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@366 PS1, Line 366: e nit: usually when we use "e" when iterating it's for an entry in a map, but this container is just a vector, so we should just use "replica" or something similar for this variable name. http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@367 PS1, Line 367: con nit: mind spelling out "consensus" here? http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@369 PS1, Line 369: cs.has_leader_uuid() nit: no need to check has_leader_uuid(), since if it's not set it will return the default instance which is an empty string and thus won't match peer_uuid() http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@378 PS1, Line 378: config nit: perhaps name this committed_config as to avoid any confusion about which config it is http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@392 PS1, Line 392: auto get_committed_config_from_reports = [&](const string& leader_replica_uuid, nit: doc this closure http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@409 PS1, Line 409: ; typo http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@410 PS1, Line 410: const TabletReportPB& report = reports[0]; : ASSERT_EQ(1, reports.size()); the order of these lines should be reversed http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@432 PS1, Line 432: EXPECT_EQ EXPECT_EQ will not trigger an ASSERT_EVENTUALLY failure; consider using ASSERT_EQ here http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@469 PS1, Line 469: EXPECT_EQ ASSERT_EQ http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@471 PS1, Line 471: EXPECT_EQ ASSERT_EQ http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@476 PS1, Line 476: // The scenario below assumes the leader replica does not change anymore. : FLAGS_enable_leader_failure_detection = false; This isn't set specified as a runtime flag and i'm not sure it's really safe to change at runtime. Something we often to do in these situations is just wrap an ASSERT_EVENTUALLY() around any multi-step test sections. http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@485 PS1, Line 485: p nit: mind just spelling out "peer" here? http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@489 PS1, Line 489: const HealthReportPB& report(p.health_report()); nit: why copy constructor? why not just get a ref like: const HealthReportPB& report = p.health_report(); http://gerrit.cloudera.org:8080/#/c/8642/1/src/kudu/integration-tests/ts_tablet_manager-itest.cc@491 PS1, Line 491: EXPECT_EQ nit: if we wrap this in ASSERT_EVENTUALLY then we will have to change this one and the one below it to ASSERT_EQ -- To view, visit http://gerrit.cloudera.org:8080/8642 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie62b49efebad9a123eec51dd302e375e46e0682d Gerrit-Change-Number: 8642 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Comment-Date: Mon, 27 Nov 2017 03:21:43 + Gerrit-HasComments: Yes