Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14897 )
Change subject: [master] KUDU-3016 don't lump together updates on all tablet reports ...................................................................... Patch Set 6: (20 comments) http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/common/row_operations.cc File src/kudu/common/row_operations.cc: http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/common/row_operations.cc@105 PS6, Line 105: bool RowOperationsPBEncoder::empty() const { : return pb_->mutable_rows()->empty(); : } > Can be inlined in the header. Done http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/common/row_operations.cc@192 PS6, Line 192: rows_size_delta > Nit: 'row_size_delta'? The field in PB is called 'rows', so I think this name is more appropriate. http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/common/row_operations.cc@214 PS6, Line 214: if (isset_bitmap_size) { : *isset_bitmap_size = isset_size; : } : if (isnull_bitmap_size) { : *isnull_bitmap_size = isnull_size; : } > Do we need these to be nullable or can we assume (and DCHECK) that they're Indeed. Done! http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc File src/kudu/integration-tests/master-stress-test.cc: http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@596 PS6, Line 596: Kudu master > Nit: can elide this. Done http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@599 PS6, Line 599: result > resulting Done http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@604 PS6, Line 604: class MasterRpcSizeLimitStressTest : public KuduTest { > Would the ExternalMiniClusterITestBase fixture allow you to shorten Prepare I thought about that, but it unfortunately it doesn't make much sense since ExternalMiniClusterITestBase assumes there is only a single master in the cluster, otherwise methods such as StartClusterWithOpts fail. Also, I don't need cluster inspector. http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@611 PS6, Line 611: void TearDown() override { : if (cluster_) { : cluster_->Shutdown(); : } : client_.reset(); : KuduTest::TearDown(); : } > Doesn't all this happen naturally? Done http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@624 PS6, Line 624: buidls > builds Done http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@635 PS6, Line 635: // Set custom > Finish writing this? Done http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@682 PS6, Line 682: .wait(true) > This is the default and can be omitted. Done http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@697 PS6, Line 697: TEST_F(MasterRpcSizeLimitStressTest, TabletReports) { > I don't think this test makes sense here. Stress tests (i.e. *-stress-test) The reason it was here because there were many iterations for this test. I started with a pretty exhaustive, long running test that created huge number of replicas to verify this condition happens in larger clusters (i.e. I kept the default limit on the RPC size). With current distilled scenario, we should move it elsewhere, indeed. http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@709 PS6, Line 709: re-electing > re-elect Done http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@712 PS6, Line 712: for (auto idx = 0; idx < kNumTabletServers; ++idx) { > Could we do one loop to pause all tservers, then sleep, then another loop t That wouldn't be effective in terms of achieving the desired result. Current approach (1) tends to distribute all leaders replicas between two, not three servers (2) all tablets have their Raft config updated because of new term. With (1) and (2), the current approach seems to produce more updates per tablet server than the proposed alternative. I added a comment about that. http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/catalog_manager.cc@4305 PS6, Line 4305: transitional > transient Done http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/catalog_manager.cc@4305 PS6, Line 4305: next successfully processed : // tablet report from the tablet server will fix the partial update. > This raises an interesting question: if chunk 2 of 2 fails and the full tab These are UPDATE statements, so the chunk 1 of 2 in the retry report will rewrite already written data. The crucial point is that eventually there will be a consistent snapshot of tablet replicas states in the system catalog tablet. http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc File src/kudu/master/sys_catalog.cc: http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@838 PS6, Line 838: size_t req_size = req->ByteSizeLong(); > Oh I missed that; yeah something like Pop() seems like it'd be OK. Thank you for the feedback! http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@850 PS6, Line 850: do_encode = false; > Could we update excess_tablets and break here? Then maybe avoid needing do_ Done http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@937 PS6, Line 937: Generator generator, > warning: the parameter 'generator' is copied for each invocation but only u Done http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@942 PS6, Line 942: decltype(tablets_info) > This seems unnecessarily magical; why not just write out 'vector<scoped_ref To me decltype() is simpler to comprehend (there's just input and excess) in the code below and it's shorter to write. If you feel strongly against decltype() here, please let me know: I'll update this. http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@949 PS6, Line 949: should > Nit: will ? Done -- To view, visit http://gerrit.cloudera.org:8080/14897 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5 Gerrit-Change-Number: 14897 Gerrit-PatchSet: 6 Gerrit-Owner: Alexey Serbin <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sat, 11 Jan 2020 03:38:29 +0000 Gerrit-HasComments: Yes
