Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14897 )
Change subject: [master] KUDU-3016 don't lump together updates on all tablet reports ...................................................................... Patch Set 6: (19 comments) http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/common/row_operations.cc File src/kudu/common/row_operations.cc: http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/common/row_operations.cc@105 PS6, Line 105: bool RowOperationsPBEncoder::empty() const { : return pb_->mutable_rows()->empty(); : } Can be inlined in the header. http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/common/row_operations.cc@192 PS6, Line 192: rows_size_delta Nit: 'row_size_delta'? http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/common/row_operations.cc@214 PS6, Line 214: if (isset_bitmap_size) { : *isset_bitmap_size = isset_size; : } : if (isnull_bitmap_size) { : *isnull_bitmap_size = isnull_size; : } Do we need these to be nullable or can we assume (and DCHECK) that they're always provided? http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc File src/kudu/integration-tests/master-stress-test.cc: http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@596 PS6, Line 596: Kudu master Nit: can elide this. http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@599 PS6, Line 599: result resulting http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@604 PS6, Line 604: class MasterRpcSizeLimitStressTest : public KuduTest { Would the ExternalMiniClusterITestBase fixture allow you to shorten Prepare()? http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@611 PS6, Line 611: void TearDown() override { : if (cluster_) { : cluster_->Shutdown(); : } : client_.reset(); : KuduTest::TearDown(); : } Doesn't all this happen naturally? http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@624 PS6, Line 624: buidls builds http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@635 PS6, Line 635: // Set custom Finish writing this? http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@682 PS6, Line 682: .wait(true) This is the default and can be omitted. http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@697 PS6, Line 697: TEST_F(MasterRpcSizeLimitStressTest, TabletReports) { I don't think this test makes sense here. Stress tests (i.e. *-stress-test) are intended to be black box tests that run for a fixed amount of time (usually pretty long) and place stress on various components; here you're testing correctness and a particular bug fix. http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@709 PS6, Line 709: re-electing re-elect http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@712 PS6, Line 712: for (auto idx = 0; idx < kNumTabletServers; ++idx) { Could we do one loop to pause all tservers, then sleep, then another loop to resume all of them? That'd be net less sleeping so the test would run faster. http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/catalog_manager.cc@4305 PS6, Line 4305: transitional transient http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/catalog_manager.cc@4305 PS6, Line 4305: next successfully processed : // tablet report from the tablet server will fix the partial update. This raises an interesting question: if chunk 2 of 2 fails and the full tablet report is retried, will chunk 1 of 2 no-op the second time around? Or will it be rewritten? http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc File src/kudu/master/sys_catalog.cc: http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@838 PS6, Line 838: size_t req_size = req->ByteSizeLong(); Could we just recompute this after adding each tablet instead of all that estimation logic? It's probably more expensive but it's a lot simpler, and DDLs aren't really a hot path. Unless you think N calls to ByteSizeLong() on a huge tablet report is too expensive... I also wonder whether we can use the previous row's size as a proxy for the next row's size, thus avoiding the recomputation of both Estimate() and Add(). How different can they be? http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@850 PS6, Line 850: do_encode = false; Could we update excess_tablets and break here? Then maybe avoid needing do_encode altogether? http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@942 PS6, Line 942: decltype(tablets_info) This seems unnecessarily magical; why not just write out 'vector<scoped_refptr<TabletInfo>>'? http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@949 PS6, Line 949: should Nit: will ? -- To view, visit http://gerrit.cloudera.org:8080/14897 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5 Gerrit-Change-Number: 14897 Gerrit-PatchSet: 6 Gerrit-Owner: Alexey Serbin <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Fri, 10 Jan 2020 10:29:36 +0000 Gerrit-HasComments: Yes
