Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14897 )

Change subject: [master] KUDU-3016 don't lump together updates on all tablet 
reports
......................................................................


Patch Set 6:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/common/row_operations.cc
File src/kudu/common/row_operations.cc:

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/common/row_operations.cc@105
PS6, Line 105: bool RowOperationsPBEncoder::empty() const {
             :   return pb_->mutable_rows()->empty();
             : }
> Can be inlined in the header.
Done


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/common/row_operations.cc@192
PS6, Line 192: rows_size_delta
> Nit: 'row_size_delta'?
The field in PB is called 'rows', so I think this name is more appropriate.


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/common/row_operations.cc@214
PS6, Line 214:   if (isset_bitmap_size) {
             :     *isset_bitmap_size = isset_size;
             :   }
             :   if (isnull_bitmap_size) {
             :     *isnull_bitmap_size = isnull_size;
             :   }
> Do we need these to be nullable or can we assume (and DCHECK) that they're
Indeed.  Done!


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc
File src/kudu/integration-tests/master-stress-test.cc:

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@596
PS6, Line 596: Kudu master
> Nit: can elide this.
Done


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@599
PS6, Line 599: result
> resulting
Done


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@604
PS6, Line 604: class MasterRpcSizeLimitStressTest : public KuduTest {
> Would the ExternalMiniClusterITestBase fixture allow you to shorten Prepare
I thought about that, but it unfortunately it doesn't make much sense since 
ExternalMiniClusterITestBase assumes there is only a single master in the 
cluster, otherwise methods such as StartClusterWithOpts fail.  Also, I don't 
need cluster inspector.


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@611
PS6, Line 611:   void TearDown() override {
             :     if (cluster_) {
             :       cluster_->Shutdown();
             :     }
             :     client_.reset();
             :     KuduTest::TearDown();
             :   }
> Doesn't all this happen naturally?
Done


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@624
PS6, Line 624: buidls
> builds
Done


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@635
PS6, Line 635:       // Set custom
> Finish writing this?
Done


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@682
PS6, Line 682:         .wait(true)
> This is the default and can be omitted.
Done


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@697
PS6, Line 697: TEST_F(MasterRpcSizeLimitStressTest, TabletReports) {
> I don't think this test makes sense here. Stress tests (i.e. *-stress-test)
The reason it was here because there were many iterations for this test.  I 
started with a pretty exhaustive, long running test that created huge number of 
replicas to verify this condition happens in larger clusters (i.e. I kept the 
default limit on the RPC size).

With current distilled scenario, we should move it elsewhere, indeed.


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@709
PS6, Line 709: re-electing
> re-elect
Done


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/integration-tests/master-stress-test.cc@712
PS6, Line 712:   for (auto idx = 0; idx < kNumTabletServers; ++idx) {
> Could we do one loop to pause all tservers, then sleep, then another loop t
That wouldn't be effective in terms of achieving the desired result.  Current 
approach (1) tends to distribute all leaders replicas between two, not three 
servers (2) all tablets have their Raft config updated because of new term.  
With (1) and (2), the current approach seems to produce more updates per tablet 
server than the proposed alternative.

I added a comment about that.


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/catalog_manager.cc@4305
PS6, Line 4305: transitional
> transient
Done


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/catalog_manager.cc@4305
PS6, Line 4305: next successfully processed
              :     // tablet report from the tablet server will fix the 
partial update.
> This raises an interesting question: if chunk 2 of 2 fails and the full tab
These are UPDATE statements, so the chunk 1 of 2 in the retry report will 
rewrite already written data.  The crucial point is that eventually there will 
be a consistent snapshot of tablet replicas states in the system catalog tablet.


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc
File src/kudu/master/sys_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@838
PS6, Line 838:   size_t req_size = req->ByteSizeLong();
> Oh I missed that; yeah something like Pop() seems like it'd be OK.
Thank you for the feedback!


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@850
PS6, Line 850:         do_encode = false;
> Could we update excess_tablets and break here? Then maybe avoid needing do_
Done


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@937
PS6, Line 937:     Generator generator,
> warning: the parameter 'generator' is copied for each invocation but only u
Done


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@942
PS6, Line 942: decltype(tablets_info)
> This seems unnecessarily magical; why not just write out 'vector<scoped_ref
To me decltype() is simpler to comprehend (there's just input and excess) in 
the code below and it's shorter to write.

If you feel strongly against decltype() here, please let me know: I'll update 
this.


http://gerrit.cloudera.org:8080/#/c/14897/6/src/kudu/master/sys_catalog.cc@949
PS6, Line 949: should
> Nit: will ?
Done



--
To view, visit http://gerrit.cloudera.org:8080/14897
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I83e8ca4bc8db7cab8fee6b4a40f48adc8752e7c5
Gerrit-Change-Number: 14897
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 11 Jan 2020 03:38:29 +0000
Gerrit-HasComments: Yes

Reply via email to