[kudu-CR] tools: don't open block manager when dumping UUID

2019-09-16 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14243 )

Change subject: tools: don't open block manager when dumping UUID
..


Patch Set 1:

There isn't much behind this CR; I just ran the tool on a pretty loaded server 
and it took a while to get the UUID.


--
To view, visit http://gerrit.cloudera.org:8080/14243
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0acf3b08fc13c169a52d4cfe847efd62bd30cc0a
Gerrit-Change-Number: 14243
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 17 Sep 2019 05:56:00 +
Gerrit-HasComments: No


[kudu-CR] tools: don't open block manager when dumping UUID

2019-09-16 Thread Andrew Wong (Code Review)
Hello Adar Dembo,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/14243

to review the following change.


Change subject: tools: don't open block manager when dumping UUID
..

tools: don't open block manager when dumping UUID

The block manager can be expensive to open, even when opening it in
read-only mode. If all a user wants is the server's UUID, all we really
need is to read the instance metadata files.

This patch updates the `kudu fs dump uuid` tool to do just that.

Change-Id: I0acf3b08fc13c169a52d4cfe847efd62bd30cc0a
---
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/tools/tool_action_fs.cc
3 files changed, 32 insertions(+), 15 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/43/14243/1
--
To view, visit http://gerrit.cloudera.org:8080/14243
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0acf3b08fc13c169a52d4cfe847efd62bd30cc0a
Gerrit-Change-Number: 14243
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 


[kudu-CR] [tserver] include ip:port in the tserver name

2019-09-16 Thread helifu (Code Review)
helifu has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14232 )

Change subject: [tserver] include ip:port in the tserver name
..


Patch Set 2:

> (1 comment)
 >
 > The TSAN failure is interesting. It's not related to this patch,
 > but it is related to one of your previous patches, Lifu.
 >
 > In ts_tablet_manager-itest.cc:
 >
 > // 6. Restart the masters.
 > {
 > for (int i = 0; i < kNumMasters; ++i) {
 > int idx = 0;
 > ASSERT_OK(cluster_->GetLeaderMasterIndex());
 > master::MiniMaster* mini_master = CHECK_NOTNULL(cluster_->mini_master(idx));
 > mini_master->Shutdown();
 > SleepFor(MonoDelta::FromMilliseconds(kMaxElectionTime));
 > ASSERT_OK(mini_master->Restart());
 > // Sometimes the election fails until the node restarts.
 > // And the restarted node is elected leader again.
 > // So, it is necessary to wait for all tservers to report.
 > SleepFor(MonoDelta::FromMilliseconds(FLAGS_heartbeat_interval_ms));
 > NO_FATALS(CheckStats(kRowsCount));
 > }
 > }
 >
 > I think there needs to be a call to mini_master->WaitForCatalogManagerInit()
 > somewhere, perhaps after the Restart(), or after the waiting.

The test case here only cares about the leader master, so I don't think it's 
necessary to make sure the restarted master should be online, that means 
calling "mini_master->WaitForCatalogManagerInit()" is not required. Besides, 
there is evidence that the restarted master had been online before failure, 
please look at the full test output from Line2136 to Line2168.


--
To view, visit http://gerrit.cloudera.org:8080/14232
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic6b0d8e29b6cac5a900923b881f0d7a74facbd6e
Gerrit-Change-Number: 14232
Gerrit-PatchSet: 2
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Tue, 17 Sep 2019 02:40:07 +
Gerrit-HasComments: No


[kudu-CR] WIP KUDU-2780: create thread for auto-rebalancing

2019-09-16 Thread Hannah Nguyen (Code Review)
Hannah Nguyen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14177 )

Change subject: WIP KUDU-2780: create thread for auto-rebalancing
..


Patch Set 3:

(40 comments)

http://gerrit.cloudera.org:8080/#/c/14177/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14177/3//COMMIT_MSG@8
PS3, Line 8:
   : Created auto-rebalancer thread in background tasks of catalog 
manager.
   : Set up framework for auto-rebalancing loop.
   : Loop retrieves information on tservers, tables, and tablets for 
rebalancing.
   : The number of replica moves per loop iteration is controlled by a 
flag.
   : If there are placement policy violations, the current loop 
iteration will only
   : perform replica moves to reinstate the policy.
   : Otherwise, the auto-rebalancer will perform moves to do 
inter-location(cross-location),
   : then intra-location(by table then by tserver) rebalancing.
   : If the cluster is balanced, the current rebalancing cycle 
completes, and the
   : thread will sleep for an interval, controlled by another flag.
> nit: I found this a little difficult to read. Could you split it up into pa
Done


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc
File src/kudu/master/auto_rebalancer-test.cc:

http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@58
PS3, Line 58: if (cluster_)
> nit: it's a good practice to use scope braces for 'if ()' clause even it's
Done


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@61
PS3, Line 61: else return -1;
> warning: do not use 'else' after 'return' [readability-else-after-return]
Done


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@68
PS3, Line 68: else return -1;
> warning: do not use 'else' after 'return' [readability-else-after-return]
Done


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@85
PS3, Line 85:   int kNumMasters = 3;
:   int kNumTservers = 3;
:   int kNumTablets = 4;
> nit: these are constants, right?  Add const/constexpr if so.
Done


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@96
PS3, Line 96: 10
> 10+ seconds run-time for a test makes it a 'slow' test.  Add a notion of ru
Done


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@101
PS3, Line 101: if (i==leader_idx) ASSERT_NE(0, 
number_of_loop_iterations(i));
 : else ASSERT_EQ(0, moves_scheduled_this_round(i));
> Would this test fail if master leadership changes while the test is running
Hmm, I think it would.
Is it enough to just reset the leader index then restart the for loop? ie:

 if (leader_changed) { leader_idx = new_leader_idx; i=0; continue; }


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@104
PS3, Line 104:
> nit:  an extra space
Done


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@111
PS3, Line 111:   int kNumMasters = 3;
 :   int kNumTservers = 3;
 :   int kNumTablets = 3;
> Add const/constexpr.
Done


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@132
PS3, Line 132:   auto iterations = number_of_loop_iterations(new_leader_idx);
 :   ASSERT_LT(0, iterations);
> Might it happen that the rebalancing thread hasn't yet after master leader
Would it make sense to sleep for a bit to make sure the auto-rebalancer thread 
has been initialized and begins iterating?


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@139
PS3, Line 139: kNumTservers, kNumTablets;
> These are not constants, but 'kXxx' is used only for constants.  Change the
Done


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@214
PS3, Line 214: auto-rebalancing flag is paused
> nit: how does one pause a flag?
How do you allow a flag to be changed via the CLI?


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@241
PS3, Line 241:   ASSERT_GE(moves_scheduled_before, moves_scheduled_after);
> If there shouldn't be any moves, then why ASSERT_GE(), not ASSERT_EQ()?
if the flag to pause auto-rebalancing is set, the loop is still active (ie 
checking cluster status), just not getting or executing replica moves.
It's possible that while auto-rebalancing is paused, the cluster becomes 
balanced. In this case, the variable "moves_scheduled_this_round" will be reset 
to 0. So it's possible that moves_scheduled_after < moves_scheduled_before.


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@243
PS3, Line 243:   FLAGS_auto_rebalancing_stop_flag = orig_flag;
> I don't think this is 

[kudu-CR] KUDU-2069 p4: stop replication from failed servers in maintenance mode

2019-09-16 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14222 )

Change subject: KUDU-2069 p4: stop replication from failed servers in 
maintenance mode
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/consensus/quorum_util.cc
File src/kudu/consensus/quorum_util.cc:

http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/consensus/quorum_util.cc@456
PS2, Line 456: peer.attrs().replace()
> I should have been more clear on what I said.
Ah, just another observation.  Probably, I should have mentioned that earlier 
for clarity.

If I'm not mistaken, only tools like 'cluster rebalancer' and 'tablet 
change_config move_replica' set the 'REPLACE' attribute to make replica to be 
moved to some other tablet server.  That corresponds to ChangeConfigType of 
MODIFY_PEER.  The automatic re-replication in master uses ADD_PEER and 
REMOVE_PEER correspondingly.

So, if we are about to ignore the REPLACE attribute when a tablet server is in 
the maintenance mode, that means a running session of the rebalancer tool of an 
explicit replica movement would stuck until the source tablet server is back 
into the regular (non-maintenance mode).

Maybe it's worth double-checking what we want from the operability perspective 
here if ignoring the REPLACE attribute for tablet servers went into the 
maintenance mode:
 * allow for accumulation of those 'REPLACE' attributes set for replicas hosted 
by a tablet server in the maintenance mode (and be prepared for corresponding 
movements once the server is back into the normal mode)
 * explicitly disallow setting the 'REPLACE' attribute for replicas hosted by a 
tablet server in the maintenance mode

Another alternative would be honoring the 'REPLACE' attribute even if a tablet 
server is in the maintenance mode, given that the presence of the 'REPLACE' 
attribute means it has been set explicitly, and it was not a result of the 
automatic re-replication activity.



--
To view, visit http://gerrit.cloudera.org:8080/14222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a63b55011d16900c0d27eac0eb75880074204db
Gerrit-Change-Number: 14222
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 17 Sep 2019 00:38:48 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2069 p4: stop replication from failed servers in maintenance mode

2019-09-16 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14222 )

Change subject: KUDU-2069 p4: stop replication from failed servers in 
maintenance mode
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/integration-tests/maintenance_mode-itest.cc
File src/kudu/integration-tests/maintenance_mode-itest.cc:

http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/integration-tests/maintenance_mode-itest.cc@156
PS2, Line 156: TestMaintenanceModeDoesntRereplicate
> Does it make sense to add a scenario to verify that it's possible to move r
To clarify, my comment was about the manual movement of replicas from the node, 
e.g., when explicitly moving a replica  using 'kudu tablet change_config 
move_replica'.

We talked offline regarding another comment related to this question.  So, if 
we are OK with introducing the restriction of not allowing replicas to be moved 
even explicitly, please ignore this comment.



--
To view, visit http://gerrit.cloudera.org:8080/14222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a63b55011d16900c0d27eac0eb75880074204db
Gerrit-Change-Number: 14222
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 17 Sep 2019 00:07:46 +
Gerrit-HasComments: Yes


[kudu-CR] [catalog manager] correct the wrong comment

2019-09-16 Thread honeyhexin (Code Review)
honeyhexin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14233 )

Change subject: [catalog_manager] correct the wrong comment
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14233/1/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

http://gerrit.cloudera.org:8080/#/c/14233/1/src/kudu/master/catalog_manager.h@141
PS1, Line 141: .
> nit: while you are at it, maybe change this comma (,) into period (.) as we
Done



--
To view, visit http://gerrit.cloudera.org:8080/14233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4071923dc052b6b829ddc25a8a0392ad8d1767e6
Gerrit-Change-Number: 14233
Gerrit-PatchSet: 2
Gerrit-Owner: honeyhexin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: honeyhexin 
Gerrit-Comment-Date: Mon, 16 Sep 2019 23:38:55 +
Gerrit-HasComments: Yes


[kudu-CR] [catalog manager] correct the wrong comment

2019-09-16 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14233 )

Change subject: [catalog_manager] correct the wrong comment
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/14233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4071923dc052b6b829ddc25a8a0392ad8d1767e6
Gerrit-Change-Number: 14233
Gerrit-PatchSet: 2
Gerrit-Owner: honeyhexin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: honeyhexin 
Gerrit-Comment-Date: Mon, 16 Sep 2019 23:39:04 +
Gerrit-HasComments: No


[kudu-CR] [catalog manager] correct the wrong comment

2019-09-16 Thread Alexey Serbin (Code Review)
Alexey Serbin has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/14233 )

Change subject: [catalog_manager] correct the wrong comment
..

[catalog_manager] correct the wrong comment

Change-Id: I4071923dc052b6b829ddc25a8a0392ad8d1767e6
Reviewed-on: http://gerrit.cloudera.org:8080/14233
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin 
---
M src/kudu/master/catalog_manager.h
1 file changed, 2 insertions(+), 2 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Alexey Serbin: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/14233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4071923dc052b6b829ddc25a8a0392ad8d1767e6
Gerrit-Change-Number: 14233
Gerrit-PatchSet: 3
Gerrit-Owner: honeyhexin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: honeyhexin 


[kudu-CR] [thirdparty] add SO REUSEPORT for chronyd NTP socket

2019-09-16 Thread Alexey Serbin (Code Review)
Alexey Serbin has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/14228 )

Change subject: [thirdparty] add SO_REUSEPORT for chronyd NTP socket
..

[thirdparty] add SO_REUSEPORT for chronyd NTP socket

This patch adds SO_REUSEPORT option for NTP server socket opened
by chronyd.  This is to allow for using the port reservation
technique from external minicluster when starting chronyd test
NTP servers.

Change-Id: Iee26fcf93976dd7affe77254751016bcbf398620
Reviewed-on: http://gerrit.cloudera.org:8080/14228
Tested-by: Alexey Serbin 
Reviewed-by: Greg Solovyev 
Reviewed-by: Hao Hao 
Reviewed-by: Adar Dembo 
---
M thirdparty/download-thirdparty.sh
A thirdparty/patches/chrony-reuseport.patch
2 files changed, 36 insertions(+), 2 deletions(-)

Approvals:
  Alexey Serbin: Verified
  Greg Solovyev: Looks good to me, but someone else must approve
  Hao Hao: Looks good to me, but someone else must approve
  Adar Dembo: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/14228
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iee26fcf93976dd7affe77254751016bcbf398620
Gerrit-Change-Number: 14228
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Greg Solovyev 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] [thirdparty] add SO REUSEPORT for chronyd NTP socket

2019-09-16 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14228 )

Change subject: [thirdparty] add SO_REUSEPORT for chronyd NTP socket
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/14228
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iee26fcf93976dd7affe77254751016bcbf398620
Gerrit-Change-Number: 14228
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Greg Solovyev 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 16 Sep 2019 23:29:21 +
Gerrit-HasComments: No


[kudu-CR] [thirdparty] add SO REUSEPORT for chronyd NTP socket

2019-09-16 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14228 )

Change subject: [thirdparty] add SO_REUSEPORT for chronyd NTP socket
..


Patch Set 1:

> >  > Could we get by with a patch to enable ephemeral port binding
 > >  > instead? That's less invasive than this, which seems unlikely
 > to be
 > >  > accepted upstream.
 > >
 > > Why do you think the ephemeral port binding change for chrony is
 > less invasive than adding SO_REUSEPORT (i.e. this small patch)?  Do
 > you have some particular considerations w.r.t. SO_REUSEPORT change?
 > >
 > > From what I see in chrony's source code, changing its code to
 > properly support (and that's is needed if we talking about
 > accepting the patch upstream  by chrony) binding to ephemeral port
 > would change many places there: interpretation of configuration
 > directive 'port' in configuraiton file, multiple places in code
 > that interpret port '0' as a special value meaning "don't open NTP
 > port at all", etc.  By my understanding, that would be much more
 > invasive than this small patch.
 > >
 > > What do you think?
 >
 > I'll copy in what I wrote in Slack.
 >
 > When I wrote "invasive" I didn't mean that it'd require more or
 > less code change to chronyd; I was thinking about it being more
 > difficult to reason about chronyd and the other sockets bound on
 > the system. As I understand it, the purpose of SO_REUSEPORT is for
 > multi-threaded applications that wish to more evenly accept new
 > connections (or new datagrams, for UDP sockets). It's not for "port
 > reservation to enable testing" the way you're using it. If an
 > application binds with SO_REUSEPORT, another application running
 > with the same EUID can now accidentally bind to that address while
 > it's in use (if SO_REUSEPORT is fed to the second bind call).
 > Contrast this with ephemeral port usage, which is an age old
 > sockets concept and has "fire and forget" semantics: you bind to
 > port 0, you see what port you got, and at that point the semantics
 > are just like non-ephemeral port usage.
 >
 > In general, I think we should seek to minimize the number of third
 > party patches we carry, especially of the "upstream will never
 > accept this" variety. My concern is that SO_REUSEPORT in chronyd
 > won't be accepted because it's a little bit weird, and I think
 > that's less of an issue for ephemeral port usage.
 >
 > All that said, if it's going to be much more work to support
 > ephemeral ports, or if you think it's also nonsensical for reasons
 > I'm not seeing, then we can move forward with this instead.

 > >  > Could we get by with a patch to enable ephemeral port binding
 > >  > instead? That's less invasive than this, which seems unlikely
 > to be
 > >  > accepted upstream.
 > >
 > > Why do you think the ephemeral port binding change for chrony is
 > less invasive than adding SO_REUSEPORT (i.e. this small patch)?  Do
 > you have some particular considerations w.r.t. SO_REUSEPORT change?
 > >
 > > From what I see in chrony's source code, changing its code to
 > properly support (and that's is needed if we talking about
 > accepting the patch upstream  by chrony) binding to ephemeral port
 > would change many places there: interpretation of configuration
 > directive 'port' in configuraiton file, multiple places in code
 > that interpret port '0' as a special value meaning "don't open NTP
 > port at all", etc.  By my understanding, that would be much more
 > invasive than this small patch.
 > >
 > > What do you think?
 >
 > I'll copy in what I wrote in Slack.
 >
 > When I wrote "invasive" I didn't mean that it'd require more or
 > less code change to chronyd; I was thinking about it being more
 > difficult to reason about chronyd and the other sockets bound on
 > the system. As I understand it, the purpose of SO_REUSEPORT is for
 > multi-threaded applications that wish to more evenly accept new
 > connections (or new datagrams, for UDP sockets). It's not for "port
 > reservation to enable testing" the way you're using it. If an
 > application binds with SO_REUSEPORT, another application running
 > with the same EUID can now accidentally bind to that address while
 > it's in use (if SO_REUSEPORT is fed to the second bind call).
 > Contrast this with ephemeral port usage, which is an age old
 > sockets concept and has "fire and forget" semantics: you bind to
 > port 0, you see what port you got, and at that point the semantics
 > are just like non-ephemeral port usage.
 >
 > In general, I think we should seek to minimize the number of third
 > party patches we carry, especially of the "upstream will never
 > accept this" variety. My concern is that SO_REUSEPORT in chronyd
 > won't be accepted because it's a little bit weird, and I think
 > that's less of an issue for ephemeral port usage.
 >
 > All that said, if it's going to be much more work to support
 > ephemeral ports, or if 

[kudu-CR] KUDU-2069 p4: stop replication from failed servers in maintenance mode

2019-09-16 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14222 )

Change subject: KUDU-2069 p4: stop replication from failed servers in 
maintenance mode
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/consensus/quorum_util.cc
File src/kudu/consensus/quorum_util.cc:

http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/consensus/quorum_util.cc@456
PS2, Line 456: peer.attrs().replace()
> Do we want to avoid replacing replicas from nodes in maintenance mode?
I should have been more clear on what I said.

My concern was that with this code moving replicas from a node put into the 
maintenance mode is impossible, even if it's done not because of automatic 
re-replication, but because of manual 'replica move' using kudu CLI.

Andrew and I discussed that offline, and it seems the constraint of not 
allowing replica movement from tablet servers in the maintenance mode is inline 
with the intention.



--
To view, visit http://gerrit.cloudera.org:8080/14222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a63b55011d16900c0d27eac0eb75880074204db
Gerrit-Change-Number: 14222
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 16 Sep 2019 23:18:25 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2069 p4: stop replication from failed servers in maintenance mode

2019-09-16 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14222 )

Change subject: KUDU-2069 p4: stop replication from failed servers in 
maintenance mode
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/master/ts_manager.h
File src/kudu/master/ts_manager.h:

http://gerrit.cloudera.org:8080/#/c/14222/2/src/kudu/master/ts_manager.h@94
PS2, Line 94:   void GetWhitelistedUuids(std::unordered_set* 
uuids) const;
> Not sure I understand why that would be any better than generating the list
If O(num tservers in maintenance mode) is relatively high, GetWhitelistedUuids 
will take a lot of locks. The acquisitions are one-at-a-time, but it's possible 
it's still computationally more expensive than ~3 TSManager lookups in 
ShouldAddReplica. With the change to only call GetWhitelistedUuids once per 
heartbeat vs. one per reported tablet, it's probably better than lookups in 
ShouldAddReplica _for full reports_. But what about incremental reports, which 
may only include a few tablets?

As for the name, maybe it'd be clearer if you shifted the complexity to 
ShouldAddReplica by calling the method GetMaintenanceModeTServers. Then 
ShouldAddReplica gets to explain how/why MM affects rereplication, but since 
that area of code is already dealing with that concept, it's more natural for 
the full explanation to also live there.



--
To view, visit http://gerrit.cloudera.org:8080/14222
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9a63b55011d16900c0d27eac0eb75880074204db
Gerrit-Change-Number: 14222
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 16 Sep 2019 23:12:46 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2069 p1: add persistent tserver maintenance mode

2019-09-16 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14217 )

Change subject: KUDU-2069 p1: add persistent tserver maintenance mode
..


Patch Set 6:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/14217/6/src/kudu/master/sys_catalog.cc
File src/kudu/master/sys_catalog.cc:

http://gerrit.cloudera.org:8080/#/c/14217/6/src/kudu/master/sys_catalog.cc@788
PS6, Line 788:   CHECK_OK(SchemaToPB(schema_, req.mutable_schema()));
 :   KuduPartialRow row(_);
 :   CHECK_OK(row.SetInt8(kSysCatalogTableColType, TSERVER_STATE));
 :   CHECK_OK(row.SetStringNoCopy(kSysCatalogTableColId, 
tserver_id));
RETURN_NOT_OK for these?


http://gerrit.cloudera.org:8080/#/c/14217/6/src/kudu/master/ts_manager.h
File src/kudu/master/ts_manager.h:

http://gerrit.cloudera.org:8080/#/c/14217/6/src/kudu/master/ts_manager.h@94
PS6, Line 94:   Status SetTServerState(const std::string& ts_uuid,
Doc.


http://gerrit.cloudera.org:8080/#/c/14217/6/src/kudu/master/ts_manager.h@102
PS6, Line 102:   Status ReloadTServerStates(SysCatalogTable* sys_catalog);
Doc.


http://gerrit.cloudera.org:8080/#/c/14217/6/src/kudu/master/ts_manager.h@122
PS6, Line 122:   std::unordered_map 
ts_state_by_uuid_;
We talked about this in person last week and I thought you were going to add 
TServerStatePB to TSDescriptor and reuse  servers_by_id_ (after modifying 
TSDescriptor to support "persisted but not yet registered" descriptors). What 
changed your mind?


http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_state.h
File src/kudu/master/ts_state.h:

http://gerrit.cloudera.org:8080/#/c/14217/4/src/kudu/master/ts_state.h@27
PS4, Line 27:
:
:
:
> You previously raised concerns about mixing on-disk enums with non-on-disk
Yeah, I don't view the elision of certain special states from the overall set 
of states as a leak in the abstraction. But edge-triggered (ENTER_state) vs. 
level-triggered (IN_state) feels meaningful to me, which is why I advocated for 
different enums for RPC and persistence.



--
To view, visit http://gerrit.cloudera.org:8080/14217
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib669b43b3cee171c4c7dbd54041e29c30cb9f767
Gerrit-Change-Number: 14217
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Greg Solovyev 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Mon, 16 Sep 2019 23:04:39 +
Gerrit-HasComments: Yes


[kudu-CR] [catalog manager] correct the wrong comment

2019-09-16 Thread honeyhexin (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14233

to look at the new patch set (#2).

Change subject: [catalog_manager] correct the wrong comment
..

[catalog_manager] correct the wrong comment

Change-Id: I4071923dc052b6b829ddc25a8a0392ad8d1767e6
---
M src/kudu/master/catalog_manager.h
1 file changed, 2 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/14233/2
--
To view, visit http://gerrit.cloudera.org:8080/14233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4071923dc052b6b829ddc25a8a0392ad8d1767e6
Gerrit-Change-Number: 14233
Gerrit-PatchSet: 2
Gerrit-Owner: honeyhexin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] [catalog manager] correct the wrong comment

2019-09-16 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14241 )

Change subject: [catalog_manager] correct the wrong comment
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/14241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8b5dd17e8bc5d7034ff078da05abd62c111bec8c
Gerrit-Change-Number: 14241
Gerrit-PatchSet: 1
Gerrit-Owner: honeyhexin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 16 Sep 2019 22:51:23 +
Gerrit-HasComments: No


[kudu-CR] [catalog manager] correct the wrong comment

2019-09-16 Thread honeyhexin (Code Review)
honeyhexin has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/14241


Change subject: [catalog_manager] correct the wrong comment
..

[catalog_manager] correct the wrong comment

Change-Id: I4071923dc052b6b829ddc25a8a0392ad8d1767e6

Change-Id: I8b5dd17e8bc5d7034ff078da05abd62c111bec8c
---
M src/kudu/master/catalog_manager.h
1 file changed, 2 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/41/14241/1
--
To view, visit http://gerrit.cloudera.org:8080/14241
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8b5dd17e8bc5d7034ff078da05abd62c111bec8c
Gerrit-Change-Number: 14241
Gerrit-PatchSet: 1
Gerrit-Owner: honeyhexin 


[kudu-CR] [thirdparty] add SO REUSEPORT for chronyd NTP socket

2019-09-16 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14228 )

Change subject: [thirdparty] add SO_REUSEPORT for chronyd NTP socket
..


Patch Set 1:

>  > Could we get by with a patch to enable ephemeral port binding
>  > instead? That's less invasive than this, which seems unlikely to be
>  > accepted upstream.
>
> Why do you think the ephemeral port binding change for chrony is less 
> invasive than adding SO_REUSEPORT (i.e. this small patch)?  Do you have some 
> particular considerations w.r.t. SO_REUSEPORT change?
>
> From what I see in chrony's source code, changing its code to properly 
> support (and that's is needed if we talking about accepting the patch 
> upstream  by chrony) binding to ephemeral port would change many places 
> there: interpretation of configuration directive 'port' in configuraiton 
> file, multiple places in code that interpret port '0' as a special value 
> meaning "don't open NTP port at all", etc.  By my understanding, that would 
> be much more invasive than this small patch.
>
> What do you think?

I'll copy in what I wrote in Slack.

When I wrote "invasive" I didn't mean that it'd require more or less code 
change to chronyd; I was thinking about it being more difficult to reason about 
chronyd and the other sockets bound on the system. As I understand it, the 
purpose of SO_REUSEPORT is for multi-threaded applications that wish to more 
evenly accept new connections (or new datagrams, for UDP sockets). It's not for 
"port reservation to enable testing" the way you're using it. If an application 
binds with SO_REUSEPORT, another application running with the same EUID can now 
accidentally bind to that address while it's in use (if SO_REUSEPORT is fed to 
the second bind call). Contrast this with ephemeral port usage, which is an age 
old sockets concept and has "fire and forget" semantics: you bind to port 0, 
you see what port you got, and at that point the semantics are just like 
non-ephemeral port usage.

In general, I think we should seek to minimize the number of third party 
patches we carry, especially of the "upstream will never accept this" variety. 
My concern is that SO_REUSEPORT in chronyd won't be accepted because it's a 
little bit weird, and I think that's less of an issue for ephemeral port usage.

All that said, if it's going to be much more work to support ephemeral ports, 
or if you think it's also nonsensical for reasons I'm not seeing, then we can 
move forward with this instead.


--
To view, visit http://gerrit.cloudera.org:8080/14228
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iee26fcf93976dd7affe77254751016bcbf398620
Gerrit-Change-Number: 14228
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Greg Solovyev 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 16 Sep 2019 22:50:16 +
Gerrit-HasComments: No


[kudu-CR] Add get table statistics interface for cpp client

2019-09-16 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14218 )

Change subject: Add get table statistics interface for cpp client
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/14218
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic00f46fd49c258f46a178a92d142d4d93d615d82
Gerrit-Change-Number: 14218
Gerrit-PatchSet: 2
Gerrit-Owner: ZhangYao 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: ZhangYao 
Gerrit-Comment-Date: Mon, 16 Sep 2019 18:48:53 +
Gerrit-HasComments: No


[kudu-CR] Add get table statistics interface for cpp client

2019-09-16 Thread Adar Dembo (Code Review)
Adar Dembo has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/14218 )

Change subject: Add get table statistics interface for cpp client
..

Add get table statistics interface for cpp client

Change-Id: Ic00f46fd49c258f46a178a92d142d4d93d615d82
Reviewed-on: http://gerrit.cloudera.org:8080/14218
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo 
---
M src/kudu/client/client-test.cc
M src/kudu/client/client.cc
M src/kudu/client/client.h
M src/kudu/client/master_proxy_rpc.cc
A src/kudu/client/table_statistics-internal.h
5 files changed, 150 insertions(+), 0 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Adar Dembo: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/14218
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic00f46fd49c258f46a178a92d142d4d93d615d82
Gerrit-Change-Number: 14218
Gerrit-PatchSet: 3
Gerrit-Owner: ZhangYao 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: ZhangYao 


[kudu-CR] [tserver] include ip:port in the tserver name

2019-09-16 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14232 )

Change subject: [tserver] include ip:port in the tserver name
..


Patch Set 2:

(1 comment)

The TSAN failure is interesting. It's not related to this patch, but it is 
related to one of your previous patches, Lifu.

In ts_tablet_manager-itest.cc:

  // 6. Restart the masters.
  {
for (int i = 0; i < kNumMasters; ++i) {
  int idx = 0;
  ASSERT_OK(cluster_->GetLeaderMasterIndex());
  master::MiniMaster* mini_master = 
CHECK_NOTNULL(cluster_->mini_master(idx));
  mini_master->Shutdown();
  SleepFor(MonoDelta::FromMilliseconds(kMaxElectionTime));
  ASSERT_OK(mini_master->Restart());
  // Sometimes the election fails until the node restarts.
  // And the restarted node is elected leader again.
  // So, it is necessary to wait for all tservers to report.
  SleepFor(MonoDelta::FromMilliseconds(FLAGS_heartbeat_interval_ms));
  NO_FATALS(CheckStats(kRowsCount));
}
  }

I think there needs to be a call to mini_master->WaitForCatalogManagerInit() 
somewhere, perhaps after the Restart(), or after the waiting.

http://gerrit.cloudera.org:8080/#/c/14232/2/src/kudu/tserver/tablet_server.cc
File src/kudu/tserver/tablet_server.cc:

http://gerrit.cloudera.org:8080/#/c/14232/2/src/kudu/tserver/tablet_server.cc@69
PS2, Line 69:   if (kRunning != state_) {
: return "TabletServer (stopped)";
:   }
:   return strings::Substitute("TabletServer@$0", 
first_rpc_address().ToString());
> Maybe, this is a good opportunity to consolidate implementation of correspo
+1. No reason why masters shouldn't also report this; it's useful in 
multi-master deployments.



--
To view, visit http://gerrit.cloudera.org:8080/14232
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic6b0d8e29b6cac5a900923b881f0d7a74facbd6e
Gerrit-Change-Number: 14232
Gerrit-PatchSet: 2
Gerrit-Owner: helifu 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 16 Sep 2019 18:47:33 +
Gerrit-HasComments: Yes


[kudu-CR] [catalog manager] correct the wrong comment

2019-09-16 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14233 )

Change subject: [catalog_manager] correct the wrong comment
..


Patch Set 1: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14233/1/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

http://gerrit.cloudera.org:8080/#/c/14233/1/src/kudu/master/catalog_manager.h@141
PS1, Line 141: ,
nit: while you are at it, maybe change this comma (,) into period (.) as well?



--
To view, visit http://gerrit.cloudera.org:8080/14233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4071923dc052b6b829ddc25a8a0392ad8d1767e6
Gerrit-Change-Number: 14233
Gerrit-PatchSet: 1
Gerrit-Owner: honeyhexin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 16 Sep 2019 16:49:33 +
Gerrit-HasComments: Yes


[kudu-CR] [tserver] include ip:port in the tserver name

2019-09-16 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14232 )

Change subject: [tserver] include ip:port in the tserver name
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14232/2/src/kudu/tserver/tablet_server.cc
File src/kudu/tserver/tablet_server.cc:

http://gerrit.cloudera.org:8080/#/c/14232/2/src/kudu/tserver/tablet_server.cc@69
PS2, Line 69:   if (kRunning != state_) {
: return "TabletServer (stopped)";
:   }
:   return strings::Substitute("TabletServer@$0", 
first_rpc_address().ToString());
Maybe, this is a good opportunity to consolidate implementation of 
corresponding methods for both Master and TabletServer into 
kserver::KuduServer::ToString() ?



--
To view, visit http://gerrit.cloudera.org:8080/14232
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic6b0d8e29b6cac5a900923b881f0d7a74facbd6e
Gerrit-Change-Number: 14232
Gerrit-PatchSet: 2
Gerrit-Owner: helifu 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 16 Sep 2019 16:42:35 +
Gerrit-HasComments: Yes


[kudu-CR] WIP KUDU-2780: create thread for auto-rebalancing

2019-09-16 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14177 )

Change subject: WIP KUDU-2780: create thread for auto-rebalancing
..


Patch Set 3:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc
File src/kudu/master/auto_rebalancer-test.cc:

http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@58
PS3, Line 58: if (cluster_)
nit: it's a good practice to use scope braces for 'if ()' clause even it's a 
single-line statement (sprawls to multi-line in this case).


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@85
PS3, Line 85:   int kNumMasters = 3;
:   int kNumTservers = 3;
:   int kNumTablets = 4;
nit: these are constants, right?  Add const/constexpr if so.


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@96
PS3, Line 96: 10
10+ seconds run-time for a test makes it a 'slow' test.  Add a notion of 
running this test only when KUDU_ALLOW_SLOW_TESTS=1 is set; you can see how 
it's done in other tests.


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@101
PS3, Line 101: if (i==leader_idx) ASSERT_NE(0, 
number_of_loop_iterations(i));
 : else ASSERT_EQ(0, moves_scheduled_this_round(i));
Would this test fail if master leadership changes while the test is running?


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@104
PS3, Line 104:
nit:  an extra space


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@111
PS3, Line 111:   int kNumMasters = 3;
 :   int kNumTservers = 3;
 :   int kNumTablets = 3;
Add const/constexpr.


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@132
PS3, Line 132:   auto iterations = number_of_loop_iterations(new_leader_idx);
 :   ASSERT_LT(0, iterations);
Might it happen that the rebalancing thread hasn't yet after master leader 
elected?


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@137
PS3, Line 137: TEST_F(AutoRebalancerTest, NoReplicaMovesIfBalanced) {
Could it be possible to implement these scenarios in form of unit testing of 
some particular methods/function of the rebalancer?  That way it would give 
much more control over the replica placement, etc.  Also, it would be easier 
and faster to run.


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@139
PS3, Line 139: kNumTservers, kNumTablets;
These are not constants, but 'kXxx' is used only for constants.  Change the 
naming.


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@214
PS3, Line 214: auto-rebalancing flag is paused
nit: how does one pause a flag?


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@241
PS3, Line 241:   ASSERT_GE(moves_scheduled_before, moves_scheduled_after);
If there shouldn't be any moves, then why ASSERT_GE(), not ASSERT_EQ()?


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer-test.cc@243
PS3, Line 243:   FLAGS_auto_rebalancing_stop_flag = orig_flag;
I don't think this is necessary -- KuduTest does that automatically using 
google::FlagSaver


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.h
File src/kudu/master/auto_rebalancer.h:

http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.h@81
PS3, Line 81:   // Variables for testing.
:   int number_of_loop_iterations;
:   int moves_scheduled_this_round;
If this is for testing, move then under 'private' section and move those tests 
friends of this class.

Another approach might be introducing metrics for these readings: look around 
for METRIC_DEFINE_gauge_ in other components for examples.


http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.cc
File src/kudu/master/auto_rebalancer.cc:

http://gerrit.cloudera.org:8080/#/c/14177/3/src/kudu/master/auto_rebalancer.cc@120
PS3, Line 120: DEFINE_uint32(auto_rebalancing_max_moves_per_iteration, 5,
 :   "Maximum number of total replica moves to perform 
during one "
 :   "iteration of the auto-rebalancing loop.");
If I'm not mistaken, in the original rebalancer code constant 5 is used as a 
factor multiplied by maximum move operations per tserver, and that's where that 
'magic' constant came from.

The minimum number of operations per server was 2, so effectively this 
parameter was scaling along with number of nodes in the cluster, which is 
'natural'.  I think it would be nice to have some sort of relative constraint 
of the same sort for the auto-rebalancer as well, otherwise it would be 
necessary to tweak this constant on every 

[kudu-CR] [catalog manager] correct the wrong comment

2019-09-16 Thread honeyhexin (Code Review)
honeyhexin has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/14233


Change subject: [catalog_manager] correct the wrong comment
..

[catalog_manager] correct the wrong comment

Change-Id: I4071923dc052b6b829ddc25a8a0392ad8d1767e6
---
M src/kudu/master/catalog_manager.h
1 file changed, 1 insertion(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/14233/1
--
To view, visit http://gerrit.cloudera.org:8080/14233
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4071923dc052b6b829ddc25a8a0392ad8d1767e6
Gerrit-Change-Number: 14233
Gerrit-PatchSet: 1
Gerrit-Owner: honeyhexin 


[kudu-CR] [tserver] include ip:port in the tserver name

2019-09-16 Thread helifu (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/14232

to look at the new patch set (#2).

Change subject: [tserver] include ip:port in the tserver name
..

[tserver] include ip:port in the tserver name

Change-Id: Ic6b0d8e29b6cac5a900923b881f0d7a74facbd6e
---
M src/kudu/tserver/tablet_server.cc
M src/kudu/tserver/tablet_server.h
2 files changed, 19 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/32/14232/2
--
To view, visit http://gerrit.cloudera.org:8080/14232
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic6b0d8e29b6cac5a900923b881f0d7a74facbd6e
Gerrit-Change-Number: 14232
Gerrit-PatchSet: 2
Gerrit-Owner: helifu 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] [tserver] include ip:port in the tserver name

2019-09-16 Thread helifu (Code Review)
helifu has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/14232


Change subject: [tserver] include ip:port in the tserver name
..

[tserver] include ip:port in the tserver name

Change-Id: Ic6b0d8e29b6cac5a900923b881f0d7a74facbd6e
---
M src/kudu/tserver/tablet_server.cc
M src/kudu/tserver/tablet_server.h
2 files changed, 18 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/32/14232/1
--
To view, visit http://gerrit.cloudera.org:8080/14232
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic6b0d8e29b6cac5a900923b881f0d7a74facbd6e
Gerrit-Change-Number: 14232
Gerrit-PatchSet: 1
Gerrit-Owner: helifu