[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow

2018-03-08 Thread Todd Lipcon (Code Review)
Todd Lipcon has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9375 )

Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service 
queue overflow
..

KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow

In the past we've had several bugs and performance issues which manifest
themselves as service queue overflows. It would be easier to pinpoint
the root cause (eg slow IO vs lock contention) if we had a process-wide
stack snapshot at the time of the overflow.

This patch does some plumbing to trigger such a stack trace into the
diagnostics log when a service queue overflow occurs. The logging is
throttled to once every 5 seconds so that we don't contribute too much
to the load itself.

The plumbing is a bit ugly since the diagnostics log is in the server/
module whereas the overflows surface in the rpc/ module. Having rpc/
call back into server/ would be a cyclic dependency, so instead the rpc/
module exposes a std::function<> style callback for the server to hook
into.

Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8
Reviewed-on: http://gerrit.cloudera.org:8080/9375
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo 
---
M src/kudu/master/master-test.cc
M src/kudu/master/mini_master.h
M src/kudu/rpc/service_pool.cc
M src/kudu/rpc/service_pool.h
M src/kudu/server/diagnostics_log.cc
M src/kudu/server/diagnostics_log.h
M src/kudu/server/rpc_server.cc
M src/kudu/server/rpc_server.h
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
10 files changed, 185 insertions(+), 19 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Adar Dembo: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/9375
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8
Gerrit-Change-Number: 9375
Gerrit-PatchSet: 7
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow

2018-03-08 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9375 )

Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service 
queue overflow
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9375
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8
Gerrit-Change-Number: 9375
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 08 Mar 2018 18:50:45 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow

2018-03-07 Thread Todd Lipcon (Code Review)
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Adar Dembo, Sailesh Mukil,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9375

to look at the new patch set (#6).

Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service 
queue overflow
..

KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow

In the past we've had several bugs and performance issues which manifest
themselves as service queue overflows. It would be easier to pinpoint
the root cause (eg slow IO vs lock contention) if we had a process-wide
stack snapshot at the time of the overflow.

This patch does some plumbing to trigger such a stack trace into the
diagnostics log when a service queue overflow occurs. The logging is
throttled to once every 5 seconds so that we don't contribute too much
to the load itself.

The plumbing is a bit ugly since the diagnostics log is in the server/
module whereas the overflows surface in the rpc/ module. Having rpc/
call back into server/ would be a cyclic dependency, so instead the rpc/
module exposes a std::function<> style callback for the server to hook
into.

Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8
---
M src/kudu/master/master-test.cc
M src/kudu/master/mini_master.h
M src/kudu/rpc/service_pool.cc
M src/kudu/rpc/service_pool.h
M src/kudu/server/diagnostics_log.cc
M src/kudu/server/diagnostics_log.h
M src/kudu/server/rpc_server.cc
M src/kudu/server/rpc_server.h
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
10 files changed, 185 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/9375/6
--
To view, visit http://gerrit.cloudera.org:8080/9375
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8
Gerrit-Change-Number: 9375
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow

2018-03-07 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9375 )

Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service 
queue overflow
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc
File src/kudu/server/diagnostics_log.cc:

http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc@196
PS5, Line 196: WARN_NOT_OK(s, "Unable to collect stacks to diagnostics 
log");
> The next patch in this series gets rid of this anyway, so I'll skip this on
Sure.


http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc@201
PS5, Line 201:   dump_stacks_now_reason_ = boost::none;
> I actually think that's somewhat advantageous because if you get "requested
I don't think it's a big deal. The only rebuttal I can come up with is that if 
I'm troubleshooting I might find it weird to see N "service queue overflowed" 
messages but only N-1 or N-2 stack dumps. But that's pretty contrived, plus the 
relationship between overflows and dumps is already murky due to throttling.



--
To view, visit http://gerrit.cloudera.org:8080/9375
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8
Gerrit-Change-Number: 9375
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 08 Mar 2018 03:20:44 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow

2018-03-07 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9375 )

Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service 
queue overflow
..


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/master/master-test.cc
File src/kudu/master/master-test.cc:

http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/master/master-test.cc@829
PS5, Line 829: Env::Default()
> KuduTest has an env_ member you can use. Below too.
Done


http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/master/master-test.cc@835
PS5, Line 835:   return line.find("service queue overflowed for 
kudu.master.MasterService") != string::npos;
> Would MatchPattern be more idiomatic? Won't have to worry about string::npo
Done


http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/rpc/service_pool.h
File src/kudu/rpc/service_pool.h:

http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/rpc/service_pool.h@67
PS5, Line 67:   void set_reject_too_busy_hook(std::function hook) {
> Nit: if this is supposed to look like a setter for too_busy_hook_, I think
Done


http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/rpc/service_pool.cc
File src/kudu/rpc/service_pool.cc:

http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/rpc/service_pool.cc@136
PS5, Line 136:   if (too_busy_hook_) {
 : too_busy_hook_();
 :   }
> You may get a less noisy stack if you call this before responding to the RP
I think though it's best to respond as quickly as possible because it frees up 
the memory used by the inbound request


http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc
File src/kudu/server/diagnostics_log.cc:

http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc@196
PS5, Line 196: WARN_NOT_OK(s, "Unable to collect stacks to diagnostics 
log");
> Nit: "Will try again in ..." ?
The next patch in this series gets rid of this anyway, so I'll skip this one if 
you don't mind.


http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc@201
PS5, Line 201:   dump_stacks_now_reason_ = boost::none;
> Why do this here and not on L187, right after storing a local copy in 'reas
I actually think that's somewhat advantageous because if you get "requested" to 
dump stacks while you're already dumping stacks, there isn't really any need to 
wake up and dump stacks again (presumably your previous dump was already more 
accurate). Again if you disagree would rather change it in the follow-up patch 
which restructures this code path. In the end though it doesn't really matter 
much since it's not a super likely race and the only harm is an extra (or 
missing) stack dump which in most cases no one will ever even look at.


http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/rpc_server.h
File src/kudu/server/rpc_server.h:

http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/rpc_server.h@62
PS5, Line 62:   void 
set_reject_too_busy_hook(std::function hook) {
> Nit: same comment about setter naming.
Done



--
To view, visit http://gerrit.cloudera.org:8080/9375
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8
Gerrit-Change-Number: 9375
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 08 Mar 2018 02:55:43 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow

2018-03-07 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9375 )

Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service 
queue overflow
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc
File src/kudu/server/diagnostics_log.cc:

http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc@118
PS5, Line 118: MutexLock l(lock_);
> just because dump_stacks_now_reason_ is mutated from the logger thread too,
Ah right, nvm then.



--
To view, visit http://gerrit.cloudera.org:8080/9375
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8
Gerrit-Change-Number: 9375
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 07 Mar 2018 21:19:35 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow

2018-03-07 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9375 )

Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service 
queue overflow
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc
File src/kudu/server/diagnostics_log.cc:

http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc@118
PS5, Line 118: MutexLock l(lock_);
> Why bother locking this? It seems to me like when RunThread() gives up the
just because dump_stacks_now_reason_ is mutated from the logger thread too, and 
this may be called concurrently from multiple threads, so it would be unsafe to 
mutate it without a lock, right? std::string is not threadsafe



--
To view, visit http://gerrit.cloudera.org:8080/9375
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8
Gerrit-Change-Number: 9375
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 07 Mar 2018 20:38:27 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow

2018-03-07 Thread Sailesh Mukil (Code Review)
Sailesh Mukil has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9375 )

Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service 
queue overflow
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/rpc/service_pool.cc
File src/kudu/rpc/service_pool.cc:

http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/rpc/service_pool.cc@136
PS5, Line 136:   if (too_busy_hook_) {
 : too_busy_hook_();
 :   }
You may get a less noisy stack if you call this before responding to the RPC. 
Since responding to the RPC would start off some reactor thread tasks 
asynchronously.

But the trade off is that you'll reply to this RPC a little later. Not a big 
deal, but I just thought I'd bring it up.


http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc
File src/kudu/server/diagnostics_log.cc:

http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc@118
PS5, Line 118: MutexLock l(lock_);
Why bother locking this? It seems to me like when RunThread() gives up the lock 
to do the actual logging, the 'dump_stacks_now_reason_' can be overwritten 
anyway by another thread calling DumpStacksNow(). Unless I'm missing something.



--
To view, visit http://gerrit.cloudera.org:8080/9375
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8
Gerrit-Change-Number: 9375
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Sailesh Mukil 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Wed, 07 Mar 2018 20:34:51 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow

2018-03-07 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9375 )

Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service 
queue overflow
..


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/master/master-test.cc
File src/kudu/master/master-test.cc:

http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/master/master-test.cc@829
PS5, Line 829: Env::Default()
KuduTest has an env_ member you can use. Below too.


http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/master/master-test.cc@835
PS5, Line 835:   return line.find("service queue overflowed for 
kudu.master.MasterService") != string::npos;
Would MatchPattern be more idiomatic? Won't have to worry about string::npos 
comparisons then.


http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/rpc/service_pool.h
File src/kudu/rpc/service_pool.h:

http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/rpc/service_pool.h@67
PS5, Line 67:   void set_reject_too_busy_hook(std::function hook) {
Nit: if this is supposed to look like a setter for too_busy_hook_, I think 
set_too_busy_hook() would be a better name.


http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc
File src/kudu/server/diagnostics_log.cc:

http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc@196
PS5, Line 196: WARN_NOT_OK(s, "Unable to collect stacks to diagnostics 
log");
Nit: "Will try again in ..." ?


http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc@201
PS5, Line 201:   dump_stacks_now_reason_ = boost::none;
Why do this here and not on L187, right after storing a local copy in 'reason'? 
By doing it down here it's possible for another thread to overwrite 
dump_stacks_now_reason_ during the stack logging.


http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/rpc_server.h
File src/kudu/server/rpc_server.h:

http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/rpc_server.h@62
PS5, Line 62:   void 
set_reject_too_busy_hook(std::function hook) {
Nit: same comment about setter naming.



--
To view, visit http://gerrit.cloudera.org:8080/9375
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8
Gerrit-Change-Number: 9375
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Wed, 07 Mar 2018 19:27:44 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow

2018-03-06 Thread Todd Lipcon (Code Review)
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9375

to look at the new patch set (#5).

Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service 
queue overflow
..

KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow

In the past we've had several bugs and performance issues which manifest
themselves as service queue overflows. It would be easier to pinpoint
the root cause (eg slow IO vs lock contention) if we had a process-wide
stack snapshot at the time of the overflow.

This patch does some plumbing to trigger such a stack trace into the
diagnostics log when a service queue overflow occurs. The logging is
throttled to once every 5 seconds so that we don't contribute too much
to the load itself.

The plumbing is a bit ugly since the diagnostics log is in the server/
module whereas the overflows surface in the rpc/ module. Having rpc/
call back into server/ would be a cyclic dependency, so instead the rpc/
module exposes a std::function<> style callback for the server to hook
into.

Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8
---
M src/kudu/master/master-test.cc
M src/kudu/master/mini_master.h
M src/kudu/rpc/service_pool.cc
M src/kudu/rpc/service_pool.h
M src/kudu/server/diagnostics_log.cc
M src/kudu/server/diagnostics_log.h
M src/kudu/server/rpc_server.cc
M src/kudu/server/rpc_server.h
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
10 files changed, 179 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/9375/5
--
To view, visit http://gerrit.cloudera.org:8080/9375
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8
Gerrit-Change-Number: 9375
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow

2018-03-06 Thread Todd Lipcon (Code Review)
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9375

to look at the new patch set (#4).

Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service 
queue overflow
..

KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow

In the past we've had several bugs and performance issues which manifest
themselves as service queue overflows. It would be easier to pinpoint
the root cause (eg slow IO vs lock contention) if we had a process-wide
stack snapshot at the time of the overflow.

This patch does some plumbing to trigger such a stack trace into the
diagnostics log when a service queue overflow occurs. The logging is
throttled to once every 5 seconds so that we don't contribute too much
to the load itself.

The plumbing is a bit ugly since the diagnostics log is in the server/
module whereas the overflows surface in the rpc/ module. Having rpc/
call back into server/ would be a cyclic dependency, so instead the rpc/
module exposes a std::function<> style callback for the server to hook
into.

Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8
---
M src/kudu/master/master-test.cc
M src/kudu/master/mini_master.h
M src/kudu/rpc/service_pool.cc
M src/kudu/rpc/service_pool.h
M src/kudu/server/diagnostics_log.cc
M src/kudu/server/diagnostics_log.h
M src/kudu/server/rpc_server.cc
M src/kudu/server/rpc_server.h
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
10 files changed, 170 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/9375/4
--
To view, visit http://gerrit.cloudera.org:8080/9375
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8
Gerrit-Change-Number: 9375
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow

2018-03-06 Thread Todd Lipcon (Code Review)
Hello Tidy Bot, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9375

to look at the new patch set (#3).

Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service 
queue overflow
..

KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow

In the past we've had several bugs and performance issues which manifest
themselves as service queue overflows. It would be easier to pinpoint
the root cause (eg slow IO vs lock contention) if we had a process-wide
stack snapshot at the time of the overflow.

This patch does some plumbing to trigger such a stack trace into the
diagnostics log when a service queue overflow occurs. The logging is
throttled to once every 5 seconds so that we don't contribute too much
to the load itself.

The plumbing is a bit ugly since the diagnostics log is in the server/
module whereas the overflows surface in the rpc/ module. Having rpc/
call back into server/ would be a cyclic dependency, so instead the rpc/
module exposes a std::function<> style callback for the server to hook
into.

Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8
---
M src/kudu/master/master-test.cc
M src/kudu/master/mini_master.h
M src/kudu/rpc/service_pool.cc
M src/kudu/rpc/service_pool.h
M src/kudu/server/diagnostics_log.cc
M src/kudu/server/diagnostics_log.h
M src/kudu/server/rpc_server.cc
M src/kudu/server/rpc_server.h
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
10 files changed, 169 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/9375/3
--
To view, visit http://gerrit.cloudera.org:8080/9375
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8
Gerrit-Change-Number: 9375
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow

2018-03-06 Thread Todd Lipcon (Code Review)
Hello Tidy Bot, Mike Percy, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9375

to look at the new patch set (#2).

Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service 
queue overflow
..

KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow

In the past we've had several bugs and performance issues which manifest
themselves as service queue overflows. It would be easier to pinpoint
the root cause (eg slow IO vs lock contention) if we had a process-wide
stack snapshot at the time of the overflow.

This patch does some plumbing to trigger such a stack trace into the
diagnostics log when a service queue overflow occurs. The logging is
throttled to once every 5 seconds so that we don't contribute too much
to the load itself.

The plumbing is a bit ugly since the diagnostics log is in the server/
module whereas the overflows surface in the rpc/ module. Having rpc/
call back into server/ would be a cyclic dependency, so instead the rpc/
module exposes a std::function<> style callback for the server to hook
into.

Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8
---
M src/kudu/master/master-test.cc
M src/kudu/master/mini_master.h
M src/kudu/rpc/service_pool.cc
M src/kudu/rpc/service_pool.h
M src/kudu/server/diagnostics_log.cc
M src/kudu/server/diagnostics_log.h
M src/kudu/server/rpc_server.cc
M src/kudu/server/rpc_server.h
M src/kudu/server/server_base.cc
M src/kudu/server/server_base.h
10 files changed, 166 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/9375/2
--
To view, visit http://gerrit.cloudera.org:8080/9375
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8
Gerrit-Change-Number: 9375
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot