[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9375 ) Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow .. KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow In the past we've had several bugs and performance issues which manifest themselves as service queue overflows. It would be easier to pinpoint the root cause (eg slow IO vs lock contention) if we had a process-wide stack snapshot at the time of the overflow. This patch does some plumbing to trigger such a stack trace into the diagnostics log when a service queue overflow occurs. The logging is throttled to once every 5 seconds so that we don't contribute too much to the load itself. The plumbing is a bit ugly since the diagnostics log is in the server/ module whereas the overflows surface in the rpc/ module. Having rpc/ call back into server/ would be a cyclic dependency, so instead the rpc/ module exposes a std::function<> style callback for the server to hook into. Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8 Reviewed-on: http://gerrit.cloudera.org:8080/9375 Tested-by: Kudu Jenkins Reviewed-by: Adar Dembo--- M src/kudu/master/master-test.cc M src/kudu/master/mini_master.h M src/kudu/rpc/service_pool.cc M src/kudu/rpc/service_pool.h M src/kudu/server/diagnostics_log.cc M src/kudu/server/diagnostics_log.h M src/kudu/server/rpc_server.cc M src/kudu/server/rpc_server.h M src/kudu/server/server_base.cc M src/kudu/server/server_base.h 10 files changed, 185 insertions(+), 19 deletions(-) Approvals: Kudu Jenkins: Verified Adar Dembo: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/9375 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8 Gerrit-Change-Number: 9375 Gerrit-PatchSet: 7 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9375 ) Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/9375 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8 Gerrit-Change-Number: 9375 Gerrit-PatchSet: 6 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 08 Mar 2018 18:50:45 + Gerrit-HasComments: No
[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Adar Dembo, Sailesh Mukil, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9375 to look at the new patch set (#6). Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow .. KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow In the past we've had several bugs and performance issues which manifest themselves as service queue overflows. It would be easier to pinpoint the root cause (eg slow IO vs lock contention) if we had a process-wide stack snapshot at the time of the overflow. This patch does some plumbing to trigger such a stack trace into the diagnostics log when a service queue overflow occurs. The logging is throttled to once every 5 seconds so that we don't contribute too much to the load itself. The plumbing is a bit ugly since the diagnostics log is in the server/ module whereas the overflows surface in the rpc/ module. Having rpc/ call back into server/ would be a cyclic dependency, so instead the rpc/ module exposes a std::function<> style callback for the server to hook into. Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8 --- M src/kudu/master/master-test.cc M src/kudu/master/mini_master.h M src/kudu/rpc/service_pool.cc M src/kudu/rpc/service_pool.h M src/kudu/server/diagnostics_log.cc M src/kudu/server/diagnostics_log.h M src/kudu/server/rpc_server.cc M src/kudu/server/rpc_server.h M src/kudu/server/server_base.cc M src/kudu/server/server_base.h 10 files changed, 185 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/9375/6 -- To view, visit http://gerrit.cloudera.org:8080/9375 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8 Gerrit-Change-Number: 9375 Gerrit-PatchSet: 6 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9375 ) Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc File src/kudu/server/diagnostics_log.cc: http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc@196 PS5, Line 196: WARN_NOT_OK(s, "Unable to collect stacks to diagnostics log"); > The next patch in this series gets rid of this anyway, so I'll skip this on Sure. http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc@201 PS5, Line 201: dump_stacks_now_reason_ = boost::none; > I actually think that's somewhat advantageous because if you get "requested I don't think it's a big deal. The only rebuttal I can come up with is that if I'm troubleshooting I might find it weird to see N "service queue overflowed" messages but only N-1 or N-2 stack dumps. But that's pretty contrived, plus the relationship between overflows and dumps is already murky due to throttling. -- To view, visit http://gerrit.cloudera.org:8080/9375 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8 Gerrit-Change-Number: 9375 Gerrit-PatchSet: 5 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 08 Mar 2018 03:20:44 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9375 ) Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow .. Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/master/master-test.cc File src/kudu/master/master-test.cc: http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/master/master-test.cc@829 PS5, Line 829: Env::Default() > KuduTest has an env_ member you can use. Below too. Done http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/master/master-test.cc@835 PS5, Line 835: return line.find("service queue overflowed for kudu.master.MasterService") != string::npos; > Would MatchPattern be more idiomatic? Won't have to worry about string::npo Done http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/rpc/service_pool.h File src/kudu/rpc/service_pool.h: http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/rpc/service_pool.h@67 PS5, Line 67: void set_reject_too_busy_hook(std::functionhook) { > Nit: if this is supposed to look like a setter for too_busy_hook_, I think Done http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/rpc/service_pool.cc File src/kudu/rpc/service_pool.cc: http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/rpc/service_pool.cc@136 PS5, Line 136: if (too_busy_hook_) { : too_busy_hook_(); : } > You may get a less noisy stack if you call this before responding to the RP I think though it's best to respond as quickly as possible because it frees up the memory used by the inbound request http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc File src/kudu/server/diagnostics_log.cc: http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc@196 PS5, Line 196: WARN_NOT_OK(s, "Unable to collect stacks to diagnostics log"); > Nit: "Will try again in ..." ? The next patch in this series gets rid of this anyway, so I'll skip this one if you don't mind. http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc@201 PS5, Line 201: dump_stacks_now_reason_ = boost::none; > Why do this here and not on L187, right after storing a local copy in 'reas I actually think that's somewhat advantageous because if you get "requested" to dump stacks while you're already dumping stacks, there isn't really any need to wake up and dump stacks again (presumably your previous dump was already more accurate). Again if you disagree would rather change it in the follow-up patch which restructures this code path. In the end though it doesn't really matter much since it's not a super likely race and the only harm is an extra (or missing) stack dump which in most cases no one will ever even look at. http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/rpc_server.h File src/kudu/server/rpc_server.h: http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/rpc_server.h@62 PS5, Line 62: void set_reject_too_busy_hook(std::function hook) { > Nit: same comment about setter naming. Done -- To view, visit http://gerrit.cloudera.org:8080/9375 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8 Gerrit-Change-Number: 9375 Gerrit-PatchSet: 5 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 08 Mar 2018 02:55:43 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/9375 ) Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc File src/kudu/server/diagnostics_log.cc: http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc@118 PS5, Line 118: MutexLock l(lock_); > just because dump_stacks_now_reason_ is mutated from the logger thread too, Ah right, nvm then. -- To view, visit http://gerrit.cloudera.org:8080/9375 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8 Gerrit-Change-Number: 9375 Gerrit-PatchSet: 5 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 07 Mar 2018 21:19:35 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9375 ) Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc File src/kudu/server/diagnostics_log.cc: http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc@118 PS5, Line 118: MutexLock l(lock_); > Why bother locking this? It seems to me like when RunThread() gives up the just because dump_stacks_now_reason_ is mutated from the logger thread too, and this may be called concurrently from multiple threads, so it would be unsafe to mutate it without a lock, right? std::string is not threadsafe -- To view, visit http://gerrit.cloudera.org:8080/9375 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8 Gerrit-Change-Number: 9375 Gerrit-PatchSet: 5 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 07 Mar 2018 20:38:27 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/9375 ) Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/rpc/service_pool.cc File src/kudu/rpc/service_pool.cc: http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/rpc/service_pool.cc@136 PS5, Line 136: if (too_busy_hook_) { : too_busy_hook_(); : } You may get a less noisy stack if you call this before responding to the RPC. Since responding to the RPC would start off some reactor thread tasks asynchronously. But the trade off is that you'll reply to this RPC a little later. Not a big deal, but I just thought I'd bring it up. http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc File src/kudu/server/diagnostics_log.cc: http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc@118 PS5, Line 118: MutexLock l(lock_); Why bother locking this? It seems to me like when RunThread() gives up the lock to do the actual logging, the 'dump_stacks_now_reason_' can be overwritten anyway by another thread calling DumpStacksNow(). Unless I'm missing something. -- To view, visit http://gerrit.cloudera.org:8080/9375 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8 Gerrit-Change-Number: 9375 Gerrit-PatchSet: 5 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Wed, 07 Mar 2018 20:34:51 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9375 ) Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow .. Patch Set 5: (6 comments) http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/master/master-test.cc File src/kudu/master/master-test.cc: http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/master/master-test.cc@829 PS5, Line 829: Env::Default() KuduTest has an env_ member you can use. Below too. http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/master/master-test.cc@835 PS5, Line 835: return line.find("service queue overflowed for kudu.master.MasterService") != string::npos; Would MatchPattern be more idiomatic? Won't have to worry about string::npos comparisons then. http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/rpc/service_pool.h File src/kudu/rpc/service_pool.h: http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/rpc/service_pool.h@67 PS5, Line 67: void set_reject_too_busy_hook(std::functionhook) { Nit: if this is supposed to look like a setter for too_busy_hook_, I think set_too_busy_hook() would be a better name. http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc File src/kudu/server/diagnostics_log.cc: http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc@196 PS5, Line 196: WARN_NOT_OK(s, "Unable to collect stacks to diagnostics log"); Nit: "Will try again in ..." ? http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/diagnostics_log.cc@201 PS5, Line 201: dump_stacks_now_reason_ = boost::none; Why do this here and not on L187, right after storing a local copy in 'reason'? By doing it down here it's possible for another thread to overwrite dump_stacks_now_reason_ during the stack logging. http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/rpc_server.h File src/kudu/server/rpc_server.h: http://gerrit.cloudera.org:8080/#/c/9375/5/src/kudu/server/rpc_server.h@62 PS5, Line 62: void set_reject_too_busy_hook(std::function hook) { Nit: same comment about setter naming. -- To view, visit http://gerrit.cloudera.org:8080/9375 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8 Gerrit-Change-Number: 9375 Gerrit-PatchSet: 5 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Wed, 07 Mar 2018 19:27:44 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9375 to look at the new patch set (#5). Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow .. KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow In the past we've had several bugs and performance issues which manifest themselves as service queue overflows. It would be easier to pinpoint the root cause (eg slow IO vs lock contention) if we had a process-wide stack snapshot at the time of the overflow. This patch does some plumbing to trigger such a stack trace into the diagnostics log when a service queue overflow occurs. The logging is throttled to once every 5 seconds so that we don't contribute too much to the load itself. The plumbing is a bit ugly since the diagnostics log is in the server/ module whereas the overflows surface in the rpc/ module. Having rpc/ call back into server/ would be a cyclic dependency, so instead the rpc/ module exposes a std::function<> style callback for the server to hook into. Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8 --- M src/kudu/master/master-test.cc M src/kudu/master/mini_master.h M src/kudu/rpc/service_pool.cc M src/kudu/rpc/service_pool.h M src/kudu/server/diagnostics_log.cc M src/kudu/server/diagnostics_log.h M src/kudu/server/rpc_server.cc M src/kudu/server/rpc_server.h M src/kudu/server/server_base.cc M src/kudu/server/server_base.h 10 files changed, 179 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/9375/5 -- To view, visit http://gerrit.cloudera.org:8080/9375 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8 Gerrit-Change-Number: 9375 Gerrit-PatchSet: 5 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9375 to look at the new patch set (#4). Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow .. KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow In the past we've had several bugs and performance issues which manifest themselves as service queue overflows. It would be easier to pinpoint the root cause (eg slow IO vs lock contention) if we had a process-wide stack snapshot at the time of the overflow. This patch does some plumbing to trigger such a stack trace into the diagnostics log when a service queue overflow occurs. The logging is throttled to once every 5 seconds so that we don't contribute too much to the load itself. The plumbing is a bit ugly since the diagnostics log is in the server/ module whereas the overflows surface in the rpc/ module. Having rpc/ call back into server/ would be a cyclic dependency, so instead the rpc/ module exposes a std::function<> style callback for the server to hook into. Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8 --- M src/kudu/master/master-test.cc M src/kudu/master/mini_master.h M src/kudu/rpc/service_pool.cc M src/kudu/rpc/service_pool.h M src/kudu/server/diagnostics_log.cc M src/kudu/server/diagnostics_log.h M src/kudu/server/rpc_server.cc M src/kudu/server/rpc_server.h M src/kudu/server/server_base.cc M src/kudu/server/server_base.h 10 files changed, 170 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/9375/4 -- To view, visit http://gerrit.cloudera.org:8080/9375 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8 Gerrit-Change-Number: 9375 Gerrit-PatchSet: 4 Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow
Hello Tidy Bot, Mike Percy, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9375 to look at the new patch set (#3). Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow .. KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow In the past we've had several bugs and performance issues which manifest themselves as service queue overflows. It would be easier to pinpoint the root cause (eg slow IO vs lock contention) if we had a process-wide stack snapshot at the time of the overflow. This patch does some plumbing to trigger such a stack trace into the diagnostics log when a service queue overflow occurs. The logging is throttled to once every 5 seconds so that we don't contribute too much to the load itself. The plumbing is a bit ugly since the diagnostics log is in the server/ module whereas the overflows surface in the rpc/ module. Having rpc/ call back into server/ would be a cyclic dependency, so instead the rpc/ module exposes a std::function<> style callback for the server to hook into. Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8 --- M src/kudu/master/master-test.cc M src/kudu/master/mini_master.h M src/kudu/rpc/service_pool.cc M src/kudu/rpc/service_pool.h M src/kudu/server/diagnostics_log.cc M src/kudu/server/diagnostics_log.h M src/kudu/server/rpc_server.cc M src/kudu/server/rpc_server.h M src/kudu/server/server_base.cc M src/kudu/server/server_base.h 10 files changed, 169 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/9375/3 -- To view, visit http://gerrit.cloudera.org:8080/9375 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8 Gerrit-Change-Number: 9375 Gerrit-PatchSet: 3 Gerrit-Owner: Todd LipconGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow
Hello Tidy Bot, Mike Percy, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9375 to look at the new patch set (#2). Change subject: KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow .. KUDU-2297 (part 5): dump stacks into diagnostics log on service queue overflow In the past we've had several bugs and performance issues which manifest themselves as service queue overflows. It would be easier to pinpoint the root cause (eg slow IO vs lock contention) if we had a process-wide stack snapshot at the time of the overflow. This patch does some plumbing to trigger such a stack trace into the diagnostics log when a service queue overflow occurs. The logging is throttled to once every 5 seconds so that we don't contribute too much to the load itself. The plumbing is a bit ugly since the diagnostics log is in the server/ module whereas the overflows surface in the rpc/ module. Having rpc/ call back into server/ would be a cyclic dependency, so instead the rpc/ module exposes a std::function<> style callback for the server to hook into. Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8 --- M src/kudu/master/master-test.cc M src/kudu/master/mini_master.h M src/kudu/rpc/service_pool.cc M src/kudu/rpc/service_pool.h M src/kudu/server/diagnostics_log.cc M src/kudu/server/diagnostics_log.h M src/kudu/server/rpc_server.cc M src/kudu/server/rpc_server.h M src/kudu/server/server_base.cc M src/kudu/server/server_base.h 10 files changed, 166 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/9375/2 -- To view, visit http://gerrit.cloudera.org:8080/9375 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I62019c71121d7ca4037cab86a7c2ea048a261ad8 Gerrit-Change-Number: 9375 Gerrit-PatchSet: 2 Gerrit-Owner: Todd LipconGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot