[kudu-CR] Make tcmalloc heap sampling more useful
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9263 ) Change subject: Make tcmalloc heap sampling more useful .. Make tcmalloc heap sampling more useful Previously our /pprof/heap path required that the process be started with lifetime heap profiling enabled. This mode of operation is very slow so we would never be able to use it in production. Even in tests it's of limited utility. The alternative is tcmalloc heap _sampling_ which only records periodic allocations. According to recent mailing list threads, Google runs most of their binaries with this enabled in production, so it's designed to be very low overhead and potentially useful for us in the field. This doesn't go so far as to enable it by default, but it does adjust the /pprof/heap endpoint to dump the sampled allocations instead of starting a profile. Sampling can be enabled using either the tcmalloc env variable (TCMALLOC_SAMPLE_PARAMETER) or a new command line flag --heap-sample-every-n-bytes. I tested this manually by running a kudu tserver with loadgen and sampling set to 500kb. I then used pprof to grab the heap sample information which looked like I expected it to (most of the memory from MRS) Change-Id: I939c2f01f17ceb4b9520bb566f66463952b2a255 Reviewed-on: http://gerrit.cloudera.org:8080/9263 Tested-by: Kudu Jenkins Reviewed-by: Will Berkeley--- M docs/troubleshooting.adoc M src/kudu/server/pprof_path_handlers.cc M src/kudu/util/flags.cc 3 files changed, 95 insertions(+), 26 deletions(-) Approvals: Kudu Jenkins: Verified Will Berkeley: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/9263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I939c2f01f17ceb4b9520bb566f66463952b2a255 Gerrit-Change-Number: 9263 Gerrit-PatchSet: 4 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] Make tcmalloc heap sampling more useful
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9263 ) Change subject: Make tcmalloc heap sampling more useful .. Patch Set 3: Errant ??- what I mean is I checked it builds on macOS -- To view, visit http://gerrit.cloudera.org:8080/9263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I939c2f01f17ceb4b9520bb566f66463952b2a255 Gerrit-Change-Number: 9263 Gerrit-PatchSet: 3 Gerrit-Owner: Todd LipconGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Thu, 22 Feb 2018 21:00:33 + Gerrit-HasComments: No
[kudu-CR] Make tcmalloc heap sampling more useful
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9263 ) Change subject: Make tcmalloc heap sampling more useful .. Patch Set 3: Code-Review+2 Builds on macOS ✔️ -- To view, visit http://gerrit.cloudera.org:8080/9263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I939c2f01f17ceb4b9520bb566f66463952b2a255 Gerrit-Change-Number: 9263 Gerrit-PatchSet: 3 Gerrit-Owner: Todd LipconGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Thu, 22 Feb 2018 20:59:46 + Gerrit-HasComments: No
[kudu-CR] Make tcmalloc heap sampling more useful
Hello Will Berkeley, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9263 to look at the new patch set (#3). Change subject: Make tcmalloc heap sampling more useful .. Make tcmalloc heap sampling more useful Previously our /pprof/heap path required that the process be started with lifetime heap profiling enabled. This mode of operation is very slow so we would never be able to use it in production. Even in tests it's of limited utility. The alternative is tcmalloc heap _sampling_ which only records periodic allocations. According to recent mailing list threads, Google runs most of their binaries with this enabled in production, so it's designed to be very low overhead and potentially useful for us in the field. This doesn't go so far as to enable it by default, but it does adjust the /pprof/heap endpoint to dump the sampled allocations instead of starting a profile. Sampling can be enabled using either the tcmalloc env variable (TCMALLOC_SAMPLE_PARAMETER) or a new command line flag --heap-sample-every-n-bytes. I tested this manually by running a kudu tserver with loadgen and sampling set to 500kb. I then used pprof to grab the heap sample information which looked like I expected it to (most of the memory from MRS) Change-Id: I939c2f01f17ceb4b9520bb566f66463952b2a255 --- M docs/troubleshooting.adoc M src/kudu/server/pprof_path_handlers.cc M src/kudu/util/flags.cc 3 files changed, 95 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/9263/3 -- To view, visit http://gerrit.cloudera.org:8080/9263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I939c2f01f17ceb4b9520bb566f66463952b2a255 Gerrit-Change-Number: 9263 Gerrit-PatchSet: 3 Gerrit-Owner: Todd LipconGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] Make tcmalloc heap sampling more useful
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9263 ) Change subject: Make tcmalloc heap sampling more useful .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/9263/2/src/kudu/server/pprof_path_handlers.cc File src/kudu/server/pprof_path_handlers.cc: http://gerrit.cloudera.org:8080/#/c/9263/2/src/kudu/server/pprof_path_handlers.cc@104 PS2, Line 104: message with instructions how to enable it > The message (tcmalloc.cc L592) just mentions the TCMALLOC_SAMPLE_PARAMETER Done http://gerrit.cloudera.org:8080/#/c/9263/2/src/kudu/util/flags.cc File src/kudu/util/flags.cc: http://gerrit.cloudera.org:8080/#/c/9263/2/src/kudu/util/flags.cc@88 PS2, Line 88: heap_sample_parameter > I know this name matches the one that tcmalloc uses, but I'd like a more de Done http://gerrit.cloudera.org:8080/#/c/9263/2/src/kudu/util/flags.cc@89 PS2, Line 89: N > What's N? I think this would also be fixed with a better name and adjusted Done -- To view, visit http://gerrit.cloudera.org:8080/9263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I939c2f01f17ceb4b9520bb566f66463952b2a255 Gerrit-Change-Number: 9263 Gerrit-PatchSet: 2 Gerrit-Owner: Todd LipconGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 13 Feb 2018 20:02:46 + Gerrit-HasComments: Yes
[kudu-CR] Make tcmalloc heap sampling more useful
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9263 ) Change subject: Make tcmalloc heap sampling more useful .. Patch Set 2: > You're telling me you can't symbolize addresses in your head?! :( > In that case you can use the pprof tool as follows: > > ./thirdparty/installed/uninstrumented/bin/pprof build/latest/bin/kudu-master > http://localhost:8051/pprof/heap Ah, nice. > I believe there is also a newer version of pprof implemented in go, That's right. Go has a nice package https://golang.org/pkg/net/http/pprof/ that basically gives a lot of this profiling kind of stuff exposed over an http interface for free just by importing it, plus a pprof tool built into the go toolchain for processing the result. -- To view, visit http://gerrit.cloudera.org:8080/9263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I939c2f01f17ceb4b9520bb566f66463952b2a255 Gerrit-Change-Number: 9263 Gerrit-PatchSet: 2 Gerrit-Owner: Todd LipconGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 13 Feb 2018 18:40:50 + Gerrit-HasComments: No
[kudu-CR] Make tcmalloc heap sampling more useful
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9263 ) Change subject: Make tcmalloc heap sampling more useful .. Patch Set 2: > Is that because it's macOS? If not, how does one read that? You're telling me you can't symbolize addresses in your head?! In that case you can use the pprof tool as follows: ./thirdparty/installed/uninstrumented/bin/pprof build/latest/bin/kudu-master http://localhost:8051/pprof/heap (or point it at a path to an already-downloaded heap profile instead of using HTTP if you collect one from some production server without the tool installed). I believe there is also a newer version of pprof implemented in go, but I haven't tried that one. I think they moved towards protobuf-based dumps instead of text in recent years. -- To view, visit http://gerrit.cloudera.org:8080/9263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I939c2f01f17ceb4b9520bb566f66463952b2a255 Gerrit-Change-Number: 9263 Gerrit-PatchSet: 2 Gerrit-Owner: Todd LipconGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 13 Feb 2018 18:24:14 + Gerrit-HasComments: No
[kudu-CR] Make tcmalloc heap sampling more useful
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9263 ) Change subject: Make tcmalloc heap sampling more useful .. Patch Set 2: (3 comments) I tried this out on my macOS system and the heap samples looked like this: heap profile: 93: 57580776 [93: 57580776] @ heap_v2/512000 14: 13993984 [14: 13993984] @ 0x10b9dce13 0x7fff57cc9201 0x7fff57cc850b 0x10d9386f9 0x10d938603 0x10d938c0f 0x10d939579 0x10d932137 0x10d92fbcb 0x10d93096c 0x10a19f012 0x10a1a09d4 0x10a1a08e4 0x10a1a0810 0x10a18b7c8 0x10a18a737 0x10a0498ee 0x10a05078b 0x10a04f755 0x10a110ff7 0x10a102d1f 0x10a1050a9 0x10a104ffc 0x10a104faa 0x10a5f08de 0x10da06ff9 0x10da0206f 0x10da1da69 0x10da1d9d0 0x10da1d97a 0x10da1d75d 5: 2621440 [ 5: 2621440] @ 0x10b9dce13 0x7fff57cc9201 0x7fff57cc850b 0x10d9386f9 0x10d938603 0x10d932137 0x10d932aeb 0x10d93377c 0x1099537d2 0x10a113dfd 0x10a113920 0x10a046164 0x10a10ff46 0x10a100115 0x10a0fef97 0x10a1050a9 0x10a104ffc 0x10a104faa 0x10a5f08de 0x10da06ff9 0x10da0206f 0x10da1da69 0x10da1d9d0 0x10da1d97a 0x10da1d75d 0x10b33e198 0x10d9ecb88 0x7fff57dac6c1 0x7fff57dac56d 0x7fff57dabc5d 2: 509952 [ 2: 509952] @ 0x10b9dccf1 0x7fff57cc9201 0x7fff57cc850b 0x10d9386f9 0x10d938603 0x10d938c0f 0x10d939579 0x10d932137 0x10d92fbcb 0x10d93096c 0x10a19f012 0x10a19edc4 0x10a1a28a9 0x10a1a14a1 0x10a1a0e13 0x10a18b8bf 0x10a18a80a 0x10a0498ee 0x10a05078b 0x10a04f755 0x10a110ff7 0x10a102d1f 0x10a1050a9 0x10a104ffc 0x10a104faa 0x10a5f08de 0x10da06ff9 0x10da0206f 0x10da1da69 0x10da1d9d0 0x10da1d97a 3: 192 [ 3: 192] @ 0x10b9dccf1 0x7fff57cc9201 0x7fff57cc850b 0x7fff55ba8628 0x10a1bd6ce 0x10a049964 0x10a05078b 0x10a04f755 0x10a110ff7 0x10a102d1f 0x10a1050a9 0x10a104ffc 0x10a104faa 0x10a5f08de 0x10da06ff9 0x10da0206f 0x10da1da69 0x10da1d9d0 0x10da1d97a 0x10da1d75d 0x10b33e198 0x10d9ecb88 0x7fff57dac6c1 0x7fff57dac56d 0x7fff57dabc5d 12: 12464128 [12: 12464128] @ 0x10b9dce13 0x7fff57cc9201 0x7fff57cc850b 0x10d9386f9 0x10d938603 0x10d938c0f 0x10d939579 0x10d932137 0x10d92fbcb 0x10d93096c 0x10a19f012 0x10a19edc4 0x10a1a28a9 0x10a1a14a1 0x10a1a0e13 0x10a18b8bf 0x10a18a80a 0x10a0498ee 0x10a05078b 0x10a04f755 0x10a110ff7 0x10a102d1f 0x10a1050a9 0x10a104ffc 0x10a104faa 0x10a5f08de 0x10da06ff9 0x10da0206f 0x10da1da69 0x10da1d9d0 0x10da1d97a Is that because it's macOS? If not, how does one read that? http://gerrit.cloudera.org:8080/#/c/9263/2/src/kudu/server/pprof_path_handlers.cc File src/kudu/server/pprof_path_handlers.cc: http://gerrit.cloudera.org:8080/#/c/9263/2/src/kudu/server/pprof_path_handlers.cc@104 PS2, Line 104: message with instructions how to enable it The message (tcmalloc.cc L592) just mentions the TCMALLOC_SAMPLE_PARAMETER environment variable and not the gflag. http://gerrit.cloudera.org:8080/#/c/9263/2/src/kudu/util/flags.cc File src/kudu/util/flags.cc: http://gerrit.cloudera.org:8080/#/c/9263/2/src/kudu/util/flags.cc@88 PS2, Line 88: heap_sample_parameter I know this name matches the one that tcmalloc uses, but I'd like a more descriptive name. Maybe something like "heap_sample_every_n_bytes"? That sort of name has precedents in our logging macros. http://gerrit.cloudera.org:8080/#/c/9263/2/src/kudu/util/flags.cc@89 PS2, Line 89: N What's N? I think this would also be fixed with a better name and adjusted description. -- To view, visit http://gerrit.cloudera.org:8080/9263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I939c2f01f17ceb4b9520bb566f66463952b2a255 Gerrit-Change-Number: 9263 Gerrit-PatchSet: 2 Gerrit-Owner: Todd LipconGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 13 Feb 2018 18:16:12 + Gerrit-HasComments: Yes
[kudu-CR] Make tcmalloc heap sampling more useful
Hello Will Berkeley, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9263 to look at the new patch set (#2). Change subject: Make tcmalloc heap sampling more useful .. Make tcmalloc heap sampling more useful Previously our /pprof/heap path required that the process be started with lifetime heap profiling enabled. This mode of operation is very slow so we would never be able to use it in production. Even in tests it's of limited utility. The alternative is tcmalloc heap _sampling_ which only records periodic allocations. According to recent mailing list threads, Google runs most of their binaries with this enabled in production, so it's designed to be very low overhead and potentially useful for us in the field. This doesn't go so far as to enable it by default, but it does adjust the /pprof/heap endpoint to dump the sampled allocations instead of starting a profile. Sampling can be enabled using either the tcmalloc env variable (TCMALLOC_SAMPLE_PARAMETER) or a new command line flag --heap_sample_parameter. I tested this manually by running a kudu tserver with loadgen and sampling set to 500kb. I then used pprof to grab the heap sample information which looked like I expected it to (most of the memory from MRS) Change-Id: I939c2f01f17ceb4b9520bb566f66463952b2a255 --- M src/kudu/server/pprof_path_handlers.cc M src/kudu/util/flags.cc 2 files changed, 42 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/9263/2 -- To view, visit http://gerrit.cloudera.org:8080/9263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I939c2f01f17ceb4b9520bb566f66463952b2a255 Gerrit-Change-Number: 9263 Gerrit-PatchSet: 2 Gerrit-Owner: Todd LipconGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley
[kudu-CR] Make tcmalloc heap sampling more useful
Hello Will Berkeley, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/9263 to review the following change. Change subject: Make tcmalloc heap sampling more useful .. Make tcmalloc heap sampling more useful Previously our /pprof/heap path required that the process be started with lifetime heap profiling enabled. This mode of operation is very slow so we would never be able to use it in production. Even in tests it's of limited utility. The alternative is tcmalloc heap _sampling_ which only records periodic allocations. According to recent mailing list threads, Google runs most of their binaries with this enabled in production, so it's designed to be very low overhead and potentially useful for us in the field. This doesn't go so far as to enable it by default, but it does adjust the /pprof/heap endpoint to dump the sampled allocations instead of starting a profile. Sampling can be enabled using either the tcmalloc env variable (TCMALLOC_SAMPLE_PARAMETER) or a new command line flag --heap_sample_parameter. I tested this manually by running a kudu tserver with loadgen and sampling set to 500kb. I then used pprof to grab the heap sample information which looked like I expected it to (most of the memory from MRS) Change-Id: I939c2f01f17ceb4b9520bb566f66463952b2a255 --- M src/kudu/server/pprof_path_handlers.cc M src/kudu/util/flags.cc 2 files changed, 41 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/9263/1 -- To view, visit http://gerrit.cloudera.org:8080/9263 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I939c2f01f17ceb4b9520bb566f66463952b2a255 Gerrit-Change-Number: 9263 Gerrit-PatchSet: 1 Gerrit-Owner: Todd LipconGerrit-Reviewer: Will Berkeley