Mike Percy has posted comments on this change.

Change subject: Add Google Breakpad support to Kudu
......................................................................


Patch Set 3:

(46 comments)

http://gerrit.cloudera.org:8080/#/c/5620/3//COMMIT_MSG
Commit Message:

Line 7: Add Google Breakpad support to Kudu
> What does breakpad support mean for regular core dumps? Are they no longer 
Done


PS3, Line 12: part
> Nit: is part
Done


PS3, Line 17: critical crash information
> Since your audience here is Kudu developers, it would be nice to add a touc
It's well documented; I'll add a link to the docs.


Line 21: the default log directory for a given Kudu daemon process.
> Why was the log directory chosen as a suitable default home?
Added a note to the commit message.


Line 36: 
> It would be good to include a section on how-to go about debugging the mini
I added a link to the breakpad docs, which explain the process. But we should 
also follow up with formal used documentation about use of breakpad in Kudu for 
the next release.


http://gerrit.cloudera.org:8080/#/c/5620/3/src/kudu/integration-tests/minidump_generation-itest.cc
File src/kudu/integration-tests/minidump_generation-itest.cc:

PS3, Line 21: #include <glog/logging.h>
> nit: is it needed?
Not anymore. Removed


Line 21: #include <glog/logging.h>
> nit: consider adding <gtest/gtest.h>
I don't think we need that. We get it from the base class.


PS3, Line 22: #include <glog/stl_logging.h>
> nit: is it needed?
Removed


PS3, Line 24: #include "kudu/gutil/strings/util.h"
> nit: is it needed?
Removed


PS3, Line 26: #include "kudu/util/logging.h"
> nit: is it needed?
Removed


Line 28: #include "kudu/util/subprocess.h"
> nit: consider including "kudu/util/monotime.h" (for MonoDelta)
Done


Line 66: 
> You might want to add a test for writing minidumps on SIGUSR1, especially s
Good idea. Done.


http://gerrit.cloudera.org:8080/#/c/5620/3/src/kudu/master/master_main.cc
File src/kudu/master/master_main.cc:

PS3, Line 60: argv[0]
> nit: consider adding a var for argv[0] and using it around instead.
Especially since it is read-only, I personally find passing in argv[0] to make 
the intention clearer.


http://gerrit.cloudera.org:8080/#/c/5620/3/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

Line 1246:       return IOError(Substitute("Unable to canonicalize $0", path), 
errno);
> Maybe this was intended for the path_util patch? Doesn't really matter thou
It could go either way. It doesn't really belong to anything, it's just a minor 
improvement.


http://gerrit.cloudera.org:8080/#/c/5620/3/src/kudu/util/minidump.cc
File src/kudu/util/minidump.cc:

Line 48: DEFINE_bool(enable_minidumps, true, "Whether to enable minidump 
collection upon process crash");
> Should also tag it with evolving, I think.
Done


Line 50: TAG_FLAG(enable_minidumps, runtime);
> How is this safe for changing at runtime? If the initial value was false, R
Yes, that changed during development. Removed.


PS3, Line 56: nly be written when a daemon exits "
            :     "unexpectedly,
> This is actually not true anymore (but used to be), even for Impala, since 
Thanks for the catch. Done


PS3, Line 57: Set to empty to "
            :     "disable writing minidump files.
> Agreed with Alexey; having two ways to disable minidumps (enable_minidumps=
You guys bring up a good point. I took your advice, Adar, thanks for the 
suggestion.


PS3, Line 62: Set to 0 to keep all minidump files
> Shouldn't we allow 0 for "delete all outstanding minidumps at startup", and
After discussing this with Lars and due to plans for adding a minidump reaper 
thread, if we changed to the suggested semantics then enable_minidumps=true 
with max_minidumps=0 would be nonintuitive and appear broken, since the deleter 
thread would remove the dumps created via SIGUSR1. So I think it's better to 
keep this flag the way it is.


Line 76: static google_breakpad::ExceptionHandler* minidump_exception_handler = 
nullptr;
> Our convention is to prefix global variables with g_.
Done


PS3, Line 84: by breakpad 
> Nit: drop this; it's implied by "Callback for breakpad".
Done


PS3, Line 85: It logs the event before breakpad crashes
            : // the process
> Oh, so breakpad crashes the process now? Previously glog was doing that for
I hadn't considered this, but it turned out to be an issue.

Per IMPALA-3656, Breakpad is supposed to work with glog 0.3.4 but I had to 
backport a patch from glog master to get CHECK() to both display a stack trace 
and generate a minidump. I'll have to check with Lars about whether they had to 
do the same backport.

In any case, was able to address the previous issues with this. Now, we get 
both a minidump and a stack trace on a CHECK() or a fatal signal.


PS3, Line 97: sizeof(msg) / sizeof(msg[0]) - 1;
> If we're ok calling into my_strlen() for the path, let's use it here too fo
Done


PS3, Line 99: libc
> Would be good to say _why_ we shouldn't call into libc here.
Done


Line 118:   minidump_exception_handler->WriteMinidump(FLAGS_minidump_path, 
DumpCallback, nullptr);
> You sure this is safe to call from a signal handler? The docs (https://chro
This is the USR1 signal handler. I'll rename it and add some comments to 
clarify.


Line 137:   Env* env = Env::Default();
> You can get this from the caller.
Done


PS3, Line 139:   // Search for minidumps. There could be multiple minidumps for 
a single second.
             :   multimap<int, string> timestamp_to_path;
> FWIW, I think the approach used in the glog cleanup thread is nicer: a vect
sgtm, done


Line 147:     unique_ptr<SequentialFile> in;
> I think this is overkill. We don't verify that files that with glog-looking
This code was borrowed from Impala, and I tend to agree with you that simpler 
is probably better. I reused the logic from the glog cleaner thread.


Line 158:     constexpr int kHeaderSize = sizeof(MDRawHeader);
> So breakpad doesn't namespace its classes either? Would have expected this 
yup. not namespaced.


PS3, Line 160: Slice
> Do we explicitly want to include #include "kudu/util/slice.h" for this ?
Done


PS3, Line 164: Attempting to delete it
> ok that makes sense, perhaps we could just reorder the error as "failed to 
I simplified this whole bit of logic. Now, if we can't delete the excess files 
on startup, it will be a fatal error. Otherwise, it generates a warning that 
blocks rotation of additional files.


Line 203:   static bool registered = false;
> Maybe you can use a google once on minidump_exception_handler instead?
I thought about it (this code came from Impala), and we could, but this is 
process-wide initialization stuff, so we don't need to worry about thread 
safety.


Line 207:   if (!FLAGS_enable_minidumps || FLAGS_minidump_path.empty()) return 
Status::OK();
> If you skip registering the signal handler for SIGUSR1 here and someone sen
Thanks for the catch. Fixed.


PS3, Line 215:   if (!env->FileExists(FLAGS_minidump_path)) {
             :     RETURN_NOT_OK(env->CreateDir(FLAGS_minidump_path));
             :   }
> Maybe CreateDirRecursively() here, so that it probably handles symlinks and
I intended to use CreateDirRecursively() here, which is why I wrote it, but 
Todd had a problem with it in a previous review. I tend to agree with you on 
this. I'll ask Todd to chime back in here.


PS3, Line 221: Impala
> I think the intent was to explain that this behavior (in kudu/util/minidump
Adar is correct


PS3, Line 225:   // Create the directory if it is not there. The minidump 
doesn't get written if there is
             :   // no directory.
             :   if (!env->FileExists(FLAGS_minidump_path)) {
             :     RETURN_NOT_OK(env->CreateDir(FLAGS_minidump_path));
             :   }
> Here too? Maybe combine them into one CreateDirRecursively() call, which is
See above; that's how I originally wrote it in CR1


PS3, Line 242: because usually only process crashes will trigger the
             :   // creation of new minidump files (they are also created on 
SIGUSR1).
> As you say, SIGUSR1 breaks that guarantee. What about doing what we did for
Done


PS3, Line 252: << "."
> Why the period at the end? Seems unusual for a Kudu LOG statement.
Done


PS3, Line 259: desc, FilterCallback, DumpCallback, nullptr, true, -1);
> Would be nice to reformat this so that one arg is on each line, then you ca
Done


http://gerrit.cloudera.org:8080/#/c/5620/3/src/kudu/util/minidump.h
File src/kudu/util/minidump.h:

Line 28: Status RegisterMinidump(const std::string& cmd_line_path);
> If this is just a way to get the executable's argv[0], you can use gflags' 
Done and done.


http://gerrit.cloudera.org:8080/#/c/5620/3/thirdparty/build-definitions.sh
File thirdparty/build-definitions.sh:

Line 563:   make -j$PARALLEL install
> Add $EXTRA_MAKEFLAGS too; see above.
Done


Line 573:     xargs perl -p -i -e '@pre = qw(client common google_breakpad 
processor third_party); for $p (@pre) { s{#include "$p/}{#include 
"breakpad/$p/}; }'
> Could you add perl to REQUIRED_TOOLS in preflight.py? This is the second us
Done


http://gerrit.cloudera.org:8080/#/c/5620/3/thirdparty/build-thirdparty.sh
File thirdparty/build-thirdparty.sh:

PS3, Line 284: F_ALL
> Agreed; did you mean F_UNINSTRUMENTED here and F_TSAN below?
Yeah, bad rebase. Fixed.


http://gerrit.cloudera.org:8080/#/c/5620/3/thirdparty/download-thirdparty.sh
File thirdparty/download-thirdparty.sh:

Line 244: 
> nit: an extra-line, seems like a random change
Done


http://gerrit.cloudera.org:8080/#/c/5620/3/thirdparty/scripts/make-breakpad-src-archive.sh
File thirdparty/scripts/make-breakpad-src-archive.sh:

Line 37: VERSION=${1:-$(git rev-parse HEAD)}
> Nit: call this REVISION to be consistent with LSS_REVISION (and to make it 
Done


Line 41: git archive --format=tar --prefix=$NAME/ $VERSION | (cd .. && tar xf -)
> Is this a fancy way to copy and rename the tree to another location?
Yeah... we export only checked-in files, without the git metadata, and 
construct the expected directory structure so we can run the build. Then we tar 
it up. I'll add some comments.


-- 
To view, visit http://gerrit.cloudera.org:8080/5620
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I495695cc38b75377f20b0497093a81ed5baa887f
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Dinesh Bhat <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to