Adar Dembo has posted comments on this change.

Change subject: Add Google Breakpad support to Kudu
......................................................................


Patch Set 3:

(32 comments)

http://gerrit.cloudera.org:8080/#/c/5620/3//COMMIT_MSG
Commit Message:

Line 7: Add Google Breakpad support to Kudu
What does breakpad support mean for regular core dumps? Are they no longer 
enabled? Should we disable them? Are they even still useful? I'd like to see a 
paragraph here describing how breakpad and regular core dumps co-exist (or not).


PS3, Line 12: part
Nit: is part


PS3, Line 17: critical crash information
Since your audience here is Kudu developers, it would be nice to add a touch 
more detail on what information we're talking about. With a core dump I can get 
a backtrace of every thread, all thread stacks, and the heap. What can I get 
with a minidump?


Line 21: the default log directory for a given Kudu daemon process.
Why was the log directory chosen as a suitable default home?


http://gerrit.cloudera.org:8080/#/c/5620/3/src/kudu/util/env_posix.cc
File src/kudu/util/env_posix.cc:

Line 1246:       return IOError(Substitute("Unable to canonicalize $0", path), 
errno);
Maybe this was intended for the path_util patch? Doesn't really matter though.


http://gerrit.cloudera.org:8080/#/c/5620/3/src/kudu/util/minidump.cc
File src/kudu/util/minidump.cc:

Line 48: DEFINE_bool(enable_minidumps, true, "Whether to enable minidump 
collection upon process crash");
Should also tag it with evolving, I think.


Line 50: TAG_FLAG(enable_minidumps, runtime);
How is this safe for changing at runtime? If the initial value was false, 
RegisterMinidump() will return early, and a subsequent change to true will have 
no effect.


PS3, Line 57: Set to empty to "
            :     "disable writing minidump files.
> I see there is also 'enable_minidumps' flag which is the explicit control f
Agreed with Alexey; having two ways to disable minidumps 
(enable_minidumps=false and minidump_path="") is confusing.

If an empty string is invalid input, enforce that with a gflag validator (see 
RegisterFlagValidator).


PS3, Line 62: Set to 0 to keep all minidump files
Shouldn't we allow 0 for "delete all outstanding minidumps at startup", and 
then reserve -1 for "keep all"?


Line 76: static google_breakpad::ExceptionHandler* minidump_exception_handler = 
nullptr;
Our convention is to prefix global variables with g_.


PS3, Line 84: by breakpad 
Nit: drop this; it's implied by "Callback for breakpad".


PS3, Line 85: It logs the event before breakpad crashes
            : // the process
Oh, so breakpad crashes the process now? Previously glog was doing that for us; 
does this mean the CHECK event (failure log + backtrace) in the glog no longer 
exists?


PS3, Line 97: sizeof(msg) / sizeof(msg[0]) - 1;
If we're ok calling into my_strlen() for the path, let's use it here too for 
consistency (and because this expression is...weird).


PS3, Line 99: libc
Would be good to say _why_ we shouldn't call into libc here.


Line 118:   minidump_exception_handler->WriteMinidump(FLAGS_minidump_path, 
DumpCallback, nullptr);
You sure this is safe to call from a signal handler? The docs 
(https://chromium.googlesource.com/external/google-breakpad/src/+/master/client/linux/handler/exception_handler.h)
 say:

  // Note that this method is not supposed to be called from a compromised
  // context as it uses the heap.


Line 137:   Env* env = Env::Default();
You can get this from the caller.


PS3, Line 139:   // Search for minidumps. There could be multiple minidumps for 
a single second.
             :   multimap<int, string> timestamp_to_path;
FWIW, I think the approach used in the glog cleanup thread is nicer: a 
vector<pair<time_t, string>> that is sorted using std::sort. Then you don't 
have to worry about collisions dropping items from the map, and thus mtime's 
time granularity (1s on HFS+, far less on Linux filesystems) is probably good 
enough?


Line 147:     unique_ptr<SequentialFile> in;
I think this is overkill. We don't verify that files that with glog-looking 
filenames are actually glog files before deleting them; why do we need to do 
for minidumps? Besides, the "failure policy" here deletes the files instead of 
ignoring them. What's the intent here?

On a second pass, I see that we're reading the header in order to get the 
timestamp out of it. Can't we just use mtime on the file itself instead? 
Minidumps are read-only once created, right; shouldn't that be safe enough?


Line 158:     constexpr int kHeaderSize = sizeof(MDRawHeader);
So breakpad doesn't namespace its classes either? Would have expected this to 
be google_breakpad::MDRawHeader.


Line 203:   static bool registered = false;
Maybe you can use a google once on minidump_exception_handler instead?


PS3, Line 215:   if (!env->FileExists(FLAGS_minidump_path)) {
             :     RETURN_NOT_OK(env->CreateDir(FLAGS_minidump_path));
             :   }
Maybe CreateDirRecursively() here, so that it probably handles symlinks and 
errors out if the path exists as a regular file? Unless you explicitly want an 
error when --minidump_path=/a/b/c/d and only /a/b exists...


PS3, Line 221: Impala
> s/Impala/kudu ?
I think the intent was to explain that this behavior (in kudu/util/minidump.cc) 
is consistent with Impala's behavior (in Impala's repo).


PS3, Line 225:   // Create the directory if it is not there. The minidump 
doesn't get written if there is
             :   // no directory.
             :   if (!env->FileExists(FLAGS_minidump_path)) {
             :     RETURN_NOT_OK(env->CreateDir(FLAGS_minidump_path));
             :   }
Here too? Maybe combine them into one CreateDirRecursively() call, which is 
more clear anyway.


PS3, Line 242: because usually only process crashes will trigger the
             :   // creation of new minidump files (they are also created on 
SIGUSR1).
As you say, SIGUSR1 breaks that guarantee. What about doing what we did for 
logging and having a background thread that peridiocally checks the number of 
minidumps and removes old ones? We could even reuse the same thread (it sort of 
makes sense; a minidump is a form of logging). See 
StartExcessGlogDeleterThread() in server_base.h.


PS3, Line 252: << "."
Why the period at the end? Seems unusual for a Kudu LOG statement.


PS3, Line 259: desc, FilterCallback, DumpCallback, nullptr, true, -1);
Would be nice to reformat this so that one arg is on each line, then you can 
add inline comments explaining what each arg means (especially for nullptr, 
true, and -1).


http://gerrit.cloudera.org:8080/#/c/5620/3/src/kudu/util/minidump.h
File src/kudu/util/minidump.h:

Line 28: Status RegisterMinidump(const std::string& cmd_line_path);
If this is just a way to get the executable's argv[0], you can use gflags' 
ProgramInvocationShortName() inside the function itself. AFAICT breakpad 
registration happens after gflags so it should be safe.

Also, it would be nice to see some non-integration tests for this. For example, 
would it be possible to call RegisterMinidump in a death-test, crash, and 
verify that a dump was written?


http://gerrit.cloudera.org:8080/#/c/5620/3/thirdparty/build-definitions.sh
File thirdparty/build-definitions.sh:

Line 563:   make -j$PARALLEL install
Add $EXTRA_MAKEFLAGS too; see above.


Line 573:     xargs perl -p -i -e '@pre = qw(client common google_breakpad 
processor third_party); for $p (@pre) { s{#include "$p/}{#include 
"breakpad/$p/}; }'
Could you add perl to REQUIRED_TOOLS in preflight.py? This is the second usage 
(not the first), but we should do it anyway.


http://gerrit.cloudera.org:8080/#/c/5620/3/thirdparty/build-thirdparty.sh
File thirdparty/build-thirdparty.sh:

PS3, Line 284: F_ALL
> What is F_ALL?  I.e., where is it defined and why does it affect only the b
Agreed; did you mean F_UNINSTRUMENTED here and F_TSAN below?


http://gerrit.cloudera.org:8080/#/c/5620/3/thirdparty/scripts/make-breakpad-src-archive.sh
File thirdparty/scripts/make-breakpad-src-archive.sh:

Line 37: VERSION=${1:-$(git rev-parse HEAD)}
Nit: call this REVISION to be consistent with LSS_REVISION (and to make it 
clear that it's a git hash, not a tag or something).


Line 41: git archive --format=tar --prefix=$NAME/ $VERSION | (cd .. && tar xf -)
Is this a fancy way to copy and rename the tree to another location?


-- 
To view, visit http://gerrit.cloudera.org:8080/5620
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I495695cc38b75377f20b0497093a81ed5baa887f
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Mike Percy <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Dinesh Bhat <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to