[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. IMPALA-3677: Write minidump on SIGUSR1 Sending SIGUSR1 to any of the impala daemons (catalogd, impalad, statestored) will trigger a minidump write. The hotspot JVM also uses SIGUSR1 internally. However the documentation explains, that existing signal handlers will be transparently wrapped by the JVM and no spurious signals should be received by the daemon signal handler: http://www.oracle.com/technetwork/java/javase/signals-139944.html Example: killall -SIGUSR1 catalogd Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Reviewed-on: http://gerrit.cloudera.org:8080/3312 Reviewed-by: Lars Volker Tested-by: Internal Jenkins --- M be/src/util/minidump.cc M tests/custom_cluster/test_breakpad.py 2 files changed, 74 insertions(+), 10 deletions(-) Approvals: Lars Volker: Looks good to me, approved Internal Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 10 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Internal Jenkins has posted comments on this change. Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. Patch Set 9: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 9 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Lars Volker has posted comments on this change. Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. Patch Set 9: Code-Review+2 +2 from Dan. Thanks for the reviews. -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 9 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Lars Volker has posted comments on this change.
Change subject: IMPALA-3677: Write minidump on SIGUSR1
..
Patch Set 8:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/3312/8/be/src/util/minidump.cc
File be/src/util/minidump.cc:
Line 94: static void SetupSignalHandler() {
> DCHECK(minidump_exception_handler != NULL);
Done
--
To view, visit http://gerrit.cloudera.org:8080/3312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Lars Volker
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: David Knupp
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Tim Armstrong
Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Hello David Knupp, Dan Hecht, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3312 to look at the new patch set (#9). Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. IMPALA-3677: Write minidump on SIGUSR1 Sending SIGUSR1 to any of the impala daemons (catalogd, impalad, statestored) will trigger a minidump write. The hotspot JVM also uses SIGUSR1 internally. However the documentation explains, that existing signal handlers will be transparently wrapped by the JVM and no spurious signals should be received by the daemon signal handler: http://www.oracle.com/technetwork/java/javase/signals-139944.html Example: killall -SIGUSR1 catalogd Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 --- M be/src/util/minidump.cc M tests/custom_cluster/test_breakpad.py 2 files changed, 74 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/12/3312/9 -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 9 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Dan Hecht has posted comments on this change.
Change subject: IMPALA-3677: Write minidump on SIGUSR1
..
Patch Set 8: Code-Review+2
(1 comment)
http://gerrit.cloudera.org:8080/#/c/3312/8/be/src/util/minidump.cc
File be/src/util/minidump.cc:
Line 94: static void SetupSignalHandler() {
DCHECK(minidump_exception_handler != NULL);
--
To view, visit http://gerrit.cloudera.org:8080/3312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513
Gerrit-PatchSet: 8
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Lars Volker
Gerrit-Reviewer: Dan Hecht
Gerrit-Reviewer: David Knupp
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Tim Armstrong
Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
David Knupp has posted comments on this change. Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. Patch Set 8: Code-Review+1 (1 comment) I think someone else (other than me) needs to give a +2 because of the .cc file, yes? http://gerrit.cloudera.org:8080/#/c/3312/7/tests/custom_cluster/test_breakpad.py File tests/custom_cluster/test_breakpad.py: Line 117: def wait_for_num_processes(self, daemon, num_expected, timeout=60): > Yes, my original assumption was that the time we need to write all minidump Sounds good to me. -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 8 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Lars Volker has posted comments on this change. Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. Patch Set 7: (2 comments) Thanks for the reviews, please see PS8. http://gerrit.cloudera.org:8080/#/c/3312/7/tests/custom_cluster/test_breakpad.py File tests/custom_cluster/test_breakpad.py: Line 117: def wait_for_num_processes(self, daemon, num_expected, timeout=60): > This method seems to be doing a bit more than it needs to, which I'm mainly Yes, my original assumption was that the time we need to write all minidumps should never exceed 5 seconds (which was an more of an educated guess). The number of expected processes is at most cluster_size, which is the number of local impalads running on the same machine. At 100MB/sec and 2.5MB per file we should be able to write about 200 minidumps in 5 seconds, so that looks safe to me. However there could be unknown things affecting this time. Anyways, now there's a wait loop in place. Should I modify it to scale the timeout with num_expected. If you're ok I'd leave it at 5sec, as I'd be very interested to have pop up as a failure if it ever takes longer than this. PS7, Line 173: 3 > This could still be replaced with 'cluster_size', right? Done -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 7 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Hello David Knupp, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3312 to look at the new patch set (#8). Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. IMPALA-3677: Write minidump on SIGUSR1 Sending SIGUSR1 to any of the impala daemons (catalogd, impalad, statestored) will trigger a minidump write. The hotspot JVM also uses SIGUSR1 internally. However the documentation explains, that existing signal handlers will be transparently wrapped by the JVM and no spurious signals should be received by the daemon signal handler: http://www.oracle.com/technetwork/java/javase/signals-139944.html Example: killall -SIGUSR1 catalogd Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 --- M be/src/util/minidump.cc M tests/custom_cluster/test_breakpad.py 2 files changed, 73 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/12/3312/8 -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 8 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
David Knupp has posted comments on this change. Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/3312/7/tests/custom_cluster/test_breakpad.py File tests/custom_cluster/test_breakpad.py: Line 117: def wait_for_num_processes(self, daemon, num_expected, timeout=60): This method seems to be doing a bit more than it needs to, which I'm mainly deriving from the fact that it raises two different kinds of exceptions (a RuntimeException, and an AssertionError) for essentially the same reason -- finding the wrong number of processes. Since you're already calling calling this method within the context of an assert statement (down in your test function) I would consider simplifying wait_for_num_processes() so that it merely returns the number of processes it finds after trying for the allotted timeout. Then you could give the *caller* the responsibility of deciding what to do with that information, e.g., in your test, you could do something like: assert self.wait_for_num_processes(daemon='impalad', num_expected=cluster_size) == cluster_size Also, for what it's worth, this solution kind of misses the intent of my original comment/question, since it basically still just introduces a static sleep time (although it does have the possibility of returning sooner if the terminating condition of the while loop is met.) My original question (which no doubt stems from my inexperience with all of the testing permutations here) was really more about finding out whether a single static timeout would be valid in all possible test scenarios. In other words, will we be running this test on varying-sized clusters, or clusters that might be deployed in the cloud, versus locally, versus as a mini-cluster on a single machine? Does the size of the cluster or where it's deployed even matter? I didn't know. For what it's worth, that aspect could be addressed easily by making the default timeout a factor of cluster_size if None is given, e.g.: def wait_for_num_processes(self, daemon, num_expected, timeout=None): if timeout is None: timeout = 5 * num_expected end = time.time() + timeout etc... ...or something similar. I'm sorry if this all became more complicated than it deserved to be. PS7, Line 173: 3 This could still be replaced with 'cluster_size', right? -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 7 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Hello David Knupp, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3312 to look at the new patch set (#7). Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. IMPALA-3677: Write minidump on SIGUSR1 Sending SIGUSR1 to any of the impala daemons (catalogd, impalad, statestored) will trigger a minidump write. The hotspot JVM also uses SIGUSR1 internally. However the documentation explains, that existing signal handlers will be transparently wrapped by the JVM and no spurious signals should be received by the daemon signal handler: http://www.oracle.com/technetwork/java/javase/signals-139944.html Example: killall -SIGUSR1 catalogd Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 --- M be/src/util/minidump.cc M tests/custom_cluster/test_breakpad.py 2 files changed, 78 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/12/3312/7 -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 7 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Lars Volker has posted comments on this change. Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/3312/6/tests/custom_cluster/test_breakpad.py File tests/custom_cluster/test_breakpad.py: PS6, Line 121: while time.time() < end: This loop also needs to sleep some instead of spinning. -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 6 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. Patch Set 6: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 6 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Lars Volker has posted comments on this change. Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. Patch Set 6: Please ignore PS5 and see PS6 instead. -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 6 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Lars Volker has posted comments on this change. Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/3312/3/tests/custom_cluster/test_breakpad.py File tests/custom_cluster/test_breakpad.py: PS3, Line 149: time.sleep(5) > Oops -- sorry! I didn't see Tim's comment. If there's consensus that adding I added a timed loop in PS5. It was a bit of a hassle, as the breakpad forks to write minidumps on request, so there were extra processes in the way, even when the files had already started to show up. I ended up waiting on the number or processes instead, and found IMPALA-3898 on the way. :) -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 3 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Hello David Knupp, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3312 to look at the new patch set (#6). Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. IMPALA-3677: Write minidump on SIGUSR1 Sending SIGUSR1 to any of the impala daemons (catalogd, impalad, statestored) will trigger a minidump write. The hotspot JVM also uses SIGUSR1 internally. However the documentation explains, that existing signal handlers will be transparently wrapped by the JVM and no spurious signals should be received by the daemon signal handler: http://www.oracle.com/technetwork/java/javase/signals-139944.html Example: killall -SIGUSR1 catalogd Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 --- M be/src/util/minidump.cc M tests/custom_cluster/test_breakpad.py 2 files changed, 76 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/12/3312/6 -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 6 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Hello David Knupp, Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3312 to look at the new patch set (#5). Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. IMPALA-3677: Write minidump on SIGUSR1 Sending SIGUSR1 to any of the impala daemons (catalogd, impalad, statestored) will trigger a minidump write. The hotspot JVM also uses SIGUSR1 internally. However the documentation explains, that existing signal handlers will be transparently wrapped by the JVM and no spurious signals should be received by the daemon signal handler: http://www.oracle.com/technetwork/java/javase/signals-139944.html Example: killall -SIGUSR1 catalogd Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 --- M be/src/util/minidump.cc M tests/custom_cluster/test_breakpad.py 2 files changed, 77 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/12/3312/5 -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 5 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
David Knupp has posted comments on this change. Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/3312/3/tests/custom_cluster/test_breakpad.py File tests/custom_cluster/test_breakpad.py: PS3, Line 149: time.sleep(5) > If it's not worth the effort, then I agree - even if it's not ideal, leave Oops -- sorry! I didn't see Tim's comment. If there's consensus that adding a loop is the right thing to do, then that's obviously fine with me too. -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 3 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
David Knupp has posted comments on this change. Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. Patch Set 4: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 4 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
David Knupp has posted comments on this change. Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/3312/3/tests/custom_cluster/test_breakpad.py File tests/custom_cluster/test_breakpad.py: PS3, Line 143: assert self.count_all_minidumps() == 0 > We use self.tmp_dir to write the minidumps to, so it should always be empty I think it's probably fine to leave as is -- I do see that most of the tests in this module follow this same pattern. For what it's worth, in my previous experience, I've tended to see assertions used for failures to meet the passing criteria of the test expectations, and other kinds of exceptions for setup / framework / environment errors that had nothing to do with the product under test. (e.g. the test framework I worked with in the past had its own custom exceptions library for this kind of thing.) As a general practice, this check probably is a little more defensive than necessary. Assuming the call to tempfile.mkdtemp() works, there's no reason to believe it would have anything other than 0 files in it. If there were a problem creating the temp dir, one presumes that the tempfile module would itself throw an exception before this line ever executes. PS3, Line 149: time.sleep(5) > I agree with you, this is not optimal. For killing the processes we already If it's not worth the effort, then I agree - even if it's not ideal, leave it as is until it poses a problem. -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 3 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/3312/3/tests/custom_cluster/test_breakpad.py File tests/custom_cluster/test_breakpad.py: PS3, Line 149: time.sleep(5) > I agree with you, this is not optimal. For killing the processes we already I think David makes a good point. I think it's less effort to add the retry loop now than it is to debug a flaky test in the future, especially if we run into problems with code coverage. -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 3 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Lars Volker has posted comments on this change. Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. Patch Set 3: (3 comments) Thank you David for the review. Let me know if those need more explanation in the comments. http://gerrit.cloudera.org:8080/#/c/3312/3/tests/custom_cluster/test_breakpad.py File tests/custom_cluster/test_breakpad.py: PS3, Line 143: assert self.count_all_minidumps() == 0 > Is raising an AssertionError the right thing to do if count_all_minidumps() We use self.tmp_dir to write the minidumps to, so it should always be empty at the beginning of a test. I thought of these assertions more as a way to document what the expectations of the tests are, plus a last line of defense in case something goes wrong, or someone disables using a temporary directory. They should never fail, and if they do we can investigate why. Should I add a comment or a log message to explain this? PS3, Line 149: time.sleep(5) > Bearing in mind that I don't know really anything about how long it takes t I agree with you, this is not optimal. For killing the processes we already have a timed wait loop, and they can sometimes take a long time to terminate, especially when writing core dumps, too. Minidumps of short-lived daemons are usually around 2.5MB large, so I expect them to be written fairly quickly and would be surprised if it ever took longer than 5 seconds. I also thought about adding a wait loop here, but then I'd set it to 5 seconds still, just so we get notified in case this takes longer. Therefore I figured it's not worth the effort for now, and that we should add a loop if 5s ever becomes a problem. Do you think I should elaborate on this in a comment? PS3, Line 152: assert len(self.cluster.impalads) == 3 > Should we always assume that the number of impalads is 3? Since you saved t Done -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 3 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Hello Tim Armstrong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3312 to look at the new patch set (#4). Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. IMPALA-3677: Write minidump on SIGUSR1 Sending SIGUSR1 to any of the impala daemons (catalogd, impalad, statestored) will trigger a minidump write. The hotspot JVM also uses SIGUSR1 internally. However the documentation explains, that existing signal handlers will be transparently wrapped by the JVM and no spurious signals should be received by the daemon signal handler: http://www.oracle.com/technetwork/java/javase/signals-139944.html Example: killall -SIGUSR1 catalogd Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 --- M be/src/util/minidump.cc M tests/custom_cluster/test_breakpad.py 2 files changed, 53 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/12/3312/4 -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 4 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
David Knupp has posted comments on this change.
Change subject: IMPALA-3677: Write minidump on SIGUSR1
..
Patch Set 3:
(3 comments)
http://gerrit.cloudera.org:8080/#/c/3312/3/tests/custom_cluster/test_breakpad.py
File tests/custom_cluster/test_breakpad.py:
PS3, Line 143: assert self.count_all_minidumps() == 0
Is raising an AssertionError the right thing to do if count_all_minidumps() is
not 0 at the very start of the test? Does that just mean that there are
minidump artifacts left over from a previous test before this one has even
started? If that's so, it doesn't seem to really be a case of SIGUSR1 failing
to write minidumps.
Perhaps there should be some kind cleanup process we could implement instead --
to clear out any *.dmp files before calling start_cluster()? If that were
written as, say, a local test fixture or setup method, then perhaps all of the
tests in this module could use it?
PS3, Line 149: time.sleep(5)
Bearing in mind that I don't know really anything about how long it takes to
write a minidump, I'm always a bit leery when I see a hard-coded sleep in a
test. It generally tends to be something of a test anti-pattern. E.g., if we
ever start to run tests on a much larger cluster, would 5 still suffice? Or is
5 just such an imaginably long period time that it's guaranteed to always work?
Would it make sense to move the various asserts involving count_minidumps()
here -- since this is where we expect them to be created -- and instead use
some kind of loop to keep checking until self.count_minidumps('impalad')
returns cluster_size, setting some limit to the number of iterations before we
force the test to fail?
Just asking, by the way. Maybe that's more trouble than it's worth. I'm
perfectly happy to accept that 5 is the right answer.
PS3, Line 152: assert len(self.cluster.impalads) == 3
Should we always assume that the number of impalads is 3? Since you saved the
cluster_size to a variable in an earlier line before sending the SIGUSR1,
wouldn't it work here to assert that len(self.cluster.impalads) is the same
after the refresh as whatever cluster_size was before?
--
To view, visit http://gerrit.cloudera.org:8080/3312
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513
Gerrit-PatchSet: 3
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Lars Volker
Gerrit-Reviewer: David Knupp
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Tim Armstrong
Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/3312/2//COMMIT_MSG Commit Message: Line 13: Example: killall -SIGUSR1 catalogd > True. This has happened to me multiple times now. Any ideas what might caus Not sure. I don't use the gerrit hook so I haven't seen the problem. -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 2 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 3 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Lars Volker has posted comments on this change. Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. Patch Set 2: (1 comment) Thanks for the review. I went over the java documentation, too, and could not find any reason why we should receive spurious signals. I added the link to the commit message. Please see PS3. http://gerrit.cloudera.org:8080/#/c/3312/2//COMMIT_MSG Commit Message: Line 13: Example: killall -SIGUSR1 catalogd > The Change-Id got put in a weird place True. This has happened to me multiple times now. Any ideas what might cause this to happen? -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 2 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Lars Volker has uploaded a new patch set (#3). Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. IMPALA-3677: Write minidump on SIGUSR1 Sending SIGUSR1 to any of the impala daemons (catalogd, impalad, statestored) will trigger a minidump write. The hotspot JVM also uses SIGUSR1 internally. However the documentation explains, that existing signal handlers will be transparently wrapped by the JVM and no spurious signals should be received by the daemon signal handler: http://www.oracle.com/technetwork/java/javase/signals-139944.html Example: killall -SIGUSR1 catalogd Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 --- M be/src/util/minidump.cc M tests/custom_cluster/test_breakpad.py 2 files changed, 53 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/12/3312/3 -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 3 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: Tim Armstrong
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. Patch Set 2: (1 comment) It looks like there's some possible interaction with the JVM, which uses the signal for implementing a thread interrupt method: http://www.oracle.com/technetwork/java/javase/signals-139944.html Based on my reading of the docs, it sounds like we should be ok (the JVM should not send us spurious signals SIGUSR1 signals on Linux), but could you also double-check the docs? http://gerrit.cloudera.org:8080/#/c/3312/2//COMMIT_MSG Commit Message: Line 13: Example: killall -SIGUSR1 catalogd The Change-Id got put in a weird place -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 2 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Lars Volker has uploaded a new patch set (#2). Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. IMPALA-3677: Write minidump on SIGUSR1 Sending SIGUSR1 to any of the impala daemons (catalogd, impalad, statestored) will trigger a minidump write. Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Example: killall -SIGUSR1 catalogd --- M be/src/util/minidump.cc M tests/custom_cluster/test_breakpad.py 2 files changed, 50 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/12/3312/2 -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 2 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker
[Impala-CR](cdh5-trunk) IMPALA-3677: Write minidump on SIGUSR1
Lars Volker has uploaded a new change for review. http://gerrit.cloudera.org:8080/3312 Change subject: IMPALA-3677: Write minidump on SIGUSR1 .. IMPALA-3677: Write minidump on SIGUSR1 Sending SIGUSR1 to any of the impala daemons (catalogd, impalad, statestored) will trigger a minidump write. Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Example: killall -SIGUSR1 catalogd --- M be/src/util/minidump.cc M tests/custom_cluster/test_breakpad.py 2 files changed, 42 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/12/3312/1 -- To view, visit http://gerrit.cloudera.org:8080/3312 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I40149e48e391451de21a5c8bda18e2307fc89513 Gerrit-PatchSet: 1 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Lars Volker
