[Impala-ASF-CR] IMPALA-8191: Wait for additional breakpad processes during test

2019-02-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12501 )

Change subject: IMPALA-8191: Wait for additional breakpad processes during test
..


Patch Set 3: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/12501
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca
Gerrit-Change-Number: 12501
Gerrit-PatchSet: 3
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 20 Feb 2019 04:06:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8191: Wait for additional breakpad processes during test

2019-02-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/12501 )

Change subject: IMPALA-8191: Wait for additional breakpad processes during test
..

IMPALA-8191: Wait for additional breakpad processes during test

The Breakpad signal handler forks off a process to write a minidump.
During the breakpad tests we send signals to the Impala daemons and then
wait for all processes to go away. Prior to this change we did this by
waiting on the PID returned by process.get_pid(). It is determined by
iterating over psutil.get_pid_list() which is an ordered list of PIDs
running on the system. We return the first process in the list with a
matching command line. In cases where the PID space rolled over, this
could have been the forked off breakpad process and we'd wait on that
one. During the subsequent check that all processes are indeed gone, we
could then pick up the original Impala daemon that had forked off to
write the minidump and was still in the process of shutting down.

To fix this, we wait for every process twice. Processes are identified
by their command and iterating through them twice makes sure we catch
both the original daemon and it's breakpad child.

This change also contains improvements to the logging of processes in
our tests. This should make it easier to identify similar issues in the
future.

Testing: I ran the breakpad tests in exhaustive mode. I didn't try to
exercise it around a PID roll-over, but we shouldn't see the issue in
IMPALA-8191 again.

Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca
Reviewed-on: http://gerrit.cloudera.org:8080/12501
Reviewed-by: Lars Volker 
Tested-by: Impala Public Jenkins 
---
M tests/common/impala_cluster.py
M tests/custom_cluster/test_breakpad.py
2 files changed, 55 insertions(+), 22 deletions(-)

Approvals:
  Lars Volker: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/12501
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca
Gerrit-Change-Number: 12501
Gerrit-PatchSet: 4
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-8191: Wait for additional breakpad processes during test

2019-02-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12501 )

Change subject: IMPALA-8191: Wait for additional breakpad processes during test
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2161/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12501
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca
Gerrit-Change-Number: 12501
Gerrit-PatchSet: 2
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 20 Feb 2019 00:22:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8191: Wait for additional breakpad processes during test

2019-02-19 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12501 )

Change subject: IMPALA-8191: Wait for additional breakpad processes during test
..


Patch Set 2: Code-Review+2

I think get_pids() works better here. Thanks!


--
To view, visit http://gerrit.cloudera.org:8080/12501
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca
Gerrit-Change-Number: 12501
Gerrit-PatchSet: 2
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 20 Feb 2019 00:03:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8191: Wait for additional breakpad processes during test

2019-02-19 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12501 )

Change subject: IMPALA-8191: Wait for additional breakpad processes during test
..


Patch Set 3: Code-Review+2

Rebased, carrying Phil's +2


--
To view, visit http://gerrit.cloudera.org:8080/12501
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca
Gerrit-Change-Number: 12501
Gerrit-PatchSet: 3
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 20 Feb 2019 00:08:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8191: Wait for additional breakpad processes during test

2019-02-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12501 )

Change subject: IMPALA-8191: Wait for additional breakpad processes during test
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3795/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/12501
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca
Gerrit-Change-Number: 12501
Gerrit-PatchSet: 3
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 20 Feb 2019 00:08:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8191: Wait for additional breakpad processes during test

2019-02-19 Thread Lars Volker (Code Review)
Hello Philip Zeyliger, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12501

to look at the new patch set (#2).

Change subject: IMPALA-8191: Wait for additional breakpad processes during test
..

IMPALA-8191: Wait for additional breakpad processes during test

The Breakpad signal handler forks off a process to write a minidump.
During the breakpad tests we send signals to the Impala daemons and then
wait for all processes to go away. Prior to this change we did this by
waiting on the PID returned by process.get_pid(). It is determined by
iterating over psutil.get_pid_list() which is an ordered list of PIDs
running on the system. We return the first process in the list with a
matching command line. In cases where the PID space rolled over, this
could have been the forked off breakpad process and we'd wait on that
one. During the subsequent check that all processes are indeed gone, we
could then pick up the original Impala daemon that had forked off to
write the minidump and was still in the process of shutting down.

To fix this, we wait for every process twice. Processes are identified
by their command and iterating through them twice makes sure we catch
both the original daemon and it's breakpad child.

This change also contains improvements to the logging of processes in
our tests. This should make it easier to identify similar issues in the
future.

Testing: I ran the breakpad tests in exhaustive mode. I didn't try to
exercise it around a PID roll-over, but we shouldn't see the issue in
IMPALA-8191 again.

Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca
---
M tests/common/impala_cluster.py
M tests/custom_cluster/test_breakpad.py
2 files changed, 55 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/12501/2
--
To view, visit http://gerrit.cloudera.org:8080/12501
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca
Gerrit-Change-Number: 12501
Gerrit-PatchSet: 2
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-8191: Wait for additional breakpad processes during test

2019-02-15 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12501 )

Change subject: IMPALA-8191: Wait for additional breakpad processes during test
..


Patch Set 1:

(3 comments)

Ok. Thanks for explaining. I'm fine with the tactical fix, but I think doing 
process.get_pids() is probably easy enough and will be less confusing.

http://gerrit.cloudera.org:8080/#/c/12501/1/tests/common/impala_cluster.py
File tests/common/impala_cluster.py:

http://gerrit.cloudera.org:8080/#/c/12501/1/tests/common/impala_cluster.py@321
PS1, Line 321: if set(self.cmd) == set(process.cmdline):
Do you know why this is set?


http://gerrit.cloudera.org:8080/#/c/12501/1/tests/common/impala_cluster.py@322
PS1, Line 322:   return pid
Instead of returning pid here, you could collect all matches, and then either 
pick the better one (parent) or fail? Would that have made this more obvious?


http://gerrit.cloudera.org:8080/#/c/12501/1/tests/custom_cluster/test_breakpad.py
File tests/custom_cluster/test_breakpad.py:

http://gerrit.cloudera.org:8080/#/c/12501/1/tests/custom_cluster/test_breakpad.py@88
PS1, Line 88:   def wait_for_all_processes_dead(self, processes, timeout=300):
: # For every process in the list we might see the original 
Impala process plus a forked
: # off child that is writing the minidump. To catch both we go 
through the list twice.
: for process in processes * 2:
> These are Process objects (https://github.com/apache/impala/blob/master/tes
OMG.

I would never have thought that "Process" is really "output of pgrep".

How would you feel about changing this to process.get_pids() here, so that it's 
more obvious that a process can get represented multiple times in this case, 
and we have an answer?



--
To view, visit http://gerrit.cloudera.org:8080/12501
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca
Gerrit-Change-Number: 12501
Gerrit-PatchSet: 1
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Sat, 16 Feb 2019 00:27:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8191: Wait for additional breakpad processes during test

2019-02-15 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12501 )

Change subject: IMPALA-8191: Wait for additional breakpad processes during test
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12501/1/tests/custom_cluster/test_breakpad.py
File tests/custom_cluster/test_breakpad.py:

http://gerrit.cloudera.org:8080/#/c/12501/1/tests/custom_cluster/test_breakpad.py@88
PS1, Line 88:   def wait_for_all_processes_dead(self, processes, timeout=300):
: # For every process in the list we might see the original 
Impala process plus a forked
: # off child that is writing the minidump. To catch both we go 
through the list twice.
: for process in processes * 2:
> I don't understand this. processes here is a list of pids, and we wait() on
These are Process objects 
(https://github.com/apache/impala/blob/master/tests/common/impala_cluster.py#L280).
 They store their command and look up their PID by iterating through all PIDs 
on the system comparing them to their cmd. I think the idea behind this is that 
processes can restart, but of course we could track the expected state 
including its PID more explicitly. Let me know if you'd like me to explain what 
these are in the comment and/or file a Jira to improve the Process class to 
know its PID.



--
To view, visit http://gerrit.cloudera.org:8080/12501
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca
Gerrit-Change-Number: 12501
Gerrit-PatchSet: 1
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Fri, 15 Feb 2019 23:10:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8191: Wait for additional breakpad processes during test

2019-02-15 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12501 )

Change subject: IMPALA-8191: Wait for additional breakpad processes during test
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12501/1/tests/custom_cluster/test_breakpad.py
File tests/custom_cluster/test_breakpad.py:

http://gerrit.cloudera.org:8080/#/c/12501/1/tests/custom_cluster/test_breakpad.py@88
PS1, Line 88:   def wait_for_all_processes_dead(self, processes, timeout=300):
: # For every process in the list we might see the original 
Impala process plus a forked
: # off child that is writing the minidump. To catch both we go 
through the list twice.
: for process in processes * 2:
I don't understand this. processes here is a list of pids, and we wait() on 
each of them. Why does waiting on them multiple times help things?

It sounded from your description in the commit message is that the issue is 
where we enumarate which process to wait for (the impalad or its fork), but 
that's not what this code is doing.

I'm sure I'm confused; let me know where!



--
To view, visit http://gerrit.cloudera.org:8080/12501
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca
Gerrit-Change-Number: 12501
Gerrit-PatchSet: 1
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Fri, 15 Feb 2019 23:00:54 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8191: Wait for additional breakpad processes during test

2019-02-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12501 )

Change subject: IMPALA-8191: Wait for additional breakpad processes during test
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2142/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12501
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca
Gerrit-Change-Number: 12501
Gerrit-PatchSet: 1
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 15 Feb 2019 21:12:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8191: Wait for additional breakpad processes during test

2019-02-15 Thread Lars Volker (Code Review)
Lars Volker has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12501


Change subject: IMPALA-8191: Wait for additional breakpad processes during test
..

IMPALA-8191: Wait for additional breakpad processes during test

The Breakpad signal handler forks off a process to write a minidump.
During the breakpad tests we send signals to the Impala daemons and then
wait for all processes to go away. Prior to this change we did this by
waiting on the PID returned by process.get_pid(). It is determined by
iterating over psutil.get_pid_list() which is an ordered list of PIDs
running on the system. We return the first process in the list with a
matching command line. In cases where the PID space rolled over, this
could have been the forked off breakpad process and we'd wait on that
one. During the subsequent check that all processes are indeed gone, we
could then pick up the original Impala daemon that had forked off to
write the minidump and was still in the process of shutting down.

To fix this, we wait for every process twice. Processes are identified
by their command and iterating through them twice makes sure we catch
both the original daemon and it's breakpad child.

This change also contains improvements to the logging of processes in
our tests. This should make it easier to identify similar issues in the
future.

Testing: I ran the breakpad tests in exhaustive mode. I didn't try to
exercise it around a PID roll-over, but we shouldn't see the issue in
IMPALA-8191 again.

Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca
---
M tests/common/impala_cluster.py
M tests/custom_cluster/test_breakpad.py
2 files changed, 27 insertions(+), 14 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/12501/1
--
To view, visit http://gerrit.cloudera.org:8080/12501
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca
Gerrit-Change-Number: 12501
Gerrit-PatchSet: 1
Gerrit-Owner: Lars Volker