[kudu-CR] KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug

2018-01-12 Thread Jeffrey F. Lukman (Code Review)
Hello Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9015

to look at the new patch set (#8).

Change subject: KUDU-2208 Add unit test to detect Subprocess Interruption 
Handling failure and patch to fix the bug
..

KUDU-2208 Add unit test to detect Subprocess Interruption
Handling failure and patch to fix the bug

This patch submits a unit test that create a Subprocess thread
and while it is starting and waiting, another thread sends kill signals
to the Subprocess thread. Since the pthread_kill() signal is sent
asynchronously, sometimes the pthread_kill() raises error signal 3.
In that condition, the unit test logs the incident as an INFO.

Prior to this bug fix patch, the Subprocess throws an exception
because it cannot handle the kill signal in Wait() state. To fix the bug,
we add RETRY_ON_EINTR() inside Subprocess::DoWait() function.

Furthermore, this patch also moves the RETRY_ON_EINTR() function
that occurs in many places across the code to os-util.h.

Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333
---
M src/kudu/consensus/log_index.cc
M src/kudu/util/env_posix.cc
M src/kudu/util/net/socket.cc
M src/kudu/util/os-util.h
M src/kudu/util/subprocess-test.cc
M src/kudu/util/subprocess.cc
6 files changed, 69 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/9015/8
--
To view, visit http://gerrit.cloudera.org:8080/9015
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333
Gerrit-Change-Number: 9015
Gerrit-PatchSet: 8
Gerrit-Owner: Jeffrey F. Lukman 
Gerrit-Reviewer: Jeffrey F. Lukman 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug

2018-01-12 Thread Jeffrey F. Lukman (Code Review)
Hello Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9015

to look at the new patch set (#7).

Change subject: KUDU-2208 Add unit test to detect Subprocess Interruption 
Handling failure and patch to fix the bug
..

KUDU-2208 Add unit test to detect Subprocess Interruption
Handling failure and patch to fix the bug

This patch submits a unit test that create a Subprocess thread
and while it is starting and waiting, another thread sends kill signals
to the Subprocess thread. Since the pthread_kill() signal is sent
asynchronously, sometimes the pthread_kill() raises error signal 3.
In that condition, the unit test logs the incident as an INFO.

Prior to this bug fix patch, the Subprocess throws an exception
because it cannot handle the kill signal in Wait() state. To fix the bug,
we add RETRY_ON_EINTR() inside Subprocess::DoWait() function.

Furthermore, this patch also moves the RETRY_ON_EINTR() function
that occurs in many places across the code to os-util.h.

Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333
---
M src/kudu/consensus/log_index.cc
M src/kudu/util/env_posix.cc
M src/kudu/util/net/socket.cc
M src/kudu/util/os-util.h
M src/kudu/util/subprocess-test.cc
M src/kudu/util/subprocess.cc
6 files changed, 65 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/9015/7
--
To view, visit http://gerrit.cloudera.org:8080/9015
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333
Gerrit-Change-Number: 9015
Gerrit-PatchSet: 7
Gerrit-Owner: Jeffrey F. Lukman 
Gerrit-Reviewer: Jeffrey F. Lukman 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug

2018-01-12 Thread Jeffrey F. Lukman (Code Review)
Jeffrey F. Lukman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9015 )

Change subject: KUDU-2208 Add unit test to detect Subprocess Interruption 
Handling failure and patch to fix the bug
..


Patch Set 6:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/9015/5/src/kudu/util/os-util.h
File src/kudu/util/os-util.h:

http://gerrit.cloudera.org:8080/#/c/9015/5/src/kudu/util/os-util.h@33
PS5, Line 33: #define RETRY_ON_EINTR(err, expr) do { \
> need to #include  in this header so that we get access to std:
Done


http://gerrit.cloudera.org:8080/#/c/9015/5/src/kudu/util/subprocess-test.cc
File src/kudu/util/subprocess-test.cc:

http://gerrit.cloudera.org:8080/#/c/9015/5/src/kudu/util/subprocess-test.cc@28
PS5, Line 28: #include 
> per Alexey's comment please separate out the C includes from the C++ ones h
Done


http://gerrit.cloudera.org:8080/#/c/9015/5/src/kudu/util/subprocess-test.cc@312
PS5, Line 312: t_started = true;
> still think we should add a short sleep here so that we ensure that the sig
Done


http://gerrit.cloudera.org:8080/#/c/9015/5/src/kudu/util/subprocess-test.cc@313
PS5, Line 313: SleepFor(MonoDelta::FromMilliseconds(50));
> nit: we have to use CHECK_OK here because gtest isn't thread-safe to run as
Done


http://gerrit.cloudera.org:8080/#/c/9015/5/src/kudu/util/subprocess-test.cc@332
PS5, Line 332:   ASSERT_TRUE(pthread_kill(t, SIGUSR2) == 0);
> but the "user defined signal 2 error" is the bug we're trying to provoke he
Done



--
To view, visit http://gerrit.cloudera.org:8080/9015
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333
Gerrit-Change-Number: 9015
Gerrit-PatchSet: 6
Gerrit-Owner: Jeffrey F. Lukman 
Gerrit-Reviewer: Jeffrey F. Lukman 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 12 Jan 2018 23:51:13 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug

2018-01-12 Thread Jeffrey F. Lukman (Code Review)
Hello Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9015

to look at the new patch set (#6).

Change subject: KUDU-2208 Add unit test to detect Subprocess Interruption 
Handling failure and patch to fix the bug
..

KUDU-2208 Add unit test to detect Subprocess Interruption
Handling failure and patch to fix the bug

This patch submits a unit test that create a Subprocess thread
and while it is starting and waiting, another thread sends kill signals
to the Subprocess thread. Prior to the bug fix in this patch,
the Subprocess throws an exception because it cannot handle
the kill signal in Wait() state. The bug fix adds RETRY_ON_EINTR()
at Subprocess::DoWait() function.

Furthermore, this patch also moves the RETRY_ON_EINTR() function
that occurs in many places across the code to os-util.h.

Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333
---
M src/kudu/consensus/log_index.cc
M src/kudu/util/env_posix.cc
M src/kudu/util/net/socket.cc
M src/kudu/util/os-util.h
M src/kudu/util/subprocess-test.cc
M src/kudu/util/subprocess.cc
6 files changed, 60 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/9015/6
--
To view, visit http://gerrit.cloudera.org:8080/9015
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333
Gerrit-Change-Number: 9015
Gerrit-PatchSet: 6
Gerrit-Owner: Jeffrey F. Lukman 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug

2018-01-12 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9015 )

Change subject: KUDU-2208 Add unit test to detect Subprocess Interruption 
Handling failure and patch to fix the bug
..


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/9015/5/src/kudu/util/os-util.h
File src/kudu/util/os-util.h:

http://gerrit.cloudera.org:8080/#/c/9015/5/src/kudu/util/os-util.h@33
PS5, Line 33:   static_assert(std::is_signed::value, \
need to #include  in this header so that we get access to 
std::is_signed. Also whatever header has EINTR in it (not sure which one)


http://gerrit.cloudera.org:8080/#/c/9015/5/src/kudu/util/subprocess-test.cc
File src/kudu/util/subprocess-test.cc:

http://gerrit.cloudera.org:8080/#/c/9015/5/src/kudu/util/subprocess-test.cc@28
PS5, Line 28: #include 
per Alexey's comment please separate out the C includes from the C++ ones 
https://google.github.io/styleguide/cppguide.html#Names_and_Order_of_Includes


http://gerrit.cloudera.org:8080/#/c/9015/5/src/kudu/util/subprocess-test.cc@312
PS5, Line 312: t_started = true;
still think we should add a short sleep here so that we ensure that the 
signal-sending starts before we call p.Start


http://gerrit.cloudera.org:8080/#/c/9015/5/src/kudu/util/subprocess-test.cc@313
PS5, Line 313: ASSERT_OK(p.Start());
nit: we have to use CHECK_OK here because gtest isn't thread-safe to run 
assertions from non-main threads


http://gerrit.cloudera.org:8080/#/c/9015/5/src/kudu/util/subprocess-test.cc@332
PS5, Line 332:   // Add delay to avoid user defined signal 2 error
but the "user defined signal 2 error" is the bug we're trying to provoke here. 
So if we are still seeing a crash due to it, then that means we haven't fully 
fixed the bug, right?



--
To view, visit http://gerrit.cloudera.org:8080/9015
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333
Gerrit-Change-Number: 9015
Gerrit-PatchSet: 5
Gerrit-Owner: Jeffrey F. Lukman 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 12 Jan 2018 20:56:24 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug

2018-01-12 Thread Jeffrey F. Lukman (Code Review)
Hello Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9015

to look at the new patch set (#5).

Change subject: KUDU-2208 Add unit test to detect Subprocess Interruption 
Handling failure and patch to fix the bug
..

KUDU-2208 Add unit test to detect Subprocess Interruption
Handling failure and patch to fix the bug

This patch submits a unit test that create a Subprocess thread
and while it is starting and waiting, another thread sends kill signals
to the Subprocess thread. Prior to the bug fix in this patch,
the Subprocess throws an exception because it cannot handle
the kill signal in Wait() state. The bug fix adds RETRY_ON_EINTR()
at Subprocess::DoWait() function.

Furthermore, this patch also moves the RETRY_ON_EINTR() function
that occurs in many places across the code to os-util.h.

Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333
---
M src/kudu/consensus/log_index.cc
M src/kudu/util/env_posix.cc
M src/kudu/util/net/socket.cc
M src/kudu/util/os-util.h
M src/kudu/util/subprocess-test.cc
M src/kudu/util/subprocess.cc
6 files changed, 60 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/9015/5
--
To view, visit http://gerrit.cloudera.org:8080/9015
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333
Gerrit-Change-Number: 9015
Gerrit-PatchSet: 5
Gerrit-Owner: Jeffrey F. Lukman 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug

2018-01-12 Thread Jeffrey F. Lukman (Code Review)
Hello Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9015

to look at the new patch set (#3).

Change subject: KUDU-2208 Add unit test to detect Subprocess Interruption 
Handling failure and patch to fix the bug
..

KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and 
patch to fix the bug

This patch submits a unit test that create a Subprocess thread
and while it is starting and waiting, another thread sends kill signals
to the Subprocess thread. Prior to the bug fix in this patch,
the Subprocess throws an exception because it cannot handle
the kill signal in Wait() state. The bug fix adds RETRY_ON_EINTR()
at Subprocess::DoWait() function.

Furthermore, this patch also moves the RETRY_ON_EINTR() function
that occurs in many places across the code to os-util.h.

Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333
---
M src/kudu/consensus/log_index.cc
M src/kudu/util/env_posix.cc
M src/kudu/util/net/socket.cc
M src/kudu/util/os-util.h
M src/kudu/util/subprocess-test.cc
M src/kudu/util/subprocess.cc
6 files changed, 57 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/9015/3
--
To view, visit http://gerrit.cloudera.org:8080/9015
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333
Gerrit-Change-Number: 9015
Gerrit-PatchSet: 3
Gerrit-Owner: Jeffrey F. Lukman 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug

2018-01-11 Thread Jeffrey F. Lukman (Code Review)
Hello Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9015

to look at the new patch set (#2).

Change subject: KUDU-2208 Add unit test to detect Subprocess Interruption 
Handling failure and patch to fix the bug
..

KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and 
patch to fix the bug

This patch submits a unit test that create a Subprocess thread
and while it is starting and waiting, another thread sends kill signals
to the Subprocess thread. Prior to the bug fix in this patch,
the Subprocess throws an exception because it cannot handle
the kill signal in Wait() state. The bug fix adds RETRY_ON_EINTR()
at Subprocess::DoWait() function.

Furthermore, this patch also moves the RETRY_ON_EINTR() function
that occurs in many places across the code to os-util.h.

Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333
---
M src/kudu/consensus/log_index.cc
M src/kudu/util/env_posix.cc
M src/kudu/util/net/socket.cc
M src/kudu/util/os-util.h
M src/kudu/util/subprocess-test.cc
M src/kudu/util/subprocess.cc
6 files changed, 57 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/9015/2
--
To view, visit http://gerrit.cloudera.org:8080/9015
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333
Gerrit-Change-Number: 9015
Gerrit-PatchSet: 2
Gerrit-Owner: Jeffrey F. Lukman 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon