[kudu-CR] KUDU-2126: Add conditional check to prevent unnecessary fsyncs
Jeffrey F. Lukman has posted comments on this change. ( http://gerrit.cloudera.org:8080/9069 ) Change subject: KUDU-2126: Add conditional check to prevent unnecessary fsyncs .. Patch Set 3: > Quick question: did you try running this test without the fix? Does > the test fail? Yes, I did. Without the fix, the test failed. -- To view, visit http://gerrit.cloudera.org:8080/9069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idde8b413374f43ca2ef09339f0f412208f03685e Gerrit-Change-Number: 9069 Gerrit-PatchSet: 3 Gerrit-Owner: Jeffrey F. LukmanGerrit-Reviewer: Jeffrey F. Lukman Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Sat, 20 Jan 2018 03:50:38 + Gerrit-HasComments: No
[kudu-CR] KUDU-2126: Add conditional check to prevent unnecessary fsyncs
Jeffrey F. Lukman has posted comments on this change. ( http://gerrit.cloudera.org:8080/9069 ) Change subject: KUDU-2126: Add conditional check to prevent unnecessary fsyncs .. Patch Set 3: (9 comments) http://gerrit.cloudera.org:8080/#/c/9069/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9069/2//COMMIT_MSG@9 PS2, Line 9: > nit: a unit test Done http://gerrit.cloudera.org:8080/#/c/9069/2//COMMIT_MSG@11 PS2, Line 11: add > nit: adds Done http://gerrit.cloudera.org:8080/#/c/9069/2//COMMIT_MSG@12 PS2, Line 12: additio > nit: additional Done http://gerrit.cloudera.org:8080/#/c/9069/2/src/kudu/integration-tests/delete_tablet-itest.cc File src/kudu/integration-tests/delete_tablet-itest.cc: http://gerrit.cloudera.org:8080/#/c/9069/2/src/kudu/integration-tests/delete_tablet-itest.cc@169 PS2, Line 169: Regression test for KUDU-2126 : Ensure that a DeleteTablet() RPC > how about: Done http://gerrit.cloudera.org:8080/#/c/9069/2/src/kudu/integration-tests/delete_tablet-itest.cc@191 PS2, Line 191: scoped_refptr tablet_replica; > Is this test flaky if you remove this ASSERT_EVENTUALLY? I don't think this Done http://gerrit.cloudera.org:8080/#/c/9069/2/src/kudu/tablet/tablet_metadata.h File src/kudu/tablet/tablet_metadata.h: http://gerrit.cloudera.org:8080/#/c/9069/2/src/kudu/tablet/tablet_metadata.h@379 PS2, Line 379: int flush_count_for_tests_; > This seems fine, but I wonder if you considered creating a metric for # of I did, but this method is just easier to implement and yet it achieves the goal that the unit test wants to do. http://gerrit.cloudera.org:8080/#/c/9069/2/src/kudu/tablet/tablet_metadata.cc File src/kudu/tablet/tablet_metadata.cc: http://gerrit.cloudera.org:8080/#/c/9069/2/src/kudu/tablet/tablet_metadata.cc@554 PS2, Line 554: l_flush.Unlock(); > I think this line should go inside TabletMetadata::ReplaceSuperBlockUnlocke Done http://gerrit.cloudera.org:8080/#/c/9069/2/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/9069/2/src/kudu/tserver/ts_tablet_manager.cc@797 PS2, Line 797: If a tablet i > No need to mention KUDU-2126 here. It should be mentioned in the regression Done http://gerrit.cloudera.org:8080/#/c/9069/2/src/kudu/tserver/ts_tablet_manager.cc@798 PS2, Line 798: > style nit: end comments with punctuation (i.e. a period). We follow the Goo Done -- To view, visit http://gerrit.cloudera.org:8080/9069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idde8b413374f43ca2ef09339f0f412208f03685e Gerrit-Change-Number: 9069 Gerrit-PatchSet: 3 Gerrit-Owner: Jeffrey F. LukmanGerrit-Reviewer: Jeffrey F. Lukman Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Sat, 20 Jan 2018 02:18:07 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2126: Add conditional check to prevent unnecessary fsyncs
Hello Mike Percy, Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9069 to look at the new patch set (#3). Change subject: KUDU-2126: Add conditional check to prevent unnecessary fsyncs .. KUDU-2126: Add conditional check to prevent unnecessary fsyncs This patch includes a unit test to detect fsyncs that happened when a DeleteTablet RPC tries to tombstone an already tombstoned tablet. In order to fix the issue, this patch adds additional checking before executing another DeleteTabletData that leads to additional fsyncs. Change-Id: Idde8b413374f43ca2ef09339f0f412208f03685e --- M src/kudu/integration-tests/delete_tablet-itest.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tserver/ts_tablet_manager.cc 4 files changed, 69 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/9069/3 -- To view, visit http://gerrit.cloudera.org:8080/9069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idde8b413374f43ca2ef09339f0f412208f03685e Gerrit-Change-Number: 9069 Gerrit-PatchSet: 3 Gerrit-Owner: Jeffrey F. LukmanGerrit-Reviewer: Jeffrey F. Lukman Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2126: Add conditional check to prevent unnecessary fsyncs
Hello Mike Percy, Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9069 to look at the new patch set (#2). Change subject: KUDU-2126: Add conditional check to prevent unnecessary fsyncs .. KUDU-2126: Add conditional check to prevent unnecessary fsyncs This patch includes unit test to detect fsyncs that happened when a DeleteTablet RPC tries to tombstone an already tombstoned tablet. In order to fix the issue, this patch add additional checking before executing another DeleteTabletData that leads to another fsyncs. Change-Id: Idde8b413374f43ca2ef09339f0f412208f03685e --- M src/kudu/integration-tests/delete_tablet-itest.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tserver/ts_tablet_manager.cc 4 files changed, 70 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/9069/2 -- To view, visit http://gerrit.cloudera.org:8080/9069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idde8b413374f43ca2ef09339f0f412208f03685e Gerrit-Change-Number: 9069 Gerrit-PatchSet: 2 Gerrit-Owner: Jeffrey F. LukmanGerrit-Reviewer: Jeffrey F. Lukman Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Kudu-2126: Add conditional check to prevent unnecessary fsyncs
Jeffrey F. Lukman has posted comments on this change. ( http://gerrit.cloudera.org:8080/9069 ) Change subject: Kudu-2126: Add conditional check to prevent unnecessary fsyncs .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/9069/1/src/kudu/tablet/metadata.proto File src/kudu/tablet/metadata.proto: http://gerrit.cloudera.org:8080/#/c/9069/1/src/kudu/tablet/metadata.proto@66 PS1, Line 66: // A tablet state is set to TABLET_DATA_DELETED when a tablet is permanently deleted : // from all tablet servers and this state is a terminal state. : // Therefore, the tablet should not be allowed to transition back to any other state. : > I think the bit about it being a "roll-forward" state is useful. Particular I will remove this change for now. If this change is still needed, Mike will submit a separate change later. -- To view, visit http://gerrit.cloudera.org:8080/9069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idde8b413374f43ca2ef09339f0f412208f03685e Gerrit-Change-Number: 9069 Gerrit-PatchSet: 1 Gerrit-Owner: Jeffrey F. LukmanGerrit-Reviewer: Jeffrey F. Lukman Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 19 Jan 2018 01:30:21 + Gerrit-HasComments: Yes
[kudu-CR] Kudu-2126: Add conditional check to prevent unnecessary fsyncs
Jeffrey F. Lukman has posted comments on this change. ( http://gerrit.cloudera.org:8080/9069 ) Change subject: Kudu-2126: Add conditional check to prevent unnecessary fsyncs .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/9069/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9069/1//COMMIT_MSG@7 PS1, Line 7: Kudu-2126: Add conditional check to prevent unnecessary fsyncs > nit: we always capitalize jira names like KUDU-2126 Done http://gerrit.cloudera.org:8080/#/c/9069/1/src/kudu/integration-tests/delete_tablet-itest.cc File src/kudu/integration-tests/delete_tablet-itest.cc: http://gerrit.cloudera.org:8080/#/c/9069/1/src/kudu/integration-tests/delete_tablet-itest.cc@168 PS1, Line 168: // Kudu-2126 : Ensure that DeleteTablet RPCs does not do double fsyncs. > nit: same thing about capitalization Done http://gerrit.cloudera.org:8080/#/c/9069/1/src/kudu/integration-tests/delete_tablet-itest.cc@183 PS1, Line 183: table > tablets Done http://gerrit.cloudera.org:8080/#/c/9069/1/src/kudu/tablet/tablet_metadata.cc File src/kudu/tablet/tablet_metadata.cc: http://gerrit.cloudera.org:8080/#/c/9069/1/src/kudu/tablet/tablet_metadata.cc@528 PS1, Line 528: flush_count_for_tests_++; > seems we should only increment this down below where we actually do the flu Done http://gerrit.cloudera.org:8080/#/c/9069/1/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: http://gerrit.cloudera.org:8080/#/c/9069/1/src/kudu/tserver/ts_tablet_manager.cc@797 PS1, Line 797: // Kudu-2126 : if tablet is already tombstoned, then a request to tombstone > nit: same formatting Done -- To view, visit http://gerrit.cloudera.org:8080/9069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idde8b413374f43ca2ef09339f0f412208f03685e Gerrit-Change-Number: 9069 Gerrit-PatchSet: 1 Gerrit-Owner: Jeffrey F. LukmanGerrit-Reviewer: Jeffrey F. Lukman Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 19 Jan 2018 01:18:25 + Gerrit-HasComments: Yes
[kudu-CR] Kudu-2126: Add conditional check to prevent unnecessary fsyncs
Jeffrey F. Lukman has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9069 Change subject: Kudu-2126: Add conditional check to prevent unnecessary fsyncs .. Kudu-2126: Add conditional check to prevent unnecessary fsyncs This patch includes unit test to detect fsyncs that happened when a DeleteTablet RPC tries to tombstone an already tombstoned tablet. In order to fix the issue, this patch add additional checking before executing another DeleteTabletData that leads to another fsyncs. Furthermore, this patch also refine the TABLET_DATA_DELETED comment. Change-Id: Idde8b413374f43ca2ef09339f0f412208f03685e --- M src/kudu/integration-tests/delete_tablet-itest.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tserver/ts_tablet_manager.cc 5 files changed, 75 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/9069/1 -- To view, visit http://gerrit.cloudera.org:8080/9069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Idde8b413374f43ca2ef09339f0f412208f03685e Gerrit-Change-Number: 9069 Gerrit-PatchSet: 1 Gerrit-Owner: Jeffrey F. Lukman
[kudu-CR] KUDU-2208 Add RETRY ON EINTR() to Subprocess
Jeffrey F. Lukman has posted comments on this change. ( http://gerrit.cloudera.org:8080/9015 ) Change subject: KUDU-2208 Add RETRY_ON_EINTR() to Subprocess .. Patch Set 12: (4 comments) http://gerrit.cloudera.org:8080/#/c/9015/10/src/kudu/util/os-util.h File src/kudu/util/os-util.h: http://gerrit.cloudera.org:8080/#/c/9015/10/src/kudu/util/os-util.h@27 PS10, Line 27: > should probably #include here since EINTR is defined there Done http://gerrit.cloudera.org:8080/#/c/9015/10/src/kudu/util/subprocess-test.cc File src/kudu/util/subprocess-test.cc: http://gerrit.cloudera.org:8080/#/c/9015/10/src/kudu/util/subprocess-test.cc@338 PS10, Line 338: ASSERT_TRUE(err == 0 || err == ESRCH); > nit: I think this would read better as ASSERT_TRUE(err == 0 || err == ESRCH Done http://gerrit.cloudera.org:8080/#/c/9015/10/src/kudu/util/subprocess-test.cc@340 PS10, Line 340: LOG(INFO) << "Async kill signal failed with err=" << err << > can you add an ASSERT_TRUE(t_finished) here? we should only get ESRCH in th Done http://gerrit.cloudera.org:8080/#/c/9015/10/src/kudu/util/subprocess-test.cc@344 PS10, Line 344: // Add microseconds delay to make the unit test runs faster and more reliable > can you change this to rand() % 1? It still runs quickly on my machine with Done -- To view, visit http://gerrit.cloudera.org:8080/9015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333 Gerrit-Change-Number: 9015 Gerrit-PatchSet: 12 Gerrit-Owner: Jeffrey F. LukmanGerrit-Reviewer: Jeffrey F. Lukman Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 17 Jan 2018 06:44:32 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2208 Add RETRY ON EINTR() to Subprocess
Hello Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9015 to look at the new patch set (#12). Change subject: KUDU-2208 Add RETRY_ON_EINTR() to Subprocess .. KUDU-2208 Add RETRY_ON_EINTR() to Subprocess This patch submits a unit test that create a Subprocess thread and while it is starting and waiting, another thread sends kill signals to the Subprocess thread. Since the pthread_kill() signal is sent asynchronously, sometimes the pthread_kill() raises error signal 3. In that condition, the unit test logs the incident as an INFO. Prior to this bug fix patch, the Subprocess throws an exception because it cannot handle the kill signal in Wait() state. To fix the bug, we add RETRY_ON_EINTR() inside Subprocess::DoWait() function. Furthermore, this patch also moves the RETRY_ON_EINTR() function that occurs in many places across the code to os-util.h and it adds alarm reset in the end of TestReadFromStdoutAndStderr in Subprocess-test. Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333 --- M src/kudu/consensus/log_index.cc M src/kudu/util/env_posix.cc M src/kudu/util/net/socket.cc M src/kudu/util/os-util.h M src/kudu/util/subprocess-test.cc M src/kudu/util/subprocess.cc 6 files changed, 77 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/9015/12 -- To view, visit http://gerrit.cloudera.org:8080/9015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333 Gerrit-Change-Number: 9015 Gerrit-PatchSet: 12 Gerrit-Owner: Jeffrey F. LukmanGerrit-Reviewer: Jeffrey F. Lukman Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2208 Add RETRY ON EINTR() to Subprocess
Jeffrey F. Lukman has abandoned this change. ( http://gerrit.cloudera.org:8080/9042 ) Change subject: KUDU-2208 Add RETRY_ON_EINTR() to Subprocess .. Abandoned Wrong commit -- To view, visit http://gerrit.cloudera.org:8080/9042 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: Iffabada54eb2359730e9e2973b263d71645fa572 Gerrit-Change-Number: 9042 Gerrit-PatchSet: 1 Gerrit-Owner: Jeffrey F. LukmanGerrit-Reviewer: Kudu Jenkins
[kudu-CR] KUDU-2208 Add RETRY ON EINTR() to Subprocess
Jeffrey F. Lukman has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9042 Change subject: KUDU-2208 Add RETRY_ON_EINTR() to Subprocess .. KUDU-2208 Add RETRY_ON_EINTR() to Subprocess This patch submits a unit test that create a Subprocess thread and while it is starting and waiting, another thread sends kill signals to the Subprocess thread. Since the pthread_kill() signal is sent asynchronously, sometimes the pthread_kill() raises error signal 3. In that condition, the unit test logs the incident as an INFO. Prior to this bug fix patch, the Subprocess throws an exception because it cannot handle the kill signal in Wait() state. To fix the bug, we add RETRY_ON_EINTR() inside Subprocess::DoWait() function. Furthermore, this patch also moves the RETRY_ON_EINTR() function that occurs in many places across the code to os-util.h and it adds alarm reset in the end of TestReadFromStdoutAndStderr in Subprocess-test Change-Id: Iffabada54eb2359730e9e2973b263d71645fa572 --- M src/kudu/util/os-util.h M src/kudu/util/subprocess-test.cc 2 files changed, 5 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/42/9042/1 -- To view, visit http://gerrit.cloudera.org:8080/9042 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iffabada54eb2359730e9e2973b263d71645fa572 Gerrit-Change-Number: 9042 Gerrit-PatchSet: 1 Gerrit-Owner: Jeffrey F. Lukman
[kudu-CR] KUDU-2208 Add RETRY ON EINTR() to Subprocess
Hello Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9015 to look at the new patch set (#10). Change subject: KUDU-2208 Add RETRY_ON_EINTR() to Subprocess .. KUDU-2208 Add RETRY_ON_EINTR() to Subprocess This patch submits a unit test that create a Subprocess thread and while it is starting and waiting, another thread sends kill signals to the Subprocess thread. Since the pthread_kill() signal is sent asynchronously, sometimes the pthread_kill() raises error signal 3. In that condition, the unit test logs the incident as an INFO. Prior to this bug fix patch, the Subprocess throws an exception because it cannot handle the kill signal in Wait() state. To fix the bug, we add RETRY_ON_EINTR() inside Subprocess::DoWait() function. Furthermore, this patch also moves the RETRY_ON_EINTR() function that occurs in many places across the code to os-util.h and it adds alarm reset in the end of TestReadFromStdoutAndStderr in Subprocess-test. Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333 --- M src/kudu/consensus/log_index.cc M src/kudu/util/env_posix.cc M src/kudu/util/net/socket.cc M src/kudu/util/os-util.h M src/kudu/util/subprocess-test.cc M src/kudu/util/subprocess.cc 6 files changed, 74 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/9015/10 -- To view, visit http://gerrit.cloudera.org:8080/9015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333 Gerrit-Change-Number: 9015 Gerrit-PatchSet: 10 Gerrit-Owner: Jeffrey F. LukmanGerrit-Reviewer: Jeffrey F. Lukman Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2208 Add RETRY ON EINTR() to Subprocess
Hello Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9015 to look at the new patch set (#9). Change subject: KUDU-2208 Add RETRY_ON_EINTR() to Subprocess .. KUDU-2208 Add RETRY_ON_EINTR() to Subprocess This patch submits a unit test that create a Subprocess thread and while it is starting and waiting, another thread sends kill signals to the Subprocess thread. Since the pthread_kill() signal is sent asynchronously, sometimes the pthread_kill() raises error signal 3. In that condition, the unit test logs the incident as an INFO. Prior to this bug fix patch, the Subprocess throws an exception because it cannot handle the kill signal in Wait() state. To fix the bug, we add RETRY_ON_EINTR() inside Subprocess::DoWait() function. Furthermore, this patch also moves the RETRY_ON_EINTR() function that occurs in many places across the code to os-util.h. Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333 --- M src/kudu/consensus/log_index.cc M src/kudu/util/env_posix.cc M src/kudu/util/net/socket.cc M src/kudu/util/os-util.h M src/kudu/util/subprocess-test.cc M src/kudu/util/subprocess.cc 6 files changed, 73 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/9015/9 -- To view, visit http://gerrit.cloudera.org:8080/9015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333 Gerrit-Change-Number: 9015 Gerrit-PatchSet: 9 Gerrit-Owner: Jeffrey F. LukmanGerrit-Reviewer: Jeffrey F. Lukman Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug
Hello Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9015 to look at the new patch set (#8). Change subject: KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug .. KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug This patch submits a unit test that create a Subprocess thread and while it is starting and waiting, another thread sends kill signals to the Subprocess thread. Since the pthread_kill() signal is sent asynchronously, sometimes the pthread_kill() raises error signal 3. In that condition, the unit test logs the incident as an INFO. Prior to this bug fix patch, the Subprocess throws an exception because it cannot handle the kill signal in Wait() state. To fix the bug, we add RETRY_ON_EINTR() inside Subprocess::DoWait() function. Furthermore, this patch also moves the RETRY_ON_EINTR() function that occurs in many places across the code to os-util.h. Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333 --- M src/kudu/consensus/log_index.cc M src/kudu/util/env_posix.cc M src/kudu/util/net/socket.cc M src/kudu/util/os-util.h M src/kudu/util/subprocess-test.cc M src/kudu/util/subprocess.cc 6 files changed, 69 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/9015/8 -- To view, visit http://gerrit.cloudera.org:8080/9015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333 Gerrit-Change-Number: 9015 Gerrit-PatchSet: 8 Gerrit-Owner: Jeffrey F. LukmanGerrit-Reviewer: Jeffrey F. Lukman Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug
Hello Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9015 to look at the new patch set (#7). Change subject: KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug .. KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug This patch submits a unit test that create a Subprocess thread and while it is starting and waiting, another thread sends kill signals to the Subprocess thread. Since the pthread_kill() signal is sent asynchronously, sometimes the pthread_kill() raises error signal 3. In that condition, the unit test logs the incident as an INFO. Prior to this bug fix patch, the Subprocess throws an exception because it cannot handle the kill signal in Wait() state. To fix the bug, we add RETRY_ON_EINTR() inside Subprocess::DoWait() function. Furthermore, this patch also moves the RETRY_ON_EINTR() function that occurs in many places across the code to os-util.h. Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333 --- M src/kudu/consensus/log_index.cc M src/kudu/util/env_posix.cc M src/kudu/util/net/socket.cc M src/kudu/util/os-util.h M src/kudu/util/subprocess-test.cc M src/kudu/util/subprocess.cc 6 files changed, 65 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/9015/7 -- To view, visit http://gerrit.cloudera.org:8080/9015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333 Gerrit-Change-Number: 9015 Gerrit-PatchSet: 7 Gerrit-Owner: Jeffrey F. LukmanGerrit-Reviewer: Jeffrey F. Lukman Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug
Jeffrey F. Lukman has posted comments on this change. ( http://gerrit.cloudera.org:8080/9015 ) Change subject: KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug .. Patch Set 6: (5 comments) http://gerrit.cloudera.org:8080/#/c/9015/5/src/kudu/util/os-util.h File src/kudu/util/os-util.h: http://gerrit.cloudera.org:8080/#/c/9015/5/src/kudu/util/os-util.h@33 PS5, Line 33: #define RETRY_ON_EINTR(err, expr) do { \ > need to #include in this header so that we get access to std: Done http://gerrit.cloudera.org:8080/#/c/9015/5/src/kudu/util/subprocess-test.cc File src/kudu/util/subprocess-test.cc: http://gerrit.cloudera.org:8080/#/c/9015/5/src/kudu/util/subprocess-test.cc@28 PS5, Line 28: #include > per Alexey's comment please separate out the C includes from the C++ ones h Done http://gerrit.cloudera.org:8080/#/c/9015/5/src/kudu/util/subprocess-test.cc@312 PS5, Line 312: t_started = true; > still think we should add a short sleep here so that we ensure that the sig Done http://gerrit.cloudera.org:8080/#/c/9015/5/src/kudu/util/subprocess-test.cc@313 PS5, Line 313: SleepFor(MonoDelta::FromMilliseconds(50)); > nit: we have to use CHECK_OK here because gtest isn't thread-safe to run as Done http://gerrit.cloudera.org:8080/#/c/9015/5/src/kudu/util/subprocess-test.cc@332 PS5, Line 332: ASSERT_TRUE(pthread_kill(t, SIGUSR2) == 0); > but the "user defined signal 2 error" is the bug we're trying to provoke he Done -- To view, visit http://gerrit.cloudera.org:8080/9015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333 Gerrit-Change-Number: 9015 Gerrit-PatchSet: 6 Gerrit-Owner: Jeffrey F. LukmanGerrit-Reviewer: Jeffrey F. Lukman Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 12 Jan 2018 23:51:13 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug
Hello Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9015 to look at the new patch set (#6). Change subject: KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug .. KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug This patch submits a unit test that create a Subprocess thread and while it is starting and waiting, another thread sends kill signals to the Subprocess thread. Prior to the bug fix in this patch, the Subprocess throws an exception because it cannot handle the kill signal in Wait() state. The bug fix adds RETRY_ON_EINTR() at Subprocess::DoWait() function. Furthermore, this patch also moves the RETRY_ON_EINTR() function that occurs in many places across the code to os-util.h. Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333 --- M src/kudu/consensus/log_index.cc M src/kudu/util/env_posix.cc M src/kudu/util/net/socket.cc M src/kudu/util/os-util.h M src/kudu/util/subprocess-test.cc M src/kudu/util/subprocess.cc 6 files changed, 60 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/9015/6 -- To view, visit http://gerrit.cloudera.org:8080/9015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333 Gerrit-Change-Number: 9015 Gerrit-PatchSet: 6 Gerrit-Owner: Jeffrey F. LukmanGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Kudu-2208 Avoid system call interruption in Subprocess
Jeffrey F. Lukman has posted comments on this change. ( http://gerrit.cloudera.org:8080/9011 ) Change subject: Kudu-2208 Avoid system call interruption in Subprocess .. Patch Set 1: (12 comments) http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc File src/kudu/util/subprocess-test.cc: http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@301 PS1, Line 301: thd > nit: either abbreviate to just 't' or 'thr' would be more in line with what Done http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@302 PS1, Line 302: 0.5 > nit: why not just '1'? Using non-integer numbers for /bin/sleep is conside Done http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@303 PS1, Line 303: subprocessThread > nit: we use snake_case style in C++ Done http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@305 PS1, Line 305: ASSERT_OK(p.Start()); > I think adding a short sleep here (eg 50ms) is a good idea to ensure that t Done http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@310 PS1, Line 310: // create signal > you aren't creating a signal here, but rather setting up a signal handler Done http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@317 PS1, Line 317: // send at most 10 kill signals to subprocess > why at most 10? Why not loop until we see that the subprocess has exited? I Done http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@319 PS1, Line 319: int signalCounter = 0; : bool hasError = false; : > nit: snake_case Done http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@322 PS1, Line 322: if (signalCounter > 10 || hasError) break; > why not just put this into the while condition above? Done http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@325 PS1, Line 325: hasError = pthread_kill(thd, SIGUSR2) == 0; > do we expect any errors from this? if not, could we just ASSERT this? Done http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess-test.cc@330 PS1, Line 330: SleepFor(MonoDelta::FromMilliseconds(10)); > why sleep in between? if our goal is to provoke races we want to send as ma I added sleep time here to avoid undefined signal 2 exception. http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess.cc File src/kudu/util/subprocess.cc: http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess.cc@67 PS1, Line 67: // Retry on EINTR for functions like read() that return -1 on error. > can we move this function to a utility header since it's used in a few plac Done http://gerrit.cloudera.org:8080/#/c/9011/1/src/kudu/util/subprocess.cc@69 PS1, Line 69: == true > nit: can we drop this extra? Done -- To view, visit http://gerrit.cloudera.org:8080/9011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I66ab2ec391eced5ea1c37d4294d151fe03c7d586 Gerrit-Change-Number: 9011 Gerrit-PatchSet: 1 Gerrit-Owner: Jeffrey F. LukmanGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 12 Jan 2018 19:30:59 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug
Hello Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9015 to look at the new patch set (#5). Change subject: KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug .. KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug This patch submits a unit test that create a Subprocess thread and while it is starting and waiting, another thread sends kill signals to the Subprocess thread. Prior to the bug fix in this patch, the Subprocess throws an exception because it cannot handle the kill signal in Wait() state. The bug fix adds RETRY_ON_EINTR() at Subprocess::DoWait() function. Furthermore, this patch also moves the RETRY_ON_EINTR() function that occurs in many places across the code to os-util.h. Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333 --- M src/kudu/consensus/log_index.cc M src/kudu/util/env_posix.cc M src/kudu/util/net/socket.cc M src/kudu/util/os-util.h M src/kudu/util/subprocess-test.cc M src/kudu/util/subprocess.cc 6 files changed, 60 insertions(+), 21 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/9015/5 -- To view, visit http://gerrit.cloudera.org:8080/9015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333 Gerrit-Change-Number: 9015 Gerrit-PatchSet: 5 Gerrit-Owner: Jeffrey F. LukmanGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug
Hello Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9015 to look at the new patch set (#3). Change subject: KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug .. KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug This patch submits a unit test that create a Subprocess thread and while it is starting and waiting, another thread sends kill signals to the Subprocess thread. Prior to the bug fix in this patch, the Subprocess throws an exception because it cannot handle the kill signal in Wait() state. The bug fix adds RETRY_ON_EINTR() at Subprocess::DoWait() function. Furthermore, this patch also moves the RETRY_ON_EINTR() function that occurs in many places across the code to os-util.h. Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333 --- M src/kudu/consensus/log_index.cc M src/kudu/util/env_posix.cc M src/kudu/util/net/socket.cc M src/kudu/util/os-util.h M src/kudu/util/subprocess-test.cc M src/kudu/util/subprocess.cc 6 files changed, 57 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/9015/3 -- To view, visit http://gerrit.cloudera.org:8080/9015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333 Gerrit-Change-Number: 9015 Gerrit-PatchSet: 3 Gerrit-Owner: Jeffrey F. LukmanGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug
Hello Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9015 to look at the new patch set (#2). Change subject: KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug .. KUDU-2208 Add unit test to detect Subprocess Interruption Handling failure and patch to fix the bug This patch submits a unit test that create a Subprocess thread and while it is starting and waiting, another thread sends kill signals to the Subprocess thread. Prior to the bug fix in this patch, the Subprocess throws an exception because it cannot handle the kill signal in Wait() state. The bug fix adds RETRY_ON_EINTR() at Subprocess::DoWait() function. Furthermore, this patch also moves the RETRY_ON_EINTR() function that occurs in many places across the code to os-util.h. Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333 --- M src/kudu/consensus/log_index.cc M src/kudu/util/env_posix.cc M src/kudu/util/net/socket.cc M src/kudu/util/os-util.h M src/kudu/util/subprocess-test.cc M src/kudu/util/subprocess.cc 6 files changed, 57 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/9015/2 -- To view, visit http://gerrit.cloudera.org:8080/9015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333 Gerrit-Change-Number: 9015 Gerrit-PatchSet: 2 Gerrit-Owner: Jeffrey F. LukmanGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2208 Refine unit test and move RETRY ON EINTR() to os-util.h
Jeffrey F. Lukman has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9015 Change subject: KUDU-2208 Refine unit test and move RETRY_ON_EINTR() to os-util.h .. KUDU-2208 Refine unit test and move RETRY_ON_EINTR() to os-util.h This patch refines some nits and also change unit test scenario, such that the send kill signals are sent continously without any delay until the kill signals succeeded thousands of times. In the buggy code, the subprocess_thread will throws Interrupted system call error. Furthermore, this bug also move the RETRY_ON_EINTR() function that occurs in many places across the system into the os-util.h. Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333 --- M src/kudu/consensus/log_index.cc M src/kudu/util/env_posix.cc M src/kudu/util/net/socket.cc M src/kudu/util/os-util.h M src/kudu/util/subprocess-test.cc M src/kudu/util/subprocess.cc 6 files changed, 25 insertions(+), 45 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/9015/1 -- To view, visit http://gerrit.cloudera.org:8080/9015 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I148b4619a8dda8e3e95dd6ea5c6e993a9e37a333 Gerrit-Change-Number: 9015 Gerrit-PatchSet: 1 Gerrit-Owner: Jeffrey F. Lukman
[kudu-CR] Kudu-2208 Avoid system call interruption in Subprocess
Jeffrey F. Lukman has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9011 Change subject: Kudu-2208 Avoid system call interruption in Subprocess .. Kudu-2208 Avoid system call interruption in Subprocess This patch includes additional unit test to detect this bug. To fix the issue, this patch added RETRY_ON_EINTR in Subprocess::DoWait() if the waitpid() return -1. Change-Id: I66ab2ec391eced5ea1c37d4294d151fe03c7d586 --- M src/kudu/util/subprocess-test.cc M src/kudu/util/subprocess.cc 2 files changed, 52 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/11/9011/1 -- To view, visit http://gerrit.cloudera.org:8080/9011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I66ab2ec391eced5ea1c37d4294d151fe03c7d586 Gerrit-Change-Number: 9011 Gerrit-PatchSet: 1 Gerrit-Owner: Jeffrey F. Lukman