[kudu-CR] catalog manager: various boring cleanup
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7990 to look at the new patch set (#2). Change subject: catalog_manager: various boring cleanup .. catalog_manager: various boring cleanup Change-Id: I028611361ae7d5ce2818707c203c045dbce294c6 --- M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h 2 files changed, 51 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/7990/2 -- To view, visit http://gerrit.cloudera.org:8080/7990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I028611361ae7d5ce2818707c203c045dbce294c6 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [iwyu tool.py] fix on the IWYU tool invocation
Alexey Serbin has posted comments on this change. Change subject: [iwyu_tool.py] fix on the IWYU tool invocation .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifa4284857583fff6543f65a9c2a71a9e445f39e2 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [iwyu tool.py] fix on the IWYU tool invocation
Alexey Serbin has posted comments on this change. Change subject: [iwyu_tool.py] fix on the IWYU tool invocation .. Patch Set 4: Verified+1 Unrelated flake in AlterTableTest.TestAddRangePartitionConflictExhaustive. I filed a new JIRA item for that: https://issues.apache.org/jira/browse/KUDU-2137 -- To view, visit http://gerrit.cloudera.org:8080/7989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifa4284857583fff6543f65a9c2a71a9e445f39e2 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [iwyu tool.py] fix on the IWYU tool invocation
Alexey Serbin has submitted this change and it was merged. Change subject: [iwyu_tool.py] fix on the IWYU tool invocation .. [iwyu_tool.py] fix on the IWYU tool invocation Fixed the way include-what-you-use binary is invoked by the iwyu_tool.py wrapper script. Prior to this fix, subprocess.Popen() had the 'shell' argument set to 'True', so any issue with starting the binary was silenced because there would be no OSError exception raised. Overall, if using 'shell=True' for subprocess.Popen(), it wouldn't be feasible to detect that sort of error via examining the exit code of the started process because include-what-you-use always returns exit code 1 regardless of its findings. I also updated the iwyu.sh script to be more robust in handling possible errors, if any. Prior to this fix, the 'iwyu' CMake target returned success if include-what-you-use binary was not in place. This patch fixes that problem and introduces more robust handling of other possible issues while working with the include-what-you-use tool. Change-Id: Ifa4284857583fff6543f65a9c2a71a9e445f39e2 Reviewed-on: http://gerrit.cloudera.org:8080/7989 Reviewed-by: Alexey SerbinTested-by: Alexey Serbin --- M build-support/iwyu/iwyu.sh M build-support/iwyu/iwyu_tool.py 2 files changed, 30 insertions(+), 16 deletions(-) Approvals: Alexey Serbin: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/7989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ifa4284857583fff6543f65a9c2a71a9e445f39e2 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [iwyu tool.py] fix on the IWYU tool invocation
Alexey Serbin has posted comments on this change. Change subject: [iwyu_tool.py] fix on the IWYU tool invocation .. Patch Set 4: Code-Review+2 -Verified Carrying over Adar's +2 from PS3. -- To view, visit http://gerrit.cloudera.org:8080/7989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifa4284857583fff6543f65a9c2a71a9e445f39e2 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-2124. Don't hold session lock while initializing a TabletCopySession
Dan Burkert has posted comments on this change. Change subject: KUDU-2124. Don't hold session lock while initializing a TabletCopySession .. Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/7985/1/src/kudu/tserver/tablet_copy_service.cc File src/kudu/tserver/tablet_copy_service.cc: Line 171: RPC_RETURN_NOT_OK(session->Init(), > It would be nice to remember if we were the thread to insert the session in Done Line 249: RPC_RETURN_NOT_OK(session->Init(), > Instead of waiting, we could simply reject the request if (!initted()) Done http://gerrit.cloudera.org:8080/#/c/7985/1/src/kudu/tserver/tablet_copy_source_session.cc File src/kudu/tserver/tablet_copy_source_session.cc: Line 166: shared_ptr reader = tablet_replica_->log()->reader(); > This is an existing bug, but we should take a ref to tablet_replica_->log() Hmm, I'm not seeing where the log_ field in TabletReplica is getting reset? The header docs for log() also don't mention this. Line 389: Status s; > no longer used Done http://gerrit.cloudera.org:8080/#/c/7985/1/src/kudu/tserver/tablet_copy_source_session.h File src/kudu/tserver/tablet_copy_source_session.h: PS1, Line 33: "kudu/gutil/integral_types.h" > Huh. I didn't know about this file. I've always used not sure, this is just a reorder. PS1, Line 192: scoped_refptr > nit: This can be made const (I know you haven't worked on this code before Done Line 197: KuduOnceDynamic init_once_; > Would be nice to add a comment noting that once the object is initialized, Added something to that effect. -- To view, visit http://gerrit.cloudera.org:8080/7985 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If8bd295a59ade8c89fdf1853dd64c4bceca8da91 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: Yes
[kudu-CR] KUDU-2124. Don't hold session lock while initializing a TabletCopySession
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7985 to look at the new patch set (#2). Change subject: KUDU-2124. Don't hold session lock while initializing a TabletCopySession .. KUDU-2124. Don't hold session lock while initializing a TabletCopySession This patch replaces the interior mutex in TabletCopySourceSession with a dynamic once, so that initialization can happen concurrently. This allows initialization to happen outside of the critical section in the tablet copy service. Implemented a regression test that injects latency into BeginTabletCopySession() calls. Change-Id: If8bd295a59ade8c89fdf1853dd64c4bceca8da91 --- M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tserver/tablet_copy_service-test.cc M src/kudu/tserver/tablet_copy_service.cc M src/kudu/tserver/tablet_copy_source_session.cc M src/kudu/tserver/tablet_copy_source_session.h M src/kudu/tserver/tablet_server-test-base.h M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/tablet_server_test_util.cc M src/kudu/tserver/tablet_server_test_util.h 11 files changed, 241 insertions(+), 63 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/7985/2 -- To view, visit http://gerrit.cloudera.org:8080/7985 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If8bd295a59ade8c89fdf1853dd64c4bceca8da91 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy
[kudu-CR] [iwyu tool.py] fix on the IWYU tool invocation
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7989 to look at the new patch set (#4). Change subject: [iwyu_tool.py] fix on the IWYU tool invocation .. [iwyu_tool.py] fix on the IWYU tool invocation Fixed the way include-what-you-use binary is invoked by the iwyu_tool.py wrapper script. Prior to this fix, subprocess.Popen() had the 'shell' argument set to 'True', so any issue with starting the binary was silenced because there would be no OSError exception raised. Overall, if using 'shell=True' for subprocess.Popen(), it wouldn't be feasible to detect that sort of error via examining the exit code of the started process because include-what-you-use always returns exit code 1 regardless of its findings. I also updated the iwyu.sh script to be more robust in handling possible errors, if any. Prior to this fix, the 'iwyu' CMake target returned success if include-what-you-use binary was not in place. This patch fixes that problem and introduces more robust handling of other possible issues while working with the include-what-you-use tool. Change-Id: Ifa4284857583fff6543f65a9c2a71a9e445f39e2 --- M build-support/iwyu/iwyu.sh M build-support/iwyu/iwyu_tool.py 2 files changed, 30 insertions(+), 16 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/89/7989/4 -- To view, visit http://gerrit.cloudera.org:8080/7989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifa4284857583fff6543f65a9c2a71a9e445f39e2 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [iwyu tool.py] fix on the IWYU tool invocation
Adar Dembo has posted comments on this change. Change subject: [iwyu_tool.py] fix on the IWYU tool invocation .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifa4284857583fff6543f65a9c2a71a9e445f39e2 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] catalog manager: various boring cleanup
Hello Dan Burkert, Todd Lipcon, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7990 to review the following change. Change subject: catalog_manager: various boring cleanup .. catalog_manager: various boring cleanup Change-Id: I028611361ae7d5ce2818707c203c045dbce294c6 --- M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h 2 files changed, 49 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/7990/1 -- To view, visit http://gerrit.cloudera.org:8080/7990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I028611361ae7d5ce2818707c203c045dbce294c6 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [iwyu tool.py] fix on the IWYU tool invocation
Alexey Serbin has posted comments on this change. Change subject: [iwyu_tool.py] fix on the IWYU tool invocation .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7989/2//COMMIT_MSG Commit Message: PS2, Line 9: how > Nit: don't need Done PS2, Line 14: errors > error Done -- To view, visit http://gerrit.cloudera.org:8080/7989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifa4284857583fff6543f65a9c2a71a9e445f39e2 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [iwyu tool.py] fix on the IWYU tool invocation
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7989 to look at the new patch set (#3). Change subject: [iwyu_tool.py] fix on the IWYU tool invocation .. [iwyu_tool.py] fix on the IWYU tool invocation Fixed the way include-what-you-use binary is invoked by the iwyu_tool.py wrapper script. Prior to this fix, subprocess.Popen() had the 'shell' argument set to 'True', so any issue with starting the binary was silenced because there would be no OSError exception raised. Overall, if using 'shell=True' for subprocess.Popen(), it wouldn't be feasible to detect that sort of error via examining the exit code of the started process because include-what-you-use always returns exit code 1 regardless of its findings. I also updated the iwyu.sh script to be more robust in handling possible errors, if any. Prior to this fix, the 'iwyu' CMake target returned success if include-what-you-use binary was not in place. This patch fixes that problem and introduces more robust handling of other possible issues while working with the include-what-you-use tool. Change-Id: Ifa4284857583fff6543f65a9c2a71a9e445f39e2 --- M build-support/iwyu/iwyu.sh M build-support/iwyu/iwyu_tool.py 2 files changed, 24 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/89/7989/3 -- To view, visit http://gerrit.cloudera.org:8080/7989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifa4284857583fff6543f65a9c2a71a9e445f39e2 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR](branch-1.5.x) KUDU-2130: java client: handle termination during negotiation edge case
Dan Burkert has submitted this change and it was merged. Change subject: KUDU-2130: java client: handle termination during negotiation edge case .. KUDU-2130: java client: handle termination during negotiation edge case There was an edge case where a Connection can be terminated while negotiation is completing. This would result in a scary looking log message: 18:24:07.776 [DEBUG - New I/O worker #8112] (Connection.java:649) [peer master-127.32.133.1:64032] cleaning up while in state NEGOTIATING due to: connection disconnected 18:24:07.781 [ERROR - New I/O worker #8112] (Connection.java:418) [peer master-127.32.133.1:64032] unexpected exception from downstream on [id: 0xdd52bacc, /127.0.0.1:55318 :> /127.32.133.1:64032] java.lang.NullPointerException at org.apache.kudu.client.Connection.messageReceived(Connection.java:271) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.apache.kudu.client.Connection.handleUpstream(Connection.java:236) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) but in reality the error message is harmless; it just indicates that the connection has been terminated while the connection's messageReceived handler is clearing the message queue. This interruption is possible because of 82a8e9f99, which fixed a deadlock in Connection. This commit recognizes when this race has occured, and early exits from messageReceived. Change-Id: I3e9d4a6535ae82973743e4ac1071de0aac92b08b Reviewed-on: http://gerrit.cloudera.org:8080/7960 Reviewed-by: Alexey SerbinTested-by: Kudu Jenkins (cherry picked from commit f0aa3b3f194146760597e6ab88c304c6f408073c) Reviewed-on: http://gerrit.cloudera.org:8080/7978 --- M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java 1 file changed, 9 insertions(+), 4 deletions(-) Approvals: Alexey Serbin: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7978 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I3e9d4a6535ae82973743e4ac1071de0aac92b08b Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: branch-1.5.x Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] test workload: make table creation more robust
Adar Dembo has submitted this change and it was merged. Change subject: test_workload: make table creation more robust .. test_workload: make table creation more robust The table creation here already works around the lack of Exactly Once semantics by considering certain kinds of failures to mean success. However, these failures imply that the client didn't finish waiting for the table to be created, so we need to do that ourselves. Without the patch, ClientStressTest_MultiMaster.TestLeaderResolutionTimeout failed 1/1000 times in DEBUG mode. With it, there were no failures. Change-Id: Ic1e76e502359b499466cfa90d21ac22f35928261 Reviewed-on: http://gerrit.cloudera.org:8080/7982 Tested-by: Kudu Jenkins Reviewed-by: Alexey SerbinReviewed-by: Hao Hao --- M src/kudu/integration-tests/test_workload.cc 1 file changed, 20 insertions(+), 3 deletions(-) Approvals: Hao Hao: Looks good to me, but someone else must approve Alexey Serbin: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ic1e76e502359b499466cfa90d21ac22f35928261 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [iwyu tool.py] fix on the IWYU tool invocation
Adar Dembo has posted comments on this change. Change subject: [iwyu_tool.py] fix on the IWYU tool invocation .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7989/2//COMMIT_MSG Commit Message: PS2, Line 9: how Nit: don't need PS2, Line 14: errors error -- To view, visit http://gerrit.cloudera.org:8080/7989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifa4284857583fff6543f65a9c2a71a9e445f39e2 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [iwyu tool.py] fix on the IWYU tool invocation
Alexey Serbin has uploaded a new patch set (#2). Change subject: [iwyu_tool.py] fix on the IWYU tool invocation .. [iwyu_tool.py] fix on the IWYU tool invocation Fixed the way how include-what-you-use binary is invoked by the iwyu_tool.py wrapper script. Prior to this fix, subprocess.Popen() had the 'shell' argument set to 'True', so any issue with starting the binary was silenced because there would be no OSError exception raised. Overall, if using 'shell=True' for subprocess.Popen(), it wouldn't be feasible to detect that sort of errors via examining the exit code of the started process because include-what-you-use always returns exit code 1 regardless of its findings. I also updated the iwyu.sh script to be more robust in handling possible errors, if any. Prior to this fix, the 'iwyu' CMake target returned success if include-what-you-use binary was not in place. This patch fixes that problem and introduces more robust handling of other possible issues while working with the include-what-you-use tool. Change-Id: Ifa4284857583fff6543f65a9c2a71a9e445f39e2 --- M build-support/iwyu/iwyu.sh M build-support/iwyu/iwyu_tool.py 2 files changed, 24 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/89/7989/2 -- To view, visit http://gerrit.cloudera.org:8080/7989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifa4284857583fff6543f65a9c2a71a9e445f39e2 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [iwyu tool.py] fix on the IWYU tool invocation
Alexey Serbin has uploaded a new change for review. http://gerrit.cloudera.org:8080/7989 Change subject: [iwyu_tool.py] fix on the IWYU tool invocation .. [iwyu_tool.py] fix on the IWYU tool invocation Fixed the way how the include-what-you-use binary is invoked by the iwyu_tool.py wrapper script. Prior to this fix, the subprocess.Popen() had the 'shell' argument set to 'True', so any issue with starting the binary was silenced because there would be no OSError exception raised. Overall, if using 'shell=True' for subprocess.Popen(), it wouldn't be feasible to detect that sort of errors via examining the exit code of the started process because the include-what-you-use binary always returns exit code 1 regardless of its findings. I also updated the iwyu.sh script to be more robust in handling possible errors, if any. Prior to this fix, the 'iwyu' CMake target returned success if the include-what-you-use binary was not in place. E.g., Todd recently found that running 'ninja iwyu' reported no errors when the include-what-you-use tool was not present. The same behavior was observed with 'make iwyu' as well. This patch fixes that problem and introduces more robust handling of other possible issues while working with the include-what-you-use tool. Change-Id: Ifa4284857583fff6543f65a9c2a71a9e445f39e2 --- M build-support/iwyu/iwyu.sh M build-support/iwyu/iwyu_tool.py 2 files changed, 24 insertions(+), 13 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/89/7989/1 -- To view, visit http://gerrit.cloudera.org:8080/7989 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ifa4284857583fff6543f65a9c2a71a9e445f39e2 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin
[kudu-CR](branch-1.5.x) KUDU-2130 (part 2): more fixes for ITClientStress
Dan Burkert has submitted this change and it was merged. Change subject: KUDU-2130 (part 2): more fixes for ITClientStress .. KUDU-2130 (part 2): more fixes for ITClientStress This fixes some more race conditions in connection termination in the same vein as part 1. It also filters benign SSLException from being returned back to callers. Change-Id: Ic3f518513931c660fd93b4272d1b1fceb268f191 Reviewed-on: http://gerrit.cloudera.org:8080/7964 Reviewed-by: Alexey SerbinTested-by: Kudu Jenkins (cherry picked from commit f41a5c2b3afc8b4703c8047b347490b24c19) Reviewed-on: http://gerrit.cloudera.org:8080/7979 --- M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java M java/kudu-client/src/test/java/org/apache/kudu/client/ITClientStress.java 2 files changed, 22 insertions(+), 8 deletions(-) Approvals: Alexey Serbin: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ic3f518513931c660fd93b4272d1b1fceb268f191 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: branch-1.5.x Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] web ui: fix "percentage consumed" calculation
Dan Burkert has posted comments on this change. Change subject: web ui: fix "percentage consumed" calculation .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I07d5b7d5f44548120a9b31bfef43e23051e27d8e Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] WIP: Auto-vivify cmeta if doesn't exist at startup
Hello Todd Lipcon, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7988 to review the following change. Change subject: WIP: Auto-vivify cmeta if doesn't exist at startup .. WIP: Auto-vivify cmeta if doesn't exist at startup Change-Id: I8ff6255b1fcbb12417b82853bcde9b239291492b --- M src/kudu/consensus/consensus_meta_manager.cc M src/kudu/consensus/consensus_meta_manager.h M src/kudu/consensus/raft_consensus.cc 3 files changed, 27 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/88/7988/1 -- To view, visit http://gerrit.cloudera.org:8080/7988 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I8ff6255b1fcbb12417b82853bcde9b239291492b Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Todd Lipcon
[kudu-CR] web ui: fix "percentage consumed" calculation
Hello Dan Burkert, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7987 to review the following change. Change subject: web ui: fix "percentage consumed" calculation .. web ui: fix "percentage consumed" calculation There was a misplaced cast, so the division of consumption by limit was using an integer rather than floating point calculation. This results in the "percentage consumed" always showing 0. Change-Id: I07d5b7d5f44548120a9b31bfef43e23051e27d8e --- M src/kudu/server/default-path-handlers.cc 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/7987/1 -- To view, visit http://gerrit.cloudera.org:8080/7987 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I07d5b7d5f44548120a9b31bfef43e23051e27d8e Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Dan Burkert
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Mike Percy has submitted this change and it was merged. Change subject: tablet copy: Allow voting from crashed initial tablet copies .. tablet copy: Allow voting from crashed initial tablet copies The change in c35e4f0093dcdec625d1f647924d854c7bc9f3de missed an important case, which is when the target tablet server in a tablet copy operation crashes while in the middle of a tablet copy. In that case, the 'tombstone_last_logged_opid' of 1.0 was not persisted in the superblock. This manifested as a very flaky TabletCopyITest.TestTabletCopyNewReplicaFailureCanVote test, which would fail about 15% of the time. With the change in this patch, that test now passes reliably: http://dist-test.cloudera.org/job?job_id=mpercy.1504654266.31446 Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Reviewed-on: http://gerrit.cloudera.org:8080/7961 Reviewed-by: Todd LipconTested-by: Kudu Jenkins Reviewed-by: Alexey Serbin --- M src/kudu/consensus/consensus-test-util.h M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/tablet-harness.h M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 11 files changed, 182 insertions(+), 35 deletions(-) Approvals: Todd Lipcon: Looks good to me, but someone else must approve Alexey Serbin: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2124. Don't hold session lock while initializing a TabletCopySession
Mike Percy has posted comments on this change. Change subject: KUDU-2124. Don't hold session lock while initializing a TabletCopySession .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7985/1/src/kudu/tserver/tablet_copy_source_session.h File src/kudu/tserver/tablet_copy_source_session.h: Line 194 It's nice how much simpler the lifecycle of TabletCopySourceSession is with this change. -- To view, visit http://gerrit.cloudera.org:8080/7985 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If8bd295a59ade8c89fdf1853dd64c4bceca8da91 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: Yes
[kudu-CR] KUDU-2124. Don't hold session lock while initializing a TabletCopySession
Mike Percy has posted comments on this change. Change subject: KUDU-2124. Don't hold session lock while initializing a TabletCopySession .. Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/7985/1/src/kudu/tserver/tablet_copy_service.cc File src/kudu/tserver/tablet_copy_service.cc: Line 171: RPC_RETURN_NOT_OK(session->Init(), It would be nice to remember if we were the thread to insert the session into the sessions_ map and, if not, reject the request if (!initted()) to free up handler thread capacity. Line 249: RPC_RETURN_NOT_OK(session->Init(), Instead of waiting, we could simply reject the request if (!initted()) http://gerrit.cloudera.org:8080/#/c/7985/1/src/kudu/tserver/tablet_copy_source_session.cc File src/kudu/tserver/tablet_copy_source_session.cc: Line 166: shared_ptr reader = tablet_replica_->log()->reader(); This is an existing bug, but we should take a ref to tablet_replica_->log() and ensure it's non-false before calling reader() on it, since TabletReplica::Stop() will reset it and this could cause a segfault. Line 389: Status s; no longer used http://gerrit.cloudera.org:8080/#/c/7985/1/src/kudu/tserver/tablet_copy_source_session.h File src/kudu/tserver/tablet_copy_source_session.h: PS1, Line 33: "kudu/gutil/integral_types.h" Huh. I didn't know about this file. I've always used PS1, Line 192: scoped_refptr nit: This can be made const (I know you haven't worked on this code before and didn't add this member) Line 197: KuduOnceDynamic init_once_; Would be nice to add a comment noting that once the object is initialized, all fields are thereafter const. -- To view, visit http://gerrit.cloudera.org:8080/7985 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If8bd295a59ade8c89fdf1853dd64c4bceca8da91 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: Yes
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Mike Percy has posted comments on this change. Change subject: tablet copy: Allow voting from crashed initial tablet copies .. Patch Set 4: I ran another 200x loop w/ 4 stress threads and I got 3 failures, all of which are unrelated to this patch or these tests: http://dist-test.cloudera.org/job?job_id=mpercy.1504738557.28828 So I think this test is fine as-is. -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Alexey Serbin has posted comments on this change. Change subject: tablet copy: Allow voting from crashed initial tablet copies .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7961/3/src/kudu/integration-tests/tablet_copy-itest.cc File src/kudu/integration-tests/tablet_copy-itest.cc: PS3, Line 1155: ASSERT_EVENTUALLY > That assumes that we expect the timeouts in this test to actually max out t SGTM -- thanks for explaining this! -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Mike Percy has posted comments on this change. Change subject: tablet copy: Allow voting from crashed initial tablet copies .. Patch Set 3: > BTW, what was the setting for --stress_cpu_threads for that 200x > run? That one didn't have any stress threads. I'll run one with 4 stress threads and report back. -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Mike Percy has posted comments on this change. Change subject: tablet copy: Allow voting from crashed initial tablet copies .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7961/3/src/kudu/tserver/tablet_copy_client.cc File src/kudu/tserver/tablet_copy_client.cc: PS3, Line 401: RETURN_NOT_OK(meta_->ReplaceSuperBlock(*superblock_)); > Great -- thank you for the clarification. Did you find that some additiona I added an additional test to cover tablet copy over a tombstoned tablet since I don't think we had comprehensive test coverage there earlier. -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] docs: update disk failure recovery notes
Todd Lipcon has posted comments on this change. Change subject: docs: update disk failure recovery notes .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/7984/1/docs/administration.adoc File docs/administration.adoc: Line 597: empty all of the server's existing directories. For example, if a tablet server do you think a WARNING bar is appropriate here? I wonder if someone would follow these directions without making sure they have other good replicas or backups of their data? PS1, Line 608: along with any newly-added data : directories you mean 'and new data directories have been created with appropriate permissions' or something? PS1, Line 630: suicide_on_eio=false`. When set, tablets with data on a failed disk : will not be opened and will be re-replicated as necessary given that we are currently defaulting to spreading tablets across all disks, is this really relevant? ie this would start up with no tablets, but still have a bunch of data on those disks? I also wonder whether we should be documenting this flag considering it is tagged experimental. Also I wonder if some people may find the name offensive/upsetting and we short rename to 'abort_on_eio'? -- To view, visit http://gerrit.cloudera.org:8080/7984 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Alexey Serbin has posted comments on this change. Change subject: tablet copy: Allow voting from crashed initial tablet copies .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7961/3/src/kudu/tserver/tablet_copy_client.cc File src/kudu/tserver/tablet_copy_client.cc: PS3, Line 401: RETURN_NOT_OK(meta_->ReplaceSuperBlock(*superblock_)); > To answer your question, yes, according to the tests and the header doc for Great -- thank you for the clarification. Did you find that some additional coverage is needed for the case of aborted copy operation or it's already covered somewhere? -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2124. Don't hold session lock while initializing a TabletCopySession
Dan Burkert has uploaded a new change for review. http://gerrit.cloudera.org:8080/7985 Change subject: KUDU-2124. Don't hold session lock while initializing a TabletCopySession .. KUDU-2124. Don't hold session lock while initializing a TabletCopySession This patch replaces the interior mutex in TabletCopySourceSession with a dynamic once, so that initialization can happen concurrently. This allows initialization to happen outside of the critical section in the tablet copy service. Implemented a regression test that injects latency into BeginTabletCopySession() calls. Change-Id: If8bd295a59ade8c89fdf1853dd64c4bceca8da91 --- M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tserver/tablet_copy_service-test.cc M src/kudu/tserver/tablet_copy_service.cc M src/kudu/tserver/tablet_copy_source_session.cc M src/kudu/tserver/tablet_copy_source_session.h M src/kudu/tserver/tablet_server-test-base.h M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/tablet_server_test_util.cc M src/kudu/tserver/tablet_server_test_util.h 11 files changed, 205 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/85/7985/1 -- To view, visit http://gerrit.cloudera.org:8080/7985 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: If8bd295a59ade8c89fdf1853dd64c4bceca8da91 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Mike Percy
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Alexey Serbin has posted comments on this change. Change subject: tablet copy: Allow voting from crashed initial tablet copies .. Patch Set 4: Code-Review+2 (1 comment) BTW, what was the setting for --stress_cpu_threads for that 200x run? Yesterday I ran delete_table-itest and found that it failed with 1/500 ratio at AddServer(), so I posted a fix (with similar workaround) for that. http://gerrit.cloudera.org:8080/#/c/7961/3/src/kudu/integration-tests/tablet_copy-itest.cc File src/kudu/integration-tests/tablet_copy-itest.cc: PS3, Line 1187: LOG(INFO) << "Restarting tablet servers. New replica TS UUID: " << new_replica_uuid : << ", tablet data state: " : << tablet::TabletDataState_Name(superblock.tablet_data_state()) : << ", last-logged opid: " << superblock.tombstone_last_logged_opid(); > I understand wanting to keep the logs short, but that's a lot to ask for th SGTM -- thank you for the clarification. -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Add tablet state summary metrics
Adar Dembo has posted comments on this change. Change subject: Add tablet state summary metrics .. Patch Set 4: Code-Review+1 Please get an additional review from Mike and/or Todd. -- To view, visit http://gerrit.cloudera.org:8080/7980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9fda2061d341586f0aa343787af59984a627080a Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Mike Percy has posted comments on this change. Change subject: tablet copy: Allow voting from crashed initial tablet copies .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7961/3/src/kudu/integration-tests/tablet_copy-itest.cc File src/kudu/integration-tests/tablet_copy-itest.cc: PS3, Line 1187: LOG(INFO) << "Restarting tablet servers. New replica TS UUID: " << new_replica_uuid : << ", tablet data state: " : << tablet::TabletDataState_Name(superblock.tablet_data_state()) : << ", last-logged opid: " << superblock.tombstone_last_logged_opid(); > OK, that sounds good to me. I understand wanting to keep the logs short, but that's a lot to ask for this test, which uses ExternalMiniCluster with a master and 3 tablet servers. It's very loud. This single line is very useful when debugging a failure, and it's just a drop in the ocean of log output. -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2044 Tombstoned tablets show up in /metrics
Adar Dembo has posted comments on this change. Change subject: KUDU-2044 Tombstoned tablets show up in /metrics .. Patch Set 3: Code-Review+1 (2 comments) Please solicit reviews from Todd and Mike. http://gerrit.cloudera.org:8080/#/c/7981/3//COMMIT_MSG Commit Message: PS3, Line 10: it is http://gerrit.cloudera.org:8080/#/c/7981/3/src/kudu/tserver/tablet_server-test.cc File src/kudu/tserver/tablet_server-test.cc: PS3, Line 2582: do go? -- To view, visit http://gerrit.cloudera.org:8080/7981 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21b52262203410ded1e514502b59a79be12f8fb3 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Alexey Serbin has posted comments on this change. Change subject: tablet copy: Allow voting from crashed initial tablet copies .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/7961/3/src/kudu/integration-tests/tablet_copy-itest.cc File src/kudu/integration-tests/tablet_copy-itest.cc: PS3, Line 1155: ASSERT_EVENTUALLY > That might be due to high load on the test machine. Just a clarification: I don't insist on updating the timeout for ASSERT_EVENTUALLY(), just wanted to understand this. If you think that's all right and 30 seconds should be also used for the overall timeout here -- that's fine with me. PS3, Line 1187: LOG(INFO) << "Restarting tablet servers. New replica TS UUID: " << new_replica_uuid : << ", tablet data state: " : << tablet::TabletDataState_Name(superblock.tablet_data_state()) : << ", last-logged opid: " << superblock.tombstone_last_logged_opid(); > I don't think so, because logging before a failure would make that failure OK, that sounds good to me. Just another nit: is this information is crucial for the successful run as well? If not, consider using SCOPED_TRACE() here instead of LOG(INFO). My point is: if the information is not needed in case of successful run of the test, I would prefer that information not to be output at all. It's not a crucial thing, just my 2 cents -- feel free to ignore, though. -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Mike Percy has posted comments on this change. Change subject: tablet copy: Allow voting from crashed initial tablet copies .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7961/3/src/kudu/integration-tests/tablet_copy-itest.cc File src/kudu/integration-tests/tablet_copy-itest.cc: PS3, Line 1155: ASSERT_EVENTUALLY > That might be due to high load on the test machine. That assumes that we expect the timeouts in this test to actually max out the timeout. The kTimeout in this test is really there to cap how long a hang takes to fail the test. If this was production, I would agree with you that we would need to think carefully about the value. But in a test, the goal of the timeouts is different, and the only flakiness I saw in this test before wrapping this section in an ASSERT_EVENTUALLY was that AddServer() would immediately return with an error indicating that the server we contacted was no longer the leader. -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Mike Percy has posted comments on this change. Change subject: tablet copy: Allow voting from crashed initial tablet copies .. Patch Set 4: FYI I looped tablet_copy-itest 200x on the latest rev and didn't see any failures: http://dist-test.cloudera.org/job?job_id=mpercy.1504736695.21394 -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Add tablet state summary metrics
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7980 to look at the new patch set (#4). Change subject: Add tablet state summary metrics .. Add tablet state summary metrics This patch adds metrics for the number of tablets in each state (i.e. the number of tablets for each value of TabletStatePB). To avoid contention in the tablet manager, the frequency with which the state numbers are computed is limited. Requests that come too soon for an update will receive cached counts. The maximum frequency of updates is controlled by a new flag. Change-Id: I9fda2061d341586f0aa343787af59984a627080a --- M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/metrics.cc M src/kudu/util/metrics.h 5 files changed, 306 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/80/7980/4 -- To view, visit http://gerrit.cloudera.org:8080/7980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fda2061d341586f0aa343787af59984a627080a Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley
[kudu-CR] Add tablet state summary metrics
Will Berkeley has posted comments on this change. Change subject: Add tablet state summary metrics .. Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/7980/1/src/kudu/integration-tests/tablet_copy-itest.cc File src/kudu/integration-tests/tablet_copy-itest.cc: PS1, Line 1545: ASSERT_EVENTUALLY([&] { : ASSERT_EQ(1, CountRunningTablets(cluster_->tablet_server(follower_index)) > Can you use ASSERT_EVENTUALLY rather than sleeping? Done http://gerrit.cloudera.org:8080/#/c/7980/3/src/kudu/integration-tests/tablet_copy-itest.cc File src/kudu/integration-tests/tablet_copy-itest.cc: Line 1555: SleepFor(MonoDelta::FromMilliseconds(15)); > Shouldn't need an explicit sleep anymore; let ASSERT_EVENTUALLY() take care :( http://gerrit.cloudera.org:8080/#/c/7980/2/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: Line 1227: if (last_walked_ + period < MonoTime::Now()) { > Nit: maybe refactor: Done http://gerrit.cloudera.org:8080/#/c/7980/3/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: PS3, Line 1230: auto & > Nit: 'const auto&' is actually more GSG-compliant, since the '&' is part of I plead innocent. My IDE moves *'s and &'s when I'm not looking. I did go fix the setting for it, so shouldn't happen anymore. http://gerrit.cloudera.org:8080/#/c/7980/2/src/kudu/tserver/ts_tablet_manager.h File src/kudu/tserver/ts_tablet_manager.h: Line 329: // Update the TabletStateSummary for this tablet manager and return the > Update this. Done -- To view, visit http://gerrit.cloudera.org:8080/7980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9fda2061d341586f0aa343787af59984a627080a Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Alexey Serbin has posted comments on this change. Change subject: tablet copy: Allow voting from crashed initial tablet copies .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7961/3/src/kudu/integration-tests/tablet_copy-itest.cc File src/kudu/integration-tests/tablet_copy-itest.cc: PS3, Line 1155: ASSERT_EVENTUALLY > That would make this test flaky. But why would AddServer() take 28 seconds? That might be due to high load on the test machine. My point was: if you set the timeout for the individual operations to X seconds, then I would expect the overall timeout for that operation wrapped with ASSERT_EVENTUALLY() be X * M, where M is the maximum number of anticipated retries. -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2119. Fix failure in encoding-test
Todd Lipcon has submitted this change and it was merged. Change subject: KUDU-2119. Fix failure in encoding-test .. KUDU-2119. Fix failure in encoding-test In commit d1f53cc32 I introduced randomization for the format string used for the generated string data in this test. The random format string could sometimes incorporate '\0' bytes, which, in the worst case, could result in a string of length 0 or 1. This would then cause a later assertion to fail that was checking that the encoded data be at least two bytes per string. The fix switches from using a printf-style string to instead use a std::function to generate the data. The implementation of the function avoids using C strings and thus permits embedded null bytes. Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 Reviewed-on: http://gerrit.cloudera.org:8080/7967 Tested-by: Kudu Jenkins Reviewed-by: Hao HaoReviewed-by: Alexey Serbin Reviewed-by: Andrew Wong Reviewed-by: Adar Dembo --- M src/kudu/cfile/encoding-test.cc M src/kudu/util/random_util.cc M src/kudu/util/random_util.h 3 files changed, 49 insertions(+), 38 deletions(-) Approvals: Hao Hao: Looks good to me, but someone else must approve Andrew Wong: Looks good to me, but someone else must approve Adar Dembo: Looks good to me, approved Alexey Serbin: Looks good to me, but someone else must approve Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Todd Lipcon has posted comments on this change. Change subject: tablet copy: Allow voting from crashed initial tablet copies .. Patch Set 4: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1755 Part 2: Improve tablet on-disk size metric
Will Berkeley has posted comments on this change. Change subject: KUDU-1755 Part 2: Improve tablet on-disk size metric .. Patch Set 13: (1 comment) http://gerrit.cloudera.org:8080/#/c/6968/13/src/kudu/consensus/consensus_meta.h File src/kudu/consensus/consensus_meta.h: PS13, Line 244: int64 > Thanks. Just FYI, here is some more fun facts about that: https://news.yco Show Mike :) -- To view, visit http://gerrit.cloudera.org:8080/6968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-2044 Tombstoned tablets show up in /metrics
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7981 to look at the new patch set (#3). Change subject: KUDU-2044 Tombstoned tablets show up in /metrics .. KUDU-2044 Tombstoned tablets show up in /metrics This patch stops tombstoned tablets from showing up in /metrics. When a tablet replica is shut down, the tablet's metric entity it marked as unpublished. All its child metrics will be marked for retirement, then retired after the retirement interval has passed. Then the entity itself is unregistered, i.e. removed from the metric registry's map of entities. If a new replica of the same tablet is added to the server, it will create a new entity that will be registered with the metric_registry, either as a new insertion or overwriting the old replica's entity if the entity had been unpublished but not yet unregistered. The tombstoned's tablets metric entity is not destroyed when it's deregistered, since there are still refs to it and its metrics in the TabletReplica class, Tablet class, and many other classes associated with a TabletReplica. The entity will be destroyed when the TabletReplica is deleted (since it either contains or holds final references to all the other classes), which happens if the tablet is replaced or deleted. While it's not ideal to hold the memory for the entity when it's no longer used, the reason this occurs is because the TabletReplica instance mostly stays alive. Releasing all the metric references would require specifically dropping refs to those metrics in all the surviving subcomponents of a TabletReplica instance that has been shut down; I think this problem would be better solved by more completely cleaning up a shut down TabletReplica instance, but that's a much larger scope than suppressing the metrics. Change-Id: I21b52262203410ded1e514502b59a79be12f8fb3 --- M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tablet/tablet.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/util/metrics.cc M src/kudu/util/metrics.h 5 files changed, 144 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/81/7981/3 -- To view, visit http://gerrit.cloudera.org:8080/7981 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I21b52262203410ded1e514502b59a79be12f8fb3 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2044 Tombstoned tablets show up in /metrics
Will Berkeley has posted comments on this change. Change subject: KUDU-2044 Tombstoned tablets show up in /metrics .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/7981/1//COMMIT_MSG Commit Message: PS1, Line 12: If a new copy of the tablet is placed on the server, the new : tablet's metric entity will replace the old one, if the old one's : hasn't been deregistered yet (steps toward deregistration occur only : when the entity publishes metrics e.g. on a call to /metrics). > I'm still finding this sentence confusing. The old entity is replaced if it Will reword. I think the confusion here is that deregister != unpublish. Unpublishing just marks the entity for eventual deregistration. Deregistration is simply the metric registry erasing the entity from its map of entities. Metrics ultimately stop showing up only when the entity is deregistered, not merely when it is unpublished. http://gerrit.cloudera.org:8080/#/c/7981/2/src/kudu/integration-tests/tablet_copy-itest.cc File src/kudu/integration-tests/tablet_copy-itest.cc: Line 111: using std::vector; > warning: redundant 'METRIC_ENTITY_server' declaration [readability-redundan Done http://gerrit.cloudera.org:8080/#/c/7981/2/src/kudu/tserver/tablet_server-test.cc File src/kudu/tserver/tablet_server-test.cc: PS2, Line 2582: // The next two calls to /jsonmetricz should show tablet metrics: : // 1. The first call causes the registry to discover the tablet's metric entity : // has been de-published and marks it for retirement after the expiration period. : // 2. The second call causes the entity to retire all its metrics, since we've : // passed the retirement period. Then, the registry de-registers the entity. : EasyCurl c; > Nit: since you have a 2-item list already set up, I think formatting the th Done -- To view, visit http://gerrit.cloudera.org:8080/7981 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21b52262203410ded1e514502b59a79be12f8fb3 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-2119. Fix failure in encoding-test
Adar Dembo has posted comments on this change. Change subject: KUDU-2119. Fix failure in encoding-test .. Patch Set 4: Code-Review+2 I didn't actually review this, but three +1s should warrant a +2. -- To view, visit http://gerrit.cloudera.org:8080/7967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-2134: Defer block transaction commit to the end of a tablet copy
Adar Dembo has posted comments on this change. Change subject: KUDU-2134: Defer block transaction commit to the end of a tablet copy .. Patch Set 4: Code-Review+1 (1 comment) Please get a +2 from Mike. http://gerrit.cloudera.org:8080/#/c/7966/4/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: Line 574: // Nit: maybe revert this change to avoid touching this file at all? -- To view, visit http://gerrit.cloudera.org:8080/7966 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4feabc08c2199aad8d08be56f09ac06924345f2b Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-2119. Fix failure in encoding-test
Alexey Serbin has posted comments on this change. Change subject: KUDU-2119. Fix failure in encoding-test .. Patch Set 4: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/7967/2/src/kudu/cfile/encoding-test.cc File src/kudu/cfile/encoding-test.cc: PS2, Line 271: return RandomString(len, > yea I think that's fine and desirable since it tests zero-length strings. T ok, I see -- thank you for the clarification. -- To view, visit http://gerrit.cloudera.org:8080/7967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] Add tablet state summary metrics
Adar Dembo has posted comments on this change. Change subject: Add tablet state summary metrics .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/7980/3/src/kudu/integration-tests/tablet_copy-itest.cc File src/kudu/integration-tests/tablet_copy-itest.cc: Line 1555: SleepFor(MonoDelta::FromMilliseconds(15)); Shouldn't need an explicit sleep anymore; let ASSERT_EVENTUALLY() take care of it. Below too. http://gerrit.cloudera.org:8080/#/c/7980/3/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: PS3, Line 1230: auto & Nit: 'const auto&' is actually more GSG-compliant, since the '&' is part of the type. -- To view, visit http://gerrit.cloudera.org:8080/7980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9fda2061d341586f0aa343787af59984a627080a Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-2119. Fix failure in encoding-test
Andrew Wong has posted comments on this change. Change subject: KUDU-2119. Fix failure in encoding-test .. Patch Set 4: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/7967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1755 Part 2: Improve tablet on-disk size metric
Alexey Serbin has posted comments on this change. Change subject: KUDU-1755 Part 2: Improve tablet on-disk size metric .. Patch Set 15: (1 comment) http://gerrit.cloudera.org:8080/#/c/6968/13/src/kudu/consensus/consensus_meta.h File src/kudu/consensus/consensus_meta.h: PS13, Line 244: d to > It's just for consistency with the other on disk size metrics. They were a Thanks. Just FYI, here is some more fun facts about that: https://news.ycombinator.com/item?id=8690952 https://research.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html -- To view, visit http://gerrit.cloudera.org:8080/6968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 Gerrit-PatchSet: 15 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] docs: update disk failure recovery notes
Andrew Wong has uploaded a new change for review. http://gerrit.cloudera.org:8080/7984 Change subject: docs: update disk failure recovery notes .. docs: update disk failure recovery notes Adds more context around disk failures and separates out steps to rebuild a server with a different directory configuration. Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e --- M docs/administration.adoc 1 file changed, 36 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/84/7984/1 -- To view, visit http://gerrit.cloudera.org:8080/7984 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I732286d0f56f7a15705ad544fc7dfc426287714e Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong
[kudu-CR] KUDU-2119. Fix failure in encoding-test
Hao Hao has posted comments on this change. Change subject: KUDU-2119. Fix failure in encoding-test .. Patch Set 4: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/7967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Mike Percy has posted comments on this change. Change subject: tablet copy: Allow voting from crashed initial tablet copies .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7961/3/src/kudu/integration-tests/tablet_copy-itest.cc File src/kudu/integration-tests/tablet_copy-itest.cc: PS3, Line 1155: ASSERT_EVENTUALLY > Yep, but what if AddServer() takes 28 seconds to complete, and returns Stat That would make this test flaky. But why would AddServer() take 28 seconds? -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7961 to look at the new patch set (#4). Change subject: tablet copy: Allow voting from crashed initial tablet copies .. tablet copy: Allow voting from crashed initial tablet copies The change in c35e4f0093dcdec625d1f647924d854c7bc9f3de missed an important case, which is when the target tablet server in a tablet copy operation crashes while in the middle of a tablet copy. In that case, the 'tombstone_last_logged_opid' of 1.0 was not persisted in the superblock. This manifested as a very flaky TabletCopyITest.TestTabletCopyNewReplicaFailureCanVote test, which would fail about 15% of the time. With the change in this patch, that test now passes reliably: http://dist-test.cloudera.org/job?job_id=mpercy.1504654266.31446 Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/tablet-harness.h M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 11 files changed, 182 insertions(+), 35 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/7961/4 -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] test workload: make table creation more robust
Hao Hao has posted comments on this change. Change subject: test_workload: make table creation more robust .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/7982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic1e76e502359b499466cfa90d21ac22f35928261 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Mike Percy has posted comments on this change. Change subject: tablet copy: Allow voting from crashed initial tablet copies .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/7961/3/src/kudu/master/sys_catalog.cc File src/kudu/master/sys_catalog.cc: Line 243: boost::none, > mind adding a /* name= */ comment here so it's clearer the argument? the re Done http://gerrit.cloudera.org:8080/#/c/7961/3/src/kudu/tablet/tablet-harness.h File src/kudu/tablet/tablet-harness.h: Line 103:boost::none, > same Done http://gerrit.cloudera.org:8080/#/c/7961/3/src/kudu/tserver/tablet_copy_client.cc File src/kudu/tserver/tablet_copy_client.cc: PS3, Line 293: if (last_logged_opid) { : *superblock_->mutable_tombstone_last_logged_opid() = *last_logged_opid; : } : : > I don't think it functionally matters, but I think this probably should go Done PS3, Line 401: RETURN_NOT_OK(meta_->ReplaceSuperBlock(*superblock_)); > Hmm. Let me make sure we have coverage for this case by adding a test. To answer your question, yes, according to the tests and the header doc for TabletMetadata::DeleteTabletData(), if last_logged_opid is passed as boost::none, then the previous value will be retained. But I've added it to the header doc of TSTabletManager::DeleteTabletData() as well. -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Add tablet state summary metrics
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7980 to look at the new patch set (#3). Change subject: Add tablet state summary metrics .. Add tablet state summary metrics This patch adds metrics for the number of tablets in each state (i.e. the number of tablets for each value of TabletStatePB). To avoid contention in the tablet manager, the frequency with which the state numbers are computed is limited. Requests that come too soon for an update will receive cached counts. The maximum frequency of updates is controlled by a new flag. Change-Id: I9fda2061d341586f0aa343787af59984a627080a --- M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/metrics.cc M src/kudu/util/metrics.h 5 files changed, 308 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/80/7980/3 -- To view, visit http://gerrit.cloudera.org:8080/7980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fda2061d341586f0aa343787af59984a627080a Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2134: Defer block transaction commit to the end of a tablet copy
Hao Hao has posted comments on this change. Change subject: KUDU-2134: Defer block transaction commit to the end of a tablet copy .. Patch Set 4: > > The test failure is unrelated to your change. > > I put up a fix for this test failure: http://gerrit.cloudera.org:8080/7982 Thanks a lot for doing that! -- To view, visit http://gerrit.cloudera.org:8080/7966 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4feabc08c2199aad8d08be56f09ac06924345f2b Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: No
[kudu-CR] KUDU-2134: Defer block transaction commit to the end of a tablet copy
Hao Hao has posted comments on this change. Change subject: KUDU-2134: Defer block transaction commit to the end of a tablet copy .. Patch Set 4: (8 comments) http://gerrit.cloudera.org:8080/#/c/7966/3/src/kudu/tserver/tablet_copy_client.cc File src/kudu/tserver/tablet_copy_client.cc: Line 365: // Defer the closures of all downloaded blocks to here, but before superblock > Add a comment explaining why this happens before superblock replacement. Done http://gerrit.cloudera.org:8080/#/c/7966/3/src/kudu/tserver/tablet_copy_client.h File src/kudu/tserver/tablet_copy_client.h: PS3, Line 115: Add > Adds, but this comment isn't quite right since the transaction is no longer Done PS3, Line 168: ransaction. > tablet copy's transaction Done PS3, Line 168: to the tablet > Just remove this. Done PS3, Line 176: tablet copy's transaction > tablet copy's transaction Done PS3, Line 189: tablet copy's transaction > tablet copy's transaction Done http://gerrit.cloudera.org:8080/#/c/7966/3/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: PS3, Line 577: if (!s.ok()) { : LOG(WARNING) << LogPrefix(tablet_id) << "Tablet Copy: Unable to fetch data from remote peer " : << kSrcPeerInfo << ": " << s.ToString(); : return; > This comment is no longer relevant here. Done Line 590: << s.ToString(); > This comment is no longer relevant here. Done -- To view, visit http://gerrit.cloudera.org:8080/7966 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4feabc08c2199aad8d08be56f09ac06924345f2b Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-2134: Defer block transaction commit to the end of a tablet copy
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7966 to look at the new patch set (#4). Change subject: KUDU-2134: Defer block transaction commit to the end of a tablet copy .. KUDU-2134: Defer block transaction commit to the end of a tablet copy KUDU-2131 uncovered a regression that causes a tablet copy session to expire reliably if the copied tablet is large and the tserver already has a high number of containers. Even though the issue is mitigated by LIFO container selection, we can completely avoid it by deferring the block transaction commit to the end of tablet copy seesion. There are mainly two reasons we may still want to do so: 1) If DownloadWALs fails there's no reason to pay the fsync() price and commit all those blocks. 2) While DownloadWALs is running the kernel has more time to eagerly flush the blocks, so the fsync()s at the end could be cheaper. This patch moves the block transaction commit out from FetchAll() and commits all downloaded blocks before replacing the superblock. Change-Id: I4feabc08c2199aad8d08be56f09ac06924345f2b --- M src/kudu/tserver/tablet_copy_client-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/ts_tablet_manager.cc 4 files changed, 41 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/7966/4 -- To view, visit http://gerrit.cloudera.org:8080/7966 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4feabc08c2199aad8d08be56f09ac06924345f2b Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] test workload: make table creation more robust
Alexey Serbin has posted comments on this change. Change subject: test_workload: make table creation more robust .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic1e76e502359b499466cfa90d21ac22f35928261 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1807: improve IsCreateInProgress performance
Todd Lipcon has posted comments on this change. Change subject: KUDU-1807: improve IsCreateInProgress performance .. Patch Set 3: (11 comments) http://gerrit.cloudera.org:8080/#/c/7957/3/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: Line 554 hrm, why'd you remove the anonymous namespace here? actually it appears it should probably be higher up to encompass the visitors as well? PS3, Line 634: the same order as tablet : // lock acquisition I thought we usually acquire the table lock first, but here we are acquiring the tablet lock in commit mode before we are trying to acquire some table-level lock in ApplyTabletStateChange, no? PS3, Line 639: const auto* change = FindOrNull(state_changes_, t); : if (change) { : t->mutable_metadata()->CommitMutation([&]() { : t->table()->ApplyTabletStateChange(change->first, change->second); : }); : } else { : t->mutable_metadata()->CommitMutation(); : } hrm, I think this might be easier to read as: t->mutable_metadata()->CommitMutation([&]() { if (const auto* change = FindOrNull(state_changes_, t)) { t->table()->ApplyTabletStateChange(change->first, change->second); } }); i.e fold the branch inside the lambda rather than outside, so you get rid of the multiple call sites to CommitMutation PS3, Line 3747: unlocker_out update this name Line 4573: VLOG(3) << Substitute("$0: incrementing tablet state $1", this log might be more useful if it included either the old or new value of the count Line 4582: VLOG(3) << Substitute("$0: decrementing tablet state $1", same PS3, Line 4589: DCHECK I think this is worth a CHECK because otherwise we will be decrementing some random memory Line 4592: if (it->second == 0) { maybe DCHECK_GE(it->second, 0)? PS3, Line 4597: VerifyTabletStateCountsUnlocked maybe expand this to also verify teh schema version counts since it's a similar use case? http://gerrit.cloudera.org:8080/#/c/7957/3/src/kudu/master/catalog_manager.h File src/kudu/master/catalog_manager.h: PS3, Line 298: ApplyTabletStateChange how about AccountTabletStateChange since it's only accounting and not actually changing any states? PS3, Line 739: tablet_lock change param name -- To view, visit http://gerrit.cloudera.org:8080/7957 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idca379bb1249e54c0b7fbb0df408ed829ab9fbd2 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2044 Tombstoned tablets show up in /metrics
Adar Dembo has posted comments on this change. Change subject: KUDU-2044 Tombstoned tablets show up in /metrics .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7981/1//COMMIT_MSG Commit Message: PS1, Line 12: If a new copy of the tablet is placed on the server, the new : tablet's metric entity will replace the old one, if the old one's : hasn't been deregistered yet (steps toward deregistration occur only : when the entity publishes metrics e.g. on a call to /metrics). > The old entity is deregistered. This situation is what's tested in tablet_c I'm still finding this sentence confusing. The old entity is replaced if it _was not_ deregistered? And if it was already deregistered, then it's deregistered now, and replaced? Could you maybe reword this to clarify exactly what's going on? http://gerrit.cloudera.org:8080/#/c/7981/2/src/kudu/tserver/tablet_server-test.cc File src/kudu/tserver/tablet_server-test.cc: PS2, Line 2582: // The first two calls to /jsonmetricz should show tablet metrics: : // 1. The first call causes the registry to discover the tablet's metric entity : // has been unpublished and marks it for retirement after the expiration period. : // 2. The second call causes the entity to retire all its metrics, since we've : // passed the retirement period. Then, the registry de-registers the entity. : // Therefore, a third call will not see tablet metrics. Nit: since you have a 2-item list already set up, I think formatting the third call as the third item would be more readable. Something like: // We make three calls to /jsonmetrics. // 1. Return the metric, but mark it for retirement. // 2. Return it, but also retire it. // 3. Don't return it because it's been retired. -- To view, visit http://gerrit.cloudera.org:8080/7981 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21b52262203410ded1e514502b59a79be12f8fb3 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-2044 Tombstoned tablets show up in /metrics
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7981 to look at the new patch set (#2). Change subject: KUDU-2044 Tombstoned tablets show up in /metrics .. KUDU-2044 Tombstoned tablets show up in /metrics This patch stops tombstoned tablets from showing up in /metrics. It marks the tablet metric entity as unpublished, which causes all of its metrics to be retired and causes the metric registry to de-register it. If a new copy of the tablet is placed on the server, the new tablet's metric entity will replace the old one, if the old one's hasn't been deregistered yet (steps toward deregistration occur only when the entity publishes metrics e.g. on a call to /metrics). The tombstoned's tablets metric entity is not destroyed when it's deregistered, since there are still refs to it and its metrics in the TabletReplica class, Tablet class, and many other classes associated with a TabletReplica. The entity will be destroyed when the TabletReplica is deleted (since it either contains or holds final references to all the other classes), which happens if the tablet is replaced or deleted. While it's not ideal to hold the memory for the entity when it's no longer used, the reason this occurs is because the TabletReplica instance mostly stays alive. Releasing all the metric references would require specifically dropping refs to those metrics in all the surviving subcomponents of a TabletReplica instance that has been shut down; I think this problem would be better solved by more completely cleaning up a shut down TabletReplica instance, but that's a much larger scope than suppressing the metrics. Change-Id: I21b52262203410ded1e514502b59a79be12f8fb3 --- M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tablet/tablet.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/util/metrics.cc M src/kudu/util/metrics.h 5 files changed, 146 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/81/7981/2 -- To view, visit http://gerrit.cloudera.org:8080/7981 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I21b52262203410ded1e514502b59a79be12f8fb3 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2044 Tombstoned tablets show up in /metrics
Will Berkeley has posted comments on this change. Change subject: KUDU-2044 Tombstoned tablets show up in /metrics .. Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/7981/1//COMMIT_MSG Commit Message: PS1, Line 12: If a new copy of the tablet is placed on the server, the new : tablet's metric entity will replace the old one, if the old one's : hasn't been deregistered yet (steps toward deregistration occur only : when the entity publishes metrics e.g. on a call to /metrics). > So what happens if a new copy lands _before_ the old entity was deregistere The old entity is deregistered. This situation is what's tested in tablet_copy-itest, since nothing is calling /metrics to cause the retirement + deregistration. http://gerrit.cloudera.org:8080/#/c/7981/1/src/kudu/integration-tests/tablet_copy-itest.cc File src/kudu/integration-tests/tablet_copy-itest.cc: PS1, Line 79: METRIC_DECLARE_entity(tablet); : METRIC_DECLARE_counter(rows_inserted); > Nit: move this to the declarations below. Done Line 1540: > Would it be useful to also check at this point that GetInt64Metric fails, b It would be, but there's a small chance for flakiness since the entity may already have been republished (tablet recreated) after the itest::DeleteTablet call returns, depending on timing. It'd be unusual, but possible. I think it's ok to leave out since there's another test checking that the metric doesn't show up after unpublishing. The purpose of this test is to make sure the metrics reset if the tablet is revived. http://gerrit.cloudera.org:8080/#/c/7981/1/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: Line 304: metric_entity_->Depublish(); > Don't you have to check if this is non-null first? Doesn't look like it's g Done http://gerrit.cloudera.org:8080/#/c/7981/1/src/kudu/tserver/tablet_server-test.cc File src/kudu/tserver/tablet_server-test.cc: Line 2595: // Therefore, a third call will not see tablet metrics. > Maybe integrate this into the loop and check the value of 'i' to decide whe Done http://gerrit.cloudera.org:8080/#/c/7981/1/src/kudu/util/metrics.cc File src/kudu/util/metrics.cc: Line 483: UpdateReturnCopy(_, id, e, nullptr); > Since you're not interested in the returned value, a simpler "entities_[id] Done http://gerrit.cloudera.org:8080/#/c/7981/1/src/kudu/util/metrics.h File src/kudu/util/metrics.h: PS1, Line 503: depublished > Nit: "unpublish" is more grammatically correct. Could you use that instead Done -- To view, visit http://gerrit.cloudera.org:8080/7981 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21b52262203410ded1e514502b59a79be12f8fb3 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] Add tablet state summary metrics
Adar Dembo has posted comments on this change. Change subject: Add tablet state summary metrics .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7980/2/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: Line 1227: if (last_walked_ + period > MonoTime::Now()) { Nit: maybe refactor: if (last_walked_ + period < MonoTime::Now()) { // Our cached counts are too old, regenerate them. last_walked_ = Now(); } return CountForState(st); http://gerrit.cloudera.org:8080/#/c/7980/2/src/kudu/tserver/ts_tablet_manager.h File src/kudu/tserver/ts_tablet_manager.h: Line 329: // Update the TabletStateSummary for this tablet manager. Update this. -- To view, visit http://gerrit.cloudera.org:8080/7980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9fda2061d341586f0aa343787af59984a627080a Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] Add tablet state summary metrics
Will Berkeley has posted comments on this change. Change subject: Add tablet state summary metrics .. Patch Set 2: (4 comments) I added a test and fixed a bug (plus the changes from your comments) http://gerrit.cloudera.org:8080/#/c/7980/1/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: PS1, Line 1259: // pass : > Got some tabs in here. Done PS1, Line 1267: int TSTabletManager::NumNotInitializedTablets() { : return UpdateTabletStateSummaryAndReturnCount(tablet::NOT_INITIALIZED); : } > Since this pattern is repeated a lot, perhaps you could change UpdateTablet Done http://gerrit.cloudera.org:8080/#/c/7980/1/src/kudu/tserver/ts_tablet_manager.h File src/kudu/tserver/ts_tablet_manager.h: Line 94: return num_not_initialized; > warning: missing username/bug in TODO [google-readability-todo] git blame'd on todd Line 308: // Handle the case on startup where we find a tablet that is not in > Nit: convention is to add the "Unlocked" suffix to functions that must be c Superseded by later comment + fix. -- To view, visit http://gerrit.cloudera.org:8080/7980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9fda2061d341586f0aa343787af59984a627080a Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-1755 Part 2: Improve tablet on-disk size metric
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6968 to look at the new patch set (#15). Change subject: KUDU-1755 Part 2: Improve tablet on-disk size metric .. KUDU-1755 Part 2: Improve tablet on-disk size metric This adds WAL segment and cmeta sizes to the tablet on-disk metric. It also fixes the tablet superblock on disk size, which was being underestimated because the size was measured as the PB size instead of counting the container file the PB is stored in. It also exposes a second metric, Tablet Data Size On Disk, which measures just the size of data blocks and does not include metadata or WAL segments. Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 --- M src/kudu/consensus/consensus_meta-test.cc M src/kudu/consensus/consensus_meta.cc M src/kudu/consensus/consensus_meta.h M src/kudu/consensus/log-test.cc M src/kudu/consensus/log.cc M src/kudu/consensus/log.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tablet/diskrowset-test.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/memrowset.h M src/kudu/tablet/mock-rowsets.h M src/kudu/tablet/rowset.cc M src/kudu/tablet/rowset.h M src/kudu/tablet/tablet-test.cc M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h M src/kudu/tablet/tablet_metadata-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h 24 files changed, 275 insertions(+), 63 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/68/6968/15 -- To view, visit http://gerrit.cloudera.org:8080/6968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 Gerrit-PatchSet: 15 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] Add tablet state summary metrics
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7980 to look at the new patch set (#2). Change subject: Add tablet state summary metrics .. Add tablet state summary metrics This patch adds metrics for the number of tablets in each state (i.e. the number of tablets for each value of TabletStatePB). To avoid contention in the tablet manager, the frequency with which the state numbers are computed is limited. Requests that come too soon for an update will receive cached counts. The maximum frequency of updates is controlled by a new flag. Change-Id: I9fda2061d341586f0aa343787af59984a627080a --- M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/metrics.cc M src/kudu/util/metrics.h 5 files changed, 303 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/80/7980/2 -- To view, visit http://gerrit.cloudera.org:8080/7980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9fda2061d341586f0aa343787af59984a627080a Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-2119. Fix failure in encoding-test
Todd Lipcon has posted comments on this change. Change subject: KUDU-2119. Fix failure in encoding-test .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/7967/2/src/kudu/cfile/encoding-test.cc File src/kudu/cfile/encoding-test.cc: PS2, Line 271: int len = local_rng.Uniform(8) > What if len == 0 as a result of calling local_rng.Uniform(8)? Is that the yea I think that's fine and desirable since it tests zero-length strings. The "1 byte per entry" is still valid since we expect there to be at least a one-byte length for each entry, even if it's zero, in the current string encodings. (perhaps in the future we would encode lengths more efficiently but not today) http://gerrit.cloudera.org:8080/#/c/7967/3/src/kudu/cfile/encoding-test.cc File src/kudu/cfile/encoding-test.cc: Line 279: const uint kCount = r.Uniform(1000) + 1; > Why need to '+1' here? Is it to ensure to add at least one element to the b will do PS3, Line 285: kCount > Is it possible for s.size() == kCount? Can it be exactly 1 byte per entry? given there is some footer/header stuff as well, we're guaranteed more than one byte per -- To view, visit http://gerrit.cloudera.org:8080/7967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-2119. Fix failure in encoding-test
Hello Will Berkeley, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7967 to look at the new patch set (#4). Change subject: KUDU-2119. Fix failure in encoding-test .. KUDU-2119. Fix failure in encoding-test In commit d1f53cc32 I introduced randomization for the format string used for the generated string data in this test. The random format string could sometimes incorporate '\0' bytes, which, in the worst case, could result in a string of length 0 or 1. This would then cause a later assertion to fail that was checking that the encoded data be at least two bytes per string. The fix switches from using a printf-style string to instead use a std::function to generate the data. The implementation of the function avoids using C strings and thus permits embedded null bytes. Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 --- M src/kudu/cfile/encoding-test.cc M src/kudu/util/random_util.cc M src/kudu/util/random_util.h 3 files changed, 49 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/7967/4 -- To view, visit http://gerrit.cloudera.org:8080/7967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] test workload: make table creation more robust
Adar Dembo has uploaded a new patch set (#2). Change subject: test_workload: make table creation more robust .. test_workload: make table creation more robust The table creation here already works around the lack of Exactly Once semantics by considering certain kinds of failures to mean success. However, these failures imply that the client didn't finish waiting for the table to be created, so we need to do that ourselves. Without the patch, ClientStressTest_MultiMaster.TestLeaderResolutionTimeout failed 1/1000 times in DEBUG mode. With it, there were no failures. Change-Id: Ic1e76e502359b499466cfa90d21ac22f35928261 --- M src/kudu/integration-tests/test_workload.cc 1 file changed, 20 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/7982/2 -- To view, visit http://gerrit.cloudera.org:8080/7982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic1e76e502359b499466cfa90d21ac22f35928261 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2134: Defer block transaction commit to the end of a tablet copy
Adar Dembo has posted comments on this change. Change subject: KUDU-2134: Defer block transaction commit to the end of a tablet copy .. Patch Set 3: > The test failure is unrelated to your change. I put up a fix for this test failure: http://gerrit.cloudera.org:8080/7982 -- To view, visit http://gerrit.cloudera.org:8080/7966 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4feabc08c2199aad8d08be56f09ac06924345f2b Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: No
[kudu-CR] test workload: make table creation more robust
Hello Todd Lipcon, Alexey Serbin, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7982 to review the following change. Change subject: test_workload: make table creation more robust .. test_workload: make table creation more robust The table creation here already works around the lack of Exactly Once semantics by considering certain kinds of failures to mean success. However, these failures imply that the client didn't finish waiting for the table to be created, so we need to do that ourselves. Change-Id: Ic1e76e502359b499466cfa90d21ac22f35928261 --- M src/kudu/integration-tests/test_workload.cc 1 file changed, 19 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/7982/1 -- To view, visit http://gerrit.cloudera.org:8080/7982 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ic1e76e502359b499466cfa90d21ac22f35928261 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1755 Part 2: Improve tablet on-disk size metric
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6968 to look at the new patch set (#14). Change subject: KUDU-1755 Part 2: Improve tablet on-disk size metric .. KUDU-1755 Part 2: Improve tablet on-disk size metric This adds WAL segment and cmeta sizes to the tablet on-disk metric. It also fixes the tablet superblock on disk size, which was being underestimated because the size was measured as the PB size instead of counting the container file the PB is stored in. It also exposes a second metric, Tablet Data Size On Disk, which measures just the size of data blocks and does not include metadata or WAL segments. Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 --- M src/kudu/consensus/consensus_meta-test.cc M src/kudu/consensus/consensus_meta.cc M src/kudu/consensus/consensus_meta.h M src/kudu/consensus/log-test.cc M src/kudu/consensus/log.cc M src/kudu/consensus/log.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tablet/diskrowset-test.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/memrowset.h M src/kudu/tablet/mock-rowsets.h M src/kudu/tablet/rowset.cc M src/kudu/tablet/rowset.h M src/kudu/tablet/tablet-test.cc M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h M src/kudu/tablet/tablet_metadata-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h 24 files changed, 276 insertions(+), 63 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/68/6968/14 -- To view, visit http://gerrit.cloudera.org:8080/6968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 Gerrit-PatchSet: 14 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1755 Part 2: Improve tablet on-disk size metric
Will Berkeley has posted comments on this change. Change subject: KUDU-1755 Part 2: Improve tablet on-disk size metric .. Patch Set 13: (4 comments) http://gerrit.cloudera.org:8080/#/c/6968/13/src/kudu/consensus/consensus_meta.cc File src/kudu/consensus/consensus_meta.cc: Line 305: flush_count_for_tests_(0) { > consider initializing on_disk_size_ with 0 as well. Done http://gerrit.cloudera.org:8080/#/c/6968/13/src/kudu/consensus/consensus_meta.h File src/kudu/consensus/consensus_meta.h: PS13, Line 244: int64 > What is the reason to have this int64 instead of uint64_t which corresponds It's just for consistency with the other on disk size metrics. They were a mix of things but Mike requested they be int64_t according to GSG guidelines for signed vs. unsigned (speaking of which this should be an int64_t). Comment added. http://gerrit.cloudera.org:8080/#/c/6968/10/src/kudu/tablet/diskrowset.cc File src/kudu/tablet/diskrowset.cc: Line 690: Status DiskRowSet::CountRows(rowid_t *count) const { > On #1, I'm pretty certain that we need to acquire the lock to get a ref to Yes, I think we can take a ref under the lock and then call OnDiskSize() without the lock. delta_tracker_ doesn't require the lock at all. I've rewritten a couple of the methods to reflect this. http://gerrit.cloudera.org:8080/#/c/6968/10/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: PS10, Line 645: : Status TabletRepli > > TabletReplica::Shutdown() resets consensus_ with lock_ held. Old. Done. -- To view, visit http://gerrit.cloudera.org:8080/6968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-2044 Tombstoned tablets show up in /metrics
Adar Dembo has posted comments on this change. Change subject: KUDU-2044 Tombstoned tablets show up in /metrics .. Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/7981/1//COMMIT_MSG Commit Message: PS1, Line 12: If a new copy of the tablet is placed on the server, the new : tablet's metric entity will replace the old one, if the old one's : hasn't been deregistered yet (steps toward deregistration occur only : when the entity publishes metrics e.g. on a call to /metrics). So what happens if a new copy lands _before_ the old entity was deregistered? http://gerrit.cloudera.org:8080/#/c/7981/1/src/kudu/integration-tests/tablet_copy-itest.cc File src/kudu/integration-tests/tablet_copy-itest.cc: PS1, Line 79: METRIC_DECLARE_entity(tablet); : METRIC_DECLARE_counter(rows_inserted); Nit: move this to the declarations below. Line 1540: Would it be useful to also check at this point that GetInt64Metric fails, because the entity has been unpublished? http://gerrit.cloudera.org:8080/#/c/7981/1/src/kudu/tablet/tablet.cc File src/kudu/tablet/tablet.cc: Line 304: metric_entity_->Depublish(); Don't you have to check if this is non-null first? Doesn't look like it's guaranteed to be created. http://gerrit.cloudera.org:8080/#/c/7981/1/src/kudu/tserver/tablet_server-test.cc File src/kudu/tserver/tablet_server-test.cc: Line 2595: // Therefore, a third call will not see tablet metrics. Maybe integrate this into the loop and check the value of 'i' to decide whether to call STR_CONTAINS or STR_NOT_CONTAINS? http://gerrit.cloudera.org:8080/#/c/7981/1/src/kudu/util/metrics.cc File src/kudu/util/metrics.cc: Line 483: UpdateReturnCopy(_, id, e, nullptr); Since you're not interested in the returned value, a simpler "entities_[id] = e" should suffice. http://gerrit.cloudera.org:8080/#/c/7981/1/src/kudu/util/metrics.h File src/kudu/util/metrics.h: PS1, Line 503: depublished Nit: "unpublish" is more grammatically correct. Could you use that instead of "depublish"? -- To view, visit http://gerrit.cloudera.org:8080/7981 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21b52262203410ded1e514502b59a79be12f8fb3 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Todd Lipcon has posted comments on this change. Change subject: tablet copy: Allow voting from crashed initial tablet copies .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/7961/3/src/kudu/master/sys_catalog.cc File src/kudu/master/sys_catalog.cc: Line 243: boost::none, mind adding a /* name= */ comment here so it's clearer the argument? the rest are relatively clear by their names but this one isn't. http://gerrit.cloudera.org:8080/#/c/7961/3/src/kudu/tablet/tablet-harness.h File src/kudu/tablet/tablet-harness.h: Line 103:boost::none, same http://gerrit.cloudera.org:8080/#/c/7961/3/src/kudu/tserver/tablet_copy_client.cc File src/kudu/tserver/tablet_copy_client.cc: PS3, Line 293: if (last_logged_opid) { : *superblock_->mutable_tombstone_last_logged_opid() = *last_logged_opid; : } : : I don't think it functionally matters, but I think this probably should go down below the check below, and maybe add a comment? -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Add tablet state summary metrics
Adar Dembo has posted comments on this change. Change subject: Add tablet state summary metrics .. Patch Set 1: (5 comments) Has anything materially changed since the split? http://gerrit.cloudera.org:8080/#/c/7980/1/src/kudu/integration-tests/tablet_copy-itest.cc File src/kudu/integration-tests/tablet_copy-itest.cc: PS1, Line 79: METRIC_DECLARE_gauge_int32(tablets_num_running); : METRIC_DECLARE_gauge_int32(tablets_num_bootstrapping); Nit: move this down to where the other metrics are declared? PS1, Line 1545: SleepFor(MonoDelta::FromMilliseconds(15)); : ASSERT_EQ(1, CountRunningTablets(cluster_->tablet_server(follower_index))); Can you use ASSERT_EVENTUALLY rather than sleeping? Below too. http://gerrit.cloudera.org:8080/#/c/7980/1/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: PS1, Line 1259: // pass : break; Got some tabs in here. PS1, Line 1267: std::lock_guard lock(lock_); : UpdateTabletStateSummary(); : return tablet_state_summary_.num_not_initialized; Since this pattern is repeated a lot, perhaps you could change UpdateTabletStateSummary to accommodate it: 1. Have it acquire/release the lock, and 2. Have it return the appropriate counter value (specified via argument). http://gerrit.cloudera.org:8080/#/c/7980/1/src/kudu/tserver/ts_tablet_manager.h File src/kudu/tserver/ts_tablet_manager.h: Line 308: void UpdateTabletStateSummary(); Nit: convention is to add the "Unlocked" suffix to functions that must be called with a lock held. -- To view, visit http://gerrit.cloudera.org:8080/7980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9fda2061d341586f0aa343787af59984a627080a Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1755 Part 2: Improve tablet on-disk size metric
Alexey Serbin has posted comments on this change. Change subject: KUDU-1755 Part 2: Improve tablet on-disk size metric .. Patch Set 13: (2 comments) http://gerrit.cloudera.org:8080/#/c/6968/13/src/kudu/consensus/consensus_meta.cc File src/kudu/consensus/consensus_meta.cc: Line 305: flush_count_for_tests_(0) { consider initializing on_disk_size_ with 0 as well. http://gerrit.cloudera.org:8080/#/c/6968/13/src/kudu/consensus/consensus_meta.h File src/kudu/consensus/consensus_meta.h: PS13, Line 244: int64 What is the reason to have this int64 instead of uint64_t which corresponds for the call-site of fs_manager_->env()->GetFileSize(path, _disk_size) ? If there is anything particular, could you add a comment on that, please? -- To view, visit http://gerrit.cloudera.org:8080/6968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] Add tablet state summary metrics and fix KUDU-2044
Will Berkeley has abandoned this change. Change subject: Add tablet state summary metrics and fix KUDU-2044 .. Abandoned Split into two patches. Tablet state metrics: http://gerrit.cloudera.org:8080/7980 KUDU-2044: http://gerrit.cloudera.org:8080/7981 -- To view, visit http://gerrit.cloudera.org:8080/7618 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I8c82987ffe4f37e8af95972bde97841e44c521d9 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2044 Tombstoned tablets show up in /metrics
Will Berkeley has uploaded a new change for review. http://gerrit.cloudera.org:8080/7981 Change subject: KUDU-2044 Tombstoned tablets show up in /metrics .. KUDU-2044 Tombstoned tablets show up in /metrics This patch stops tombstoned tablets from showing up in /metrics. It marks the tablet metric entity as unpublished, which causes all of its metrics to be retired and causes the metric registry to de-register it. If a new copy of the tablet is placed on the server, the new tablet's metric entity will replace the old one, if the old one's hasn't been deregistered yet (steps toward deregistration occur only when the entity publishes metrics e.g. on a call to /metrics). The tombstoned's tablets metric entity is not destroyed when it's deregistered, since there are still refs to it and its metrics in the TabletReplica class, Tablet class, and many other classes associated with a TabletReplica. The entity will be destroyed when the TabletReplica is deleted (since it either contains or holds final references to all the other classes), which happens if the tablet is replaced or deleted. While it's not ideal to hold the memory for the entity when it's no longer used, the reason this occurs is because the TabletReplica instance mostly stays alive. Releasing all the metric references would require specifically dropping refs to those metrics in all the surviving subcomponents of a TabletReplica instance that has been shut down; I think this problem would be better solved by more completely cleaning up a shut down TabletReplica instance, but that's a much larger scope than suppressing the metrics. Change-Id: I21b52262203410ded1e514502b59a79be12f8fb3 --- M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tablet/tablet.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/util/metrics.cc M src/kudu/util/metrics.h 5 files changed, 145 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/81/7981/1 -- To view, visit http://gerrit.cloudera.org:8080/7981 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I21b52262203410ded1e514502b59a79be12f8fb3 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley
[kudu-CR] Add tablet state summary metrics
Will Berkeley has uploaded a new change for review. http://gerrit.cloudera.org:8080/7980 Change subject: Add tablet state summary metrics .. Add tablet state summary metrics This patch adds metrics for the number of tablets in each state (i.e. the number of tablets for each value of TabletStatePB). To avoid contention in the tablet manager, the frequency with which the state numbers are computed is limited. Requests that come too soon for an update will receive cached counts. The maximum frequency of updates is controlled by a new flag. Change-Id: I9fda2061d341586f0aa343787af59984a627080a --- M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/metrics.cc M src/kudu/util/metrics.h 5 files changed, 293 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/80/7980/1 -- To view, visit http://gerrit.cloudera.org:8080/7980 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I9fda2061d341586f0aa343787af59984a627080a Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley
[kudu-CR] [tests] fix flakes in delete table-itest
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7972 to look at the new patch set (#3). Change subject: [tests] fix flakes in delete_table-itest .. [tests] fix flakes in delete_table-itest Fixed flakes in DeleteTableITest.TestNoDeleteTombstonedTablets: * If a leader election happened in the middle of the AddServer/RemoveServer sequence, the test failed with an error message like the following: src/kudu/integration-tests/delete_table-itest.cc:1217: Failure Failed Bad status: Illegal state: Replica 37783b00d5d34ffe87953cb90fa60e7c\ is not leader of this config. Role: FOLLOWER. * If more than a couple of heartbeats from the evicted node were received, the test failed with an error message like the following: src/kudu/integration-tests/delete_table-itest.cc:1241: Failure Expected: (num_delete_attempts) <= (kMaxDeleteAttemptsPerEviction),\ actual: 5 vs 3 src/kudu/util/test_util.cc:283: Failure Failed Timed out waiting for assertion to pass Prior to the fix, running tests against a DEBUG build, there were about 3 failures per 1K run: http://dist-test.cloudera.org/job?job_id=aserbin.1504655654.10466 After the fix, there were no failures in multiple 1K runs: http://dist-test.cloudera.org/job?job_id=aserbin.1504670739.3127 Change-Id: I7dda40ae1054becaf25963a6d301ecaed5a926f9 --- M src/kudu/integration-tests/delete_table-itest.cc 1 file changed, 108 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/72/7972/3 -- To view, visit http://gerrit.cloudera.org:8080/7972 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7dda40ae1054becaf25963a6d301ecaed5a926f9 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-2119. Fix failure in encoding-test
Hao Hao has posted comments on this change. Change subject: KUDU-2119. Fix failure in encoding-test .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7967/3/src/kudu/cfile/encoding-test.cc File src/kudu/cfile/encoding-test.cc: Line 279: const uint kCount = r.Uniform(1000) + 1; Why need to '+1' here? Is it to ensure to add at least one element to the block? Can you add a comment for it? -- To view, visit http://gerrit.cloudera.org:8080/7967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-2119. Fix failure in encoding-test
Andrew Wong has posted comments on this change. Change subject: KUDU-2119. Fix failure in encoding-test .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7967/3/src/kudu/cfile/encoding-test.cc File src/kudu/cfile/encoding-test.cc: PS3, Line 285: kCount Is it possible for s.size() == kCount? Can it be exactly 1 byte per entry? -- To view, visit http://gerrit.cloudera.org:8080/7967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-2119. Fix failure in encoding-test
Alexey Serbin has posted comments on this change. Change subject: KUDU-2119. Fix failure in encoding-test .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7967/2/src/kudu/cfile/encoding-test.cc File src/kudu/cfile/encoding-test.cc: PS2, Line 271: int len = local_rng.Uniform(8) What if len == 0 as a result of calling local_rng.Uniform(8)? Is that the desired behavior? -- To view, visit http://gerrit.cloudera.org:8080/7967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-2119. Fix failure in encoding-test
Will Berkeley has posted comments on this change. Change subject: KUDU-2119. Fix failure in encoding-test .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/7967/2/src/kudu/cfile/encoding-test.cc File src/kudu/cfile/encoding-test.cc: PS2, Line 103: slices.emplace_back(to_insert[i]); > yea, Slices are just non-owned pointers, so we have to keep the string obje :thumbsup: -- To view, visit http://gerrit.cloudera.org:8080/7967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-2119. Fix failure in encoding-test
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7967 to look at the new patch set (#3). Change subject: KUDU-2119. Fix failure in encoding-test .. KUDU-2119. Fix failure in encoding-test In commit d1f53cc32 I introduced randomization for the format string used for the generated string data in this test. The random format string could sometimes incorporate '\0' bytes, which, in the worst case, could result in a string of length 0 or 1. This would then cause a later assertion to fail that was checking that the encoded data be at least two bytes per string. The fix switches from using a printf-style string to instead use a std::function to generate the data. The implementation of the function avoids using C strings and thus permits embedded null bytes. Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 --- M src/kudu/cfile/encoding-test.cc M src/kudu/util/random_util.cc M src/kudu/util/random_util.h 3 files changed, 42 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/7967/3 -- To view, visit http://gerrit.cloudera.org:8080/7967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2119. Fix failure in encoding-test
Todd Lipcon has posted comments on this change. Change subject: KUDU-2119. Fix failure in encoding-test .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7967/2/src/kudu/cfile/encoding-test.cc File src/kudu/cfile/encoding-test.cc: PS2, Line 101: uint > Nit: GSG says use int for loop variable. Done PS2, Line 103: slices.emplace_back(to_insert[i]); > Why not slices.emplace_back(formatter(i))? Does to_insert have a purpose? yea, Slices are just non-owned pointers, so we have to keep the string objects alive somewhere or else the underlying memory gets freed. -- To view, visit http://gerrit.cloudera.org:8080/7967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-2119. Fix failure in encoding-test
Will Berkeley has posted comments on this change. Change subject: KUDU-2119. Fix failure in encoding-test .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/7967/2/src/kudu/cfile/encoding-test.cc File src/kudu/cfile/encoding-test.cc: PS2, Line 101: uint Nit: GSG says use int for loop variable. PS2, Line 103: slices.emplace_back(to_insert[i]); Why not slices.emplace_back(formatter(i))? Does to_insert have a purpose? -- To view, visit http://gerrit.cloudera.org:8080/7967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Mike Percy has posted comments on this change. Change subject: tablet copy: Allow voting from crashed initial tablet copies .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/7961/3/src/kudu/integration-tests/tablet_copy-itest.cc File src/kudu/integration-tests/tablet_copy-itest.cc: PS3, Line 1155: ASSERT_EVENTUALLY > By default, this times out in 30 seconds (which is exactly kTimeout for thi The AddServer() was what was causing the stability issue, so I think it's fine. PS3, Line 1161: ASSERT_OK > What it itest::AddServer() returns other than IllegalState, not-the-leader I don't think so, since this operation should effectively be idempotent. PS3, Line 1187: LOG(INFO) << "Restarting tablet servers. New replica TS UUID: " << new_replica_uuid : << ", tablet data state: " : << tablet::TabletDataState_Name(superblock.tablet_data_state()) : << ", last-logged opid: " << superblock.tombstone_last_logged_opid(); > nit: would it make sense to move this after ASSERT_OPID_EQ() and before ASS I don't think so, because logging before a failure would make that failure easier to debug. http://gerrit.cloudera.org:8080/#/c/7961/3/src/kudu/tserver/tablet_copy_client.cc File src/kudu/tserver/tablet_copy_client.cc: PS3, Line 401: RETURN_NOT_OK(meta_->ReplaceSuperBlock(*superblock_)); > If not clearing the tombstone_last_logged_opid field in the superblock, wou Hmm. Let me make sure we have coverage for this case by adding a test. -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2119. Fix failure in encoding-test
Todd Lipcon has posted comments on this change. Change subject: KUDU-2119. Fix failure in encoding-test .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/7967/1//COMMIT_MSG Commit Message: Line 16: The fix just replaces any '\0' characters in the random format string. > Does this also fix http://dist-test.cloudera.org:8080/diagnose?key=a32eeefe it does now (ensured that it now always adds at least one element to the block) http://gerrit.cloudera.org:8080/#/c/7967/1/src/kudu/cfile/encoding-test.cc File src/kudu/cfile/encoding-test.cc: PS1, Line 255: or '\0', but > Will brought this up when we were discussing this earlier and thought I'd b restructured this test substantially so now it tests random-length binary sequences including nulls PS1, Line 277: LOG(INFO) << "format string: " << format_string; > nit: is it important to have this information to be output during the test removed -- To view, visit http://gerrit.cloudera.org:8080/7967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR](branch-1.5.x) KUDU-2130 (part 2): more fixes for ITClientStress
Alexey Serbin has posted comments on this change. Change subject: KUDU-2130 (part 2): more fixes for ITClientStress .. Patch Set 1: Code-Review+2 Perhaps, you want to get +2 from JD as well? -- To view, visit http://gerrit.cloudera.org:8080/7979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic3f518513931c660fd93b4272d1b1fceb268f191 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-1.5.x Gerrit-Owner: Dan BurkertGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR](branch-1.5.x) KUDU-2130: java client: handle termination during negotiation edge case
Alexey Serbin has posted comments on this change. Change subject: KUDU-2130: java client: handle termination during negotiation edge case .. Patch Set 1: Code-Review+2 Perhaps, you want to get +2 from JD as well? -- To view, visit http://gerrit.cloudera.org:8080/7978 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3e9d4a6535ae82973743e4ac1071de0aac92b08b Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-1.5.x Gerrit-Owner: Dan BurkertGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] KUDU-2119. Fix failure in encoding-test
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7967 to look at the new patch set (#2). Change subject: KUDU-2119. Fix failure in encoding-test .. KUDU-2119. Fix failure in encoding-test In commit d1f53cc32 I introduced randomization for the format string used for the generated string data in this test. The random format string could sometimes incorporate '\0' bytes, which, in the worst case, could result in a string of length 0 or 1. This would then cause a later assertion to fail that was checking that the encoded data be at least two bytes per string. The fix switches from using a printf-style string to instead use a std::function to generate the data. The implementation of the function avoids using C strings and thus permits embedded null bytes. Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 --- M src/kudu/cfile/encoding-test.cc M src/kudu/util/random_util.cc M src/kudu/util/random_util.h 3 files changed, 41 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/7967/2 -- To view, visit http://gerrit.cloudera.org:8080/7967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley
[kudu-CR](branch-1.5.x) KUDU-2130: java client: handle termination during negotiation edge case
Dan Burkert has uploaded a new change for review. http://gerrit.cloudera.org:8080/7978 Change subject: KUDU-2130: java client: handle termination during negotiation edge case .. KUDU-2130: java client: handle termination during negotiation edge case There was an edge case where a Connection can be terminated while negotiation is completing. This would result in a scary looking log message: 18:24:07.776 [DEBUG - New I/O worker #8112] (Connection.java:649) [peer master-127.32.133.1:64032] cleaning up while in state NEGOTIATING due to: connection disconnected 18:24:07.781 [ERROR - New I/O worker #8112] (Connection.java:418) [peer master-127.32.133.1:64032] unexpected exception from downstream on [id: 0xdd52bacc, /127.0.0.1:55318 :> /127.32.133.1:64032] java.lang.NullPointerException at org.apache.kudu.client.Connection.messageReceived(Connection.java:271) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.apache.kudu.client.Connection.handleUpstream(Connection.java:236) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) but in reality the error message is harmless; it just indicates that the connection has been terminated while the connection's messageReceived handler is clearing the message queue. This interruption is possible because of 82a8e9f99, which fixed a deadlock in Connection. This commit recognizes when this race has occured, and early exits from messageReceived. Change-Id: I3e9d4a6535ae82973743e4ac1071de0aac92b08b Reviewed-on: http://gerrit.cloudera.org:8080/7960 Reviewed-by: Alexey SerbinTested-by: Kudu Jenkins (cherry picked from commit f0aa3b3f194146760597e6ab88c304c6f408073c) --- M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java 1 file changed, 9 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/78/7978/1 -- To view, visit http://gerrit.cloudera.org:8080/7978 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I3e9d4a6535ae82973743e4ac1071de0aac92b08b Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-1.5.x Gerrit-Owner: Dan Burkert
[kudu-CR](branch-1.5.x) KUDU-2130 (part 2): more fixes for ITClientStress
Dan Burkert has uploaded a new change for review. http://gerrit.cloudera.org:8080/7979 Change subject: KUDU-2130 (part 2): more fixes for ITClientStress .. KUDU-2130 (part 2): more fixes for ITClientStress This fixes some more race conditions in connection termination in the same vein as part 1. It also filters benign SSLException from being returned back to callers. Change-Id: Ic3f518513931c660fd93b4272d1b1fceb268f191 Reviewed-on: http://gerrit.cloudera.org:8080/7964 Reviewed-by: Alexey SerbinTested-by: Kudu Jenkins (cherry picked from commit f41a5c2b3afc8b4703c8047b347490b24c19) --- M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java M java/kudu-client/src/test/java/org/apache/kudu/client/ITClientStress.java 2 files changed, 22 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/7979/1 -- To view, visit http://gerrit.cloudera.org:8080/7979 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ic3f518513931c660fd93b4272d1b1fceb268f191 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-1.5.x Gerrit-Owner: Dan Burkert