[kudu-CR] [docs] Fix code block formatting
lingbi...@gmail.com has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14320 Change subject: [docs] Fix code block formatting .. [docs] Fix code block formatting Change-Id: Ibc54ba74ee58e1c601bf7e60a3a51c6d48659345 --- M docs/design-docs/tablet.md 1 file changed, 8 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/20/14320/1 -- To view, visit http://gerrit.cloudera.org:8080/14320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ibc54ba74ee58e1c601bf7e60a3a51c6d48659345 Gerrit-Change-Number: 14320 Gerrit-PatchSet: 1 Gerrit-Owner: Anonymous Coward
[kudu-CR] [clock] add a built-in NTP client implementation
Alexey Serbin has uploaded a new patch set (#21) to the change originally created by Todd Lipcon. ( http://gerrit.cloudera.org:8080/7477 ) Change subject: [clock] add a built-in NTP client implementation .. [clock] add a built-in NTP client implementation This adds a stripped-down implementation of built-in NTP client without any reliance on the kernel NTP discipline. This initial implementation doesn't have all the necessary sanity checks for true time that RFC-compliant client would have and doesn't implement everything from RFC5905. However, this implementation is good enough for running with well behaved and properly configured reference NTP servers such as servers from pool.ntp.org, other public-domain NTP services, and properly configured NTP servers in local network. This is a first step on the road to eventually have robust and RFC-compliant Kudu's own NTP client that is able to detect and reject misbehaving and rogue reference NTP servers. For Kudu, this should hopefully make it easier for users to configure NTP even if they don't have root, and also can maintain better clock error than the system implementation, since we can prioritize low error bounds rather than low jitter. This patch doesn't have the knobs to control error vs jitter yet, though. This patch also contains tests to verify the newly introduced functionality in scenarios when the built-in NTP client is pointed to various combinations of properly configured NTP servers or unintentionally misconfigured/unavailable ones. Change-Id: Ieb9eee9f0334236f39617492bd6f01304d1a0255 --- M src/kudu/clock/CMakeLists.txt A src/kudu/clock/builtin_ntp.cc A src/kudu/clock/builtin_ntp.h M src/kudu/clock/hybrid_clock.cc A src/kudu/clock/ntp-test.cc 5 files changed, 1,662 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77/7477/21 -- To view, visit http://gerrit.cloudera.org:8080/7477 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ieb9eee9f0334236f39617492bd6f01304d1a0255 Gerrit-Change-Number: 7477 Gerrit-PatchSet: 21 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon
[kudu-CR] add a tool to create table
YangSong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14306 ) Change subject: add a tool to create table .. Patch Set 5: (2 comments) The test code can define a new function to avoid repeating the same code over and over. http://gerrit.cloudera.org:8080/#/c/14306/5/src/kudu/tools/tool_action_table.cc File src/kudu/tools/tool_action_table.cc: http://gerrit.cloudera.org:8080/#/c/14306/5/src/kudu/tools/tool_action_table.cc@937 PS5, Line 937: ParseValueOfType(column.default_value(), > Wrap in RETURN_NOT_OK. I find there is a problem here. I can't use ParseValueOfType derictly, there are two problems: 1. the string should convert to list, like ["1"] or [1] 2. in the JSON object, the default value's type is always string, but if I directly convert it to list, like "[1]" as the first parameter, it only work at the type of int, if the type is string, we should convert "1" to "[\"1\"]". Here I add a piece of code: if (column.column_type() == "STRING" OR "BINARY") { default_v = "[\"" + column.default_value() + "\"]"; } else { default_v = "[" + column.default_value() + "]"; } Do you think this is right? Or better advice. ConvertToKuduPartialRow has the same problem. http://gerrit.cloudera.org:8080/#/c/14306/5/src/kudu/tools/tool_action_table.cc@970 PS5, Line 970: for (const auto& value : bound.lower_bound().bound_values()) { : values += value; : values += ","; : } > You should be able to use a function from gutil/strings/join.h here and bel I try to modify like this: string tmp = JoinKeysIterator( bound.lower_bound().bound_values().begin(), bound.lower_bound().bound_values().end(), ","); but error reported at compile time. Maybe the RepeatedPtrField iterator is not support. -- To view, visit http://gerrit.cloudera.org:8080/14306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0bce4733a504f8ef5f024089a16fe3c6f1e493f1 Gerrit-Change-Number: 14306 Gerrit-PatchSet: 5 Gerrit-Owner: YangSong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: YangSong Gerrit-Comment-Date: Sun, 29 Sep 2019 10:00:20 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2069 p5: recheck tablet health when exiting maintenance mode
Hello Alexey Serbin, Kudu Jenkins, Adar Dembo, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14223 to look at the new patch set (#6). Change subject: KUDU-2069 p5: recheck tablet health when exiting maintenance mode .. KUDU-2069 p5: recheck tablet health when exiting maintenance mode Previously, when exiting maintenance mode for a given tserver, if the replicas of that tserver were unhealthy, there was no mechanism with which to guarantee that the proper re-replication would happen. Specifically, the following sequence of events was possible: 1. tablet T has replicas on tservers A, B*, C 2. A enters maintenance mode 3. A is shut down 4. enough time passes for B* to consider A as failed 5. B* notices the failure of A and reports to the master that replica A has failed 6. the master does nothing to schedule re-replication because A is in maintenance mode 7. A exits maintenance mode, but is not brought back online 8. B* never hears back from A, and never hits a health state change to report to the master, and so the master never "rechecks" the health of T 9. T is left under-replicated with only B* and C Note: The set of tservers we need to recheck is the set that hosted a leader of any replica on A. This patch addresses this by requesting a full tablet report from every tserver in the cluster upon exiting maintenance mode on any tserver. Testing: - this adds to the existing integration test for maintenance mode to exercise the new behavior - amends an existing concurrency test to verify the correct locking behavior is used Change-Id: Ic0ab3d78cbc5b1228c01592a00118f11f76e43dd --- M src/kudu/integration-tests/maintenance_mode-itest.cc M src/kudu/master/master_service.cc M src/kudu/master/ts_descriptor.cc M src/kudu/master/ts_descriptor.h M src/kudu/master/ts_manager.cc M src/kudu/master/ts_manager.h M src/kudu/master/ts_state-test.cc 7 files changed, 74 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/14223/6 -- To view, visit http://gerrit.cloudera.org:8080/14223 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic0ab3d78cbc5b1228c01592a00118f11f76e43dd Gerrit-Change-Number: 14223 Gerrit-PatchSet: 6 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-2069 p4: stop replication from failed servers in maintenance mode
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14222 ) Change subject: KUDU-2069 p4: stop replication from failed servers in maintenance mode .. Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/14222/5/src/kudu/consensus/quorum_util-test.cc File src/kudu/consensus/quorum_util-test.cc: http://gerrit.cloudera.org:8080/#/c/14222/5/src/kudu/consensus/quorum_util-test.cc@645 PS5, Line 645: // Even if the replica to replace is meant to be ignored on failure, we : // should honor the replacement and try to add a replica. : for (char health : { '+', '-', '?' }) { : RaftConfigPB config; : AddPeer(, "A", V, health, {{"REPLACE", true}}); : AddPeer(, "B", V, '+'); : AddPeer(, "C", V, '+'); : EXPECT_TRUE(ShouldAddReplica(config, 3, { "A" })); : } : { : RaftConfigPB config; : AddPeer(, "A", V, '+', {{"REPLACE", true}}); : AddPeer(, "B", V, '-'); : AddPeer(, "C", V, '+'); : // Ignoring failures shouldn't impede our ability to add a replica when the : > nit: maybe, put this under for () cycle to iterate over the health status o Done http://gerrit.cloudera.org:8080/#/c/14222/5/src/kudu/consensus/quorum_util-test.cc@675 PS5, Line 675: p > Does it make sense to add coverage for other health statuses as well (unkno Done http://gerrit.cloudera.org:8080/#/c/14222/5/src/kudu/consensus/quorum_util-test.cc@684 PS5, Line 684: EXPECT_FALSE(ShouldAddReplica(config, 5, { "A", "B" })); : // But when there is a failure that isn't suppose > ditto Done http://gerrit.cloudera.org:8080/#/c/14222/5/src/kudu/consensus/quorum_util.cc File src/kudu/consensus/quorum_util.cc: http://gerrit.cloudera.org:8080/#/c/14222/5/src/kudu/consensus/quorum_util.cc@455 PS5, Line 455: if (VLOG_IS_ON(1) && ignore_failure_for_underreplication) { : VLOG(1) << Substitute("ignoring $0 if failed", peer_uuid); : } > Does it deserve VLOG(1) it's not important at all? Done http://gerrit.cloudera.org:8080/#/c/14222/5/src/kudu/integration-tests/maintenance_mode-itest.cc File src/kudu/integration-tests/maintenance_mode-itest.cc: http://gerrit.cloudera.org:8080/#/c/14222/5/src/kudu/integration-tests/maintenance_mode-itest.cc@269 PS5, Line 269: ASSERT_OK(ChangeTServerState(maintenance_uuid, TServerStateChangePB::EXIT_MAINTENANCE_MODE)); > Thank you for adding this scenario. Ack http://gerrit.cloudera.org:8080/#/c/14222/5/src/kudu/integration-tests/maintenance_mode-itest.cc@379 PS5, Line 379: void SetUp() override { : SKIP_IF_SLOW_NOT_ALLOWED(); : NO_FATALS(MaintenanceModeITest::SetUp()); : NO_FATALS(SetUpCluster(5)); > Could it happen that replicas at the server fell behind WAL GC threshold si Good point. I've separated the concern by disabling log GC and testing for FAILED_UNRECOVERABLE sepcifically using disk failures. http://gerrit.cloudera.org:8080/#/c/14222/5/src/kudu/integration-tests/maintenance_mode-itest.cc@385 PS5, Line 385: SKIP_IF_SLOW_NOT_ALLOWED(); > Does it make sense to verify the number of written rows after stopping the Done -- To view, visit http://gerrit.cloudera.org:8080/14222 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9a63b55011d16900c0d27eac0eb75880074204db Gerrit-Change-Number: 14222 Gerrit-PatchSet: 6 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Sun, 29 Sep 2019 06:18:15 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2069 p4: stop replication from failed servers in maintenance mode
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Adar Dembo, Hao Hao, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/14222 to look at the new patch set (#6). Change subject: KUDU-2069 p4: stop replication from failed servers in maintenance mode .. KUDU-2069 p4: stop replication from failed servers in maintenance mode When determining whether a replica needs to be added, we may now consider a set of of UUIDs that are allowed to be in a "bad" state while not counting towards being under-replicated. Since the goal of maintenance mode is to cope with unexpected failures, "healthy" movement, e.g. through tooling that uses REPLACE and PROMOTE tagging, is still allowed to and from tservers in maintenance mode. Testing: - a unit test is added to exercise the new quorum logic to ignore certain failed UUIDs, taking into account REPLACE and PROMOTE replicas - integration tests are added to test: - behavior with RF=3 through restarts of the master - behavior when running move_replica tooling - behavior with RF=5 with background failures Change-Id: I9a63b55011d16900c0d27eac0eb75880074204db --- M src/kudu/consensus/quorum_util-test.cc M src/kudu/consensus/quorum_util.cc M src/kudu/consensus/quorum_util.h M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/maintenance_mode-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/ts_manager.cc M src/kudu/master/ts_manager.h 8 files changed, 657 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/22/14222/6 -- To view, visit http://gerrit.cloudera.org:8080/14222 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9a63b55011d16900c0d27eac0eb75880074204db Gerrit-Change-Number: 14222 Gerrit-PatchSet: 6 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241)