[kudu-CR] consensus: use periodic timers for failure detection
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7735 to look at the new patch set (#2). Change subject: consensus: use periodic timers for failure detection .. consensus: use periodic timers for failure detection This patch replaces the existing failure detection (FD) with a new approach built using periodic timers. The existing approach had a major drawback: each failure monitor required a dedicated thread, and there was a monitor for each replica. The new approach "schedules" a failure into the future using the server's reactor thread pool, "resetting" it when leader activity is detected. There's an inherent semantic mismatch between dedicated threads that periodically wake to check for failures and this new approach; I tried to provide similar semantics as best I could. Things worth noting: - Most importantly: some FD periods are now shorter. This is because the existing implementation "double counted" failure periods when adding backoff (once in LeaderElectionExpBackoffDeltaUnlocked, and once by virtue of the failure period comparison made by the failure monitor). This seemed accidental to me, so I didn't bother preserving that behavior. - It's tough to "expire" an FD using timers. Luckily, this only happens in RaftConsensus::Start, so by making PeriodicTimer::Start accept an optional delta, we can begin FD with an early delta that reflects the desired "detect a failure immediately but not too quickly" semantic, similar to how the dedicated failure monitor thread operates. - ReportFailureDetected is now run on a shared reactor thread rather than a dedicated failure monitor thread. Since StartElection performs IO, I thunked it onto the Raft thread pool. - Timer operations cannot fail, so I removed the return values from the various FD-related functions. - I also consolidated the two SnoozeFailureDetector variants; I found that this made it easier to look at all the call-sites. Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330 --- M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h 2 files changed, 115 insertions(+), 138 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/35/7735/2 -- To view, visit http://gerrit.cloudera.org:8080/7735 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] rpc: periodic timers
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7733 to look at the new patch set (#2). Change subject: rpc: periodic timers .. rpc: periodic timers This patch introduces a generic periodic timer class. How does it work? 1. A timer is constructed with a user-provided task functor. 2. After Start() is called, the timer will run the functor repeatedly on a fixed (and optionally, jittered) period. 3. When Reset() is called, it'll reset the period. 4. After Stop() is called, it won't run the functor anymore. The implementation is based on Messenger::ScheduleOnReactor but it could just as easily build on libev directly. I chose to use the Messenger so that I wouldn't have to implement timer thread pooling. It's also just a generic version of the Raft heartbeat logic found in Peer (see commit 1070e76). In follow-on patches I will use this class to replace the bespoke Peer logic as well as for Raft failure detection. If you're curious, it's actually because of failure detection that both Start() and Reset() optionally accept deltas to override the default period; the snoozing code likes to provide heavily customized backoff. Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86 --- M src/kudu/rpc/CMakeLists.txt A src/kudu/rpc/periodic-test.cc A src/kudu/rpc/periodic.cc A src/kudu/rpc/periodic.h 4 files changed, 459 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/7733/2 -- To view, visit http://gerrit.cloudera.org:8080/7733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1489: move tablet metadata
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7617 to look at the new patch set (#4). Change subject: KUDU-1489: move tablet metadata .. KUDU-1489: move tablet metadata Currently tablet metadata are placed in the first specified data root. This patch allows users to specify that metadata be spread across all data roots. To accomplish this, servers will scan meta and cmeta directories across all roots and note where tablets currently are. When the meta or cmeta directory is needed, if a more appropriate directory exists, it is provided as an alternate. With striping, an appropriate directory for a tablet's metadata is any directory in the tablet's directory group. Changes to the tablet lifecycle with striping: - During DataDirManager::Open(), scan through all the directories and map all the tablets' current meta/cmeta directories. - During calls to ListTabletIds(), scan through all directories and update the tablets' current directories. - If there are duplicates (not an expected case, but may happen in the case of an unexpected shutdown), only one is kept. - When writing the metas (e.g. ReplaceSuperBlockUnlocked()), check to see whether the metadata dir is appropriate, if not, write it to a different dir. - The DataDirManager is now in charge of creating the metadata directories. Testing is done in fs_manager-test to ensure that tablet metadata gets striped and removed as needed. Change-Id: Ia5308f8d4b4b738f353f511e2f1a2634b675a60a --- M src/kudu/consensus/consensus_meta-test.cc M src/kudu/consensus/consensus_meta.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h 10 files changed, 643 insertions(+), 162 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/7617/4 -- To view, visit http://gerrit.cloudera.org:8080/7617 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia5308f8d4b4b738f353f511e2f1a2634b675a60a Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] consensus: use periodic timers for failure detection
Adar Dembo has posted comments on this change. Change subject: consensus: use periodic timers for failure detection .. Patch Set 1: I looped raft_consensus-itest in slow mode 1000 times both with and without this patch. There were three failures with the patch and two without. The failures were similar so I don't think they were introduced (or exacerbated) by this patch series. -- To view, visit http://gerrit.cloudera.org:8080/7735 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] consensus: use periodic timers for failure detection
Adar Dembo has posted comments on this change. Change subject: consensus: use periodic timers for failure detection .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/7735/1/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 2357: failure_detector_->Start(initial_delta); > warning: parameter 'initial_delta' is passed by value and only copied once; Done http://gerrit.cloudera.org:8080/#/c/7735/1/src/kudu/consensus/raft_consensus.h File src/kudu/consensus/raft_consensus.h: Line 523: void EnsureFailureDetectorEnabled( > warning: function 'kudu::consensus::RaftConsensus::EnsureFailureDetectorEna Done -- To view, visit http://gerrit.cloudera.org:8080/7735 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] rpc: periodic timers
Adar Dembo has posted comments on this change. Change subject: rpc: periodic timers .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/7733/1/src/kudu/rpc/periodic-test.cc File src/kudu/rpc/periodic-test.cc: Line 70: SleepFor(MonoDelta::FromMilliseconds(period_ms_ * 2)); > warning: either cast from 'int' to 'int64_t' (aka 'long') is ineffective, o Done Line 76: SleepFor(MonoDelta::FromMilliseconds(period_ms_ * 2)); > warning: either cast from 'int' to 'int64_t' (aka 'long') is ineffective, o Done Line 96: if (now - start_time > MonoDelta::FromMilliseconds(period_ms_ * 5)) { > warning: either cast from 'int' to 'int64_t' (aka 'long') is ineffective, o Done Line 107: timer_->Reset(MonoDelta::FromMilliseconds(period_ms_ * 5)); > warning: either cast from 'int' to 'int64_t' (aka 'long') is ineffective, o Done Line 120: timer_->Start(MonoDelta::FromMilliseconds(period_ms_ * 5)); > warning: either cast from 'int' to 'int64_t' (aka 'long') is ineffective, o Done http://gerrit.cloudera.org:8080/#/c/7733/1/src/kudu/rpc/periodic.cc File src/kudu/rpc/periodic.cc: Line 76: Reset(next_task_delta); > warning: parameter 'next_task_delta' is passed by value and only copied onc Done -- To view, visit http://gerrit.cloudera.org:8080/7733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] consensus: use periodic timers for failure detection
Adar Dembo has posted comments on this change. Change subject: consensus: use periodic timers for failure detection .. Patch Set 1: It's interesting to compare this approach with the one I originally went with (consolidating the failure monitor threads): https://gerrit.cloudera.org/#/c/7330. -- To view, visit http://gerrit.cloudera.org:8080/7735 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] rpc: periodic timers
Hello David Ribeiro Alves, Mike Percy, Todd Lipcon, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7733 to review the following change. Change subject: rpc: periodic timers .. rpc: periodic timers This patch introduces a generic periodic timer class. How does it work? 1. A timer is constructed with a user-provided task functor. 2. After Start() is called, the timer will run the functor repeatedly on a fixed (and optionally, jittered) period. 3. When Reset() is called, it'll reset the period. 4. After Stop() is called, it won't run the functor anymore. The implementation is based on Messenger::ScheduleOnReactor but it could just as easily build on libev directly. I chose to use the Messenger so that I wouldn't have to implement timer thread pooling. It's also just a generic version of the Raft heartbeat logic found in Peer (see commit 1070e76). In follow-on patches I will use this class to replace the bespoke Peer logic as well as for Raft failure detection. If you're curious, it's actually because of failure detection that both Start() and Reset() optionally accept deltas to override the default period; the snoozing code likes to provide heavily customized backoff. Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86 --- M src/kudu/rpc/CMakeLists.txt A src/kudu/rpc/periodic-test.cc A src/kudu/rpc/periodic.cc A src/kudu/rpc/periodic.h 4 files changed, 459 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/7733/1 -- To view, visit http://gerrit.cloudera.org:8080/7733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] consensus peers: replace bespoke Raft heartbeat logic with periodic timers
Hello David Ribeiro Alves, Mike Percy, Todd Lipcon, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7734 to review the following change. Change subject: consensus_peers: replace bespoke Raft heartbeat logic with periodic timers .. consensus_peers: replace bespoke Raft heartbeat logic with periodic timers Building on the generic periodic timer implementation, this patch replaces the one-off Raft heartbeating code with periodic timers. There's only one semantic change, but it's an important one: the jittering range has changed from [P/2, P] to [3P/4, 5P/4]. When I wrote commit 1070e76 I was nervous about making such a change, but since it reduces overall heartbeat load, I think it makes sense to do it. Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf --- M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_peers.h 2 files changed, 24 insertions(+), 90 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/7734/1 -- To view, visit http://gerrit.cloudera.org:8080/7734 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] consensus: use periodic timers for failure detection
Hello David Ribeiro Alves, Mike Percy, Todd Lipcon, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7735 to review the following change. Change subject: consensus: use periodic timers for failure detection .. consensus: use periodic timers for failure detection This patch replaces the existing failure detection (FD) with a new approach built using periodic timers. The existing approach had a major drawback: each failure monitor required a dedicated thread, and there was a monitor for each replica. The new approach "schedules" a failure into the future using the server's reactor thread pool, "resetting" it when leader activity is detected. There's an inherent semantic mismatch between dedicated threads that periodically wake to check for failures and this new approach; I tried to provide similar semantics as best I could. Things worth noting: - Most importantly: some FD periods are now shorter. This is because the existing implementation "double counted" failure periods when adding backoff (once in LeaderElectionExpBackoffDeltaUnlocked, and once by virtue of the failure period comparison made by the failure monitor). This seemed accidental to me, so I didn't bother preserving that behavior. - It's tough to "expire" an FD using timers. Luckily, this only happens in RaftConsensus::Start, so by making PeriodicTimer::Start accept an optional delta, we can begin FD with an early delta that reflects the desired "detect a failure immediately but not too quickly" semantic, similar to how the dedicated failure monitor thread operates. - ReportFailureDetected is now run on a shared reactor thread rather than a dedicated failure monitor thread. Since StartElection performs IO, I thunked it onto the Raft thread pool. - Timer operations cannot fail, so I removed the return values from the various FD-related functions. - I also consolidated the two SnoozeFailureDetector variants; I found that this made it easier to look at all the call-sites. Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330 --- M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h 2 files changed, 116 insertions(+), 138 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/35/7735/1 -- To view, visit http://gerrit.cloudera.org:8080/7735 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] util: remove old failure detector and resettable heartbeater code
Hello David Ribeiro Alves, Mike Percy, Todd Lipcon, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7736 to review the following change. Change subject: util: remove old failure detector and resettable heartbeater code .. util: remove old failure detector and resettable heartbeater code With the move to periodic timers, these classes are no longer needed. Change-Id: I7a8f77a56ed66e06e182a70e18b2837a264a0a4e --- M build-support/iwyu/iwyu-filter.awk M src/kudu/util/CMakeLists.txt D src/kudu/util/failure_detector-test.cc D src/kudu/util/failure_detector.cc D src/kudu/util/failure_detector.h D src/kudu/util/resettable_heartbeater-test.cc D src/kudu/util/resettable_heartbeater.cc D src/kudu/util/resettable_heartbeater.h 8 files changed, 0 insertions(+), 874 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/7736/1 -- To view, visit http://gerrit.cloudera.org:8080/7736 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I7a8f77a56ed66e06e182a70e18b2837a264a0a4e Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [iwyu] first pass
Alexey Serbin has posted comments on this change. Change subject: [iwyu] first pass .. Patch Set 18: (3 comments) > Other than the 1 nit in sp::shared_ptr, this looks good to me Great -- thank you for reviewing this jumbo-patch. I'll update the patch to include instead of http://gerrit.cloudera.org:8080/#/c/4738/19/src/kudu/client/shared_ptr.h File src/kudu/client/shared_ptr.h: Line 55: #include // IWYU pragma: export > remove this one as well OK, will do, but then it will be necessary to include in many places instead of "kudu/client/shared_ptr.h". Sometimes, it will be the case when both and are required. But after looking at it once more, I think that would be a better option to have from the beginning. http://gerrit.cloudera.org:8080/#/c/4738/18/src/kudu/tools/ksck.h File src/kudu/tools/ksck.h: Line 387: std::ostream* out = nullptr); > I'm ok with moving the impl, but I was just curious about why you changed t Because the default value required to include , which is very heavy header. http://gerrit.cloudera.org:8080/#/c/4738/18/src/kudu/util/threadpool-test.cc File src/kudu/util/threadpool-test.cc: Line 33: #include > ahh, I do remember dealing with this some time in the way past around the t yep, that's for yield. -- To view, visit http://gerrit.cloudera.org:8080/4738 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6923091be36a1c03fac34dd73e6b17e17ac9ceaa Gerrit-PatchSet: 18 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] disk failure: add persistent disk states
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7270 to look at the new patch set (#19). Change subject: disk failure: add persistent disk states .. disk failure: add persistent disk states This patch adds a new version of the path set so failed disks are not used the next time the server is run. A previously failed disk may continue to produce IOErrors after restart, but we may still like to start the server without the disk. To accomplish this, disk states, path names, and a timestamp are added to the path set. Additionally, if any disks are failed when loading instances from disk, an 'unhealthy' instance with no path set metadata will be created instead of failing the startup process. CheckIntegrity() is updated to accomodate this. Rather than comparing all instances against an agreed-upon set of UUIDs, the single path set with the largest timestamp is used as the main one to compare against (if only old path sets are available, the first healthy one is used). If no instances are healthy, CheckIntegrity() will fail, as all disks are failed. Additionally, the notion of a unhealthy instance is added to allow startup in the presence of disk failures, e.g. in failing to canonicalize or in failing to read a path instance. The main path set is checked to ensure its integrity (e.g. no duplicates), after which it is upgraded with the extra metadata if needed. It is then checked to ensure it is consistent with the healthy instances (e.g. having the right UUIDs and paths). Testing is done in a new iteration of CheckIntegrity(). Further testing is done in data_dirs-test to ensure the directory manager can be opened with failed disks. Testing is also added to fs_manager-test to ensure the FS layout can be loaded with a failed directory. Some notes: - Disk failures during FS layout creation are not tolerated. In these cases, there is presumably no data on the server anyway, so operators should easily be able to fix their cluster. - In the case of a server restart where all healthy disks fail to start up and some known failed disks start working again, the server will successfully start up with the bad disks (may have partially-written data). - If there are any unhealthy instances when upgrading the path sets (i.e. adding disk states, paths, timestamp), a complete mapping of UUIDs to paths will not be available, and CheckIntegrity() will fail. - The main path set's disk states are updated to reflect the failure of the instances. Since data on a failed disk cannot be trusted, disks that are successfully read from but are already marked FAILED by the main path set are not marked HEALTHY. This patch is a part of a series of patches to handle disk failures. To see how this fits, see 2.6 in: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f --- M src/kudu/fs/block_manager_util-test.cc M src/kudu/fs/block_manager_util.cc M src/kudu/fs/block_manager_util.h M src/kudu/fs/data_dirs-test.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/fs.proto M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc 12 files changed, 1,201 insertions(+), 275 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/7270/19 -- To view, visit http://gerrit.cloudera.org:8080/7270 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f Gerrit-PatchSet: 19 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] disk failure: add persistent disk states
Andrew Wong has posted comments on this change. Change subject: disk failure: add persistent disk states .. Patch Set 18: (3 comments) http://gerrit.cloudera.org:8080/#/c/7270/18/src/kudu/fs/data_dirs.cc File src/kudu/fs/data_dirs.cc: Line 328: Status DataDirManager::OpenExisting(Env* env, CanonicalizedRootsList data_fs_roots, > warning: the parameter 'data_fs_roots' is copied for each invocation but on Done Line 348: Status DataDirManager::CreateNew(Env* env, CanonicalizedRootsList data_fs_roots, > warning: the parameter 'data_fs_roots' is copied for each invocation but on Done http://gerrit.cloudera.org:8080/#/c/7270/18/src/kudu/fs/data_dirs.h File src/kudu/fs/data_dirs.h: Line 22: #include > warning: #includes are not sorted properly [llvm-include-order] Done -- To view, visit http://gerrit.cloudera.org:8080/7270 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f Gerrit-PatchSet: 18 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] consensus: Improve contract for API to fetch last-logged op id
Hello Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7717 to look at the new patch set (#3). Change subject: consensus: Improve contract for API to fetch last-logged op id .. consensus: Improve contract for API to fetch last-logged op id It's important that we differentiate between when we have a known last-logged op and when we don't actually know what it is or whether we have ever appended something to the local WAL. This applies both to the TABLET_DATA_READY case, where this information is stored in the WAL, and the TABLET_DATA_TOMBSTONED case, where this information is stored in the superblock. Cases where we are unable to determine the last-logged OpId from the WAL when a replica is in TABLET_DATA_READY state: * Early in the tablet replica lifecycle (before Raft is started). * When a replica encounters an error during initialization. Cases where we are unable to determine the last-logged OpId from the TabletMetadata when a replica is in TABLET_DATA_TOMBSTONED state: * If the replica was tombstoned while in a failed state. Included in this patch are the following API improvements: 1. Delete Log::GetLatestEntryOpId(). Previously, this method would only return something other than MinimumOpId() if a log entry had been appended during the object's lifetime. It is abandoned in favor of RaftConsensus::GetLastOpId(RECEIVED_OPID) which delegates to PeerMessageQueue::GetLastOpIdInLog(). 2. Merge PeerMessageQueue::Init() into the PeerMessageQueue constructor. This allows us to remove one lifecycle state and allows us to guarantee that, once the queue is constructed, we can always get a valid last-logged opid from it (see #1). 3. Make TabletMetadata::tombstone_last_logged_opid() return a boost::optional. We need to clearly differentiate between when we know the last-logged opid and when we don't. We also consider MinimumOpId() to be equal to boost::none at superblock load time, since previous versions of Kudu may have written (0,0) into the TabletMetadata 'tombstone_last_logged_opid' field. Change-Id: Ia4e4501a61cd40fdee0dc918b77675a0bc2515e7 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h M src/kudu/consensus/log.cc M src/kudu/consensus/log.h M src/kudu/consensus/mt-log-test.cc M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_source_session.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc 19 files changed, 242 insertions(+), 242 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/7717/3 -- To view, visit http://gerrit.cloudera.org:8080/7717 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia4e4501a61cd40fdee0dc918b77675a0bc2515e7 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1407: reassign failed tablets
Hello David Ribeiro Alves, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7440 to look at the new patch set (#14). Change subject: KUDU-1407: reassign failed tablets .. KUDU-1407: reassign failed tablets Tablets put into the state tablet::FAILED are left until they are manually deleted; they are not evicted and reassigned. If a tablet fails to bootstrap, it will sit, responding to heartbeats, doing nothing else. This patch ensures failed tablets will be reassigned. As the tablets are not used, rather than directly setting replicas to FAILED, an error is first recorded and the TabletReplica::Shutdown(), leaving the final state as FAILED. A replica can no longer leave the FAILED state (calls to Shutdown() leave it FAILED). The tserver response generated by FAILED tablets is now TABLET_FAILED. Upon receiving this, a leader will immediately evict the peer. Prior to this patch, a tablet was marked FAILED if its WAL or metadata failed to delete (after already shutting down). If this occurs, there may be an inconsistency on-disk. This has been made fatal. Testing is done in a few places: - raft_consensus-itest is updated to ensure that tablets that fail to bootstrap are evicted and replaced. - tablet_server-test is also updated to ensure that, instead of TABLET_NOT_RUNNING, TABLET_FAILED is returned by failed tablets. - a test is added to ts_tablet_manager-itest is added to test that failed tablets while running are evicted and replaced. This patch is a part of a series of patches to handle disk failure. See section 2.5 in this doc: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 --- M src/kudu/client/scanner-internal.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/tserver.proto 14 files changed, 160 insertions(+), 57 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/40/7440/14 -- To view, visit http://gerrit.cloudera.org:8080/7440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 Gerrit-PatchSet: 14 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-871. Support tombstoned voting
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6960 to look at the new patch set (#9). Change subject: KUDU-871. Support tombstoned voting .. KUDU-871. Support tombstoned voting This patch makes it possible for tombstoned tablet replicas to vote in Raft elections. Changes: * Add Stop() method to TabletReplica + Consensus lifecycle. * Includes new STOPPED state. * Tombstoning a replica should call Stop(). * Deleting a replica should call Shutdown(). * Add positive and negative tests for tombstoned voting. * Add a stress test that induces lots of tombstoned voting while running TabletCopy, TabletBootstrap, and DeleteTablet. * Fix DeleteTableITest.TestMergeConsensusMetadata after tombstoned voting changed its assumption that tombstoned tablets would not vote. * Fix several tests that expected tombstoned tablets to be SHUTDOWN when now they are STOPPED. * Get rid of tablet::FAILED state, instead go to SHUTDOWN with an error. Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/delete_table-itest.cc M src/kudu/integration-tests/delete_tablet-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc M src/kudu/integration-tests/raft_consensus-itest.cc A src/kudu/integration-tests/tombstoned_voting-itest.cc A src/kudu/integration-tests/tombstoned_voting-stress-test.cc M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tools/ksck-test.cc M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-ts-cli-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/tserver/tserver-path-handlers.cc M src/kudu/util/make_shared.h 33 files changed, 1,438 insertions(+), 341 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/6960/9 -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] consensus: Improve contract for API to fetch last-logged op id
Mike Percy has posted comments on this change. Change subject: consensus: Improve contract for API to fetch last-logged op id .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7717/1/src/kudu/tserver/tablet_copy_client.cc File src/kudu/tserver/tablet_copy_client.cc: Line 134: if (last_logged_opid && last_logged_opid->term() > caller_term) { > worth a log message indicating that it is allowing replacement of a failed Done -- To view, visit http://gerrit.cloudera.org:8080/7717 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia4e4501a61cd40fdee0dc918b77675a0bc2515e7 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1489: move tablet metadata
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7617 to look at the new patch set (#3). Change subject: KUDU-1489: move tablet metadata .. KUDU-1489: move tablet metadata Currently tablet metadata are placed in the first specified data root. This patch allows users to specify that metadata be spread across all data roots. To accomplish this, servers will scan meta and cmeta directories across all roots and note where tablets currently are. When the meta or cmeta directory is needed, if a more appropriate directory exists, it is provided as an alternate. With striping, an appropriate directory for a tablet's metadata is any directory in the tablet's directory group. Changes to the tablet lifecycle with striping: - During DataDirManager::Open(), scan through all the directories and map all the tablets' current meta/cmeta directories. - During calls to ListTabletIds(), scan through all directories and update the tablets' current directories. - If there are duplicates (not an expected case, but may happen in the case of an unexpected shutdown), only one is kept. - When writing the metas (e.g. ReplaceSuperBlockUnlocked()), check to see whether the metadata dir is appropriate, if not, write it to a different dir. - The DataDirManager is now in charge of creating the metadata directories. Testing is done in fs_manager-test to ensure that tablet metadata gets striped and removed as needed. Change-Id: Ia5308f8d4b4b738f353f511e2f1a2634b675a60a --- M src/kudu/consensus/consensus_meta-test.cc M src/kudu/consensus/consensus_meta.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h 10 files changed, 644 insertions(+), 162 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/7617/3 -- To view, visit http://gerrit.cloudera.org:8080/7617 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia5308f8d4b4b738f353f511e2f1a2634b675a60a Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] disk failure: add persistent disk states
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7270 to look at the new patch set (#18). Change subject: disk failure: add persistent disk states .. disk failure: add persistent disk states This patch adds a new version of the path set so failed disks are not used the next time the server is run. A previously failed disk may continue to produce IOErrors after restart, but we may still like to start the server without the disk. To accomplish this, disk states, path names, and a timestamp are added to the path set. Additionally, if any disks are failed when loading instances from disk, an 'unhealthy' instance with no path set metadata will be created instead of failing the startup process. CheckIntegrity() is updated to accomodate this. Rather than comparing all instances against an agreed-upon set of UUIDs, the single path set with the largest timestamp is used as the main one to compare against (if only old path sets are available, the first healthy one is used). If no instances are healthy, CheckIntegrity() will fail, as all disks are failed. Additionally, the notion of a unhealthy instance is added to allow startup in the presence of disk failures, e.g. in failing to canonicalize or in failing to read a path instance. The main path set is checked to ensure its integrity (e.g. no duplicates), after which it is upgraded with the extra metadata if needed. It is then checked to ensure it is consistent with the healthy instances (e.g. having the right UUIDs and paths). Testing is done in a new iteration of CheckIntegrity(). Further testing is done in data_dirs-test to ensure the directory manager can be opened with failed disks. Testing is also added to fs_manager-test to ensure the FS layout can be loaded with a failed directory. Some notes: - Disk failures during FS layout creation are not tolerated. In these cases, there is presumably no data on the server anyway, so operators should easily be able to fix their cluster. - In the case of a server restart where all healthy disks fail to start up and some known failed disks start working again, the server will successfully start up with the bad disks (may have partially-written data). - If there are any unhealthy instances when upgrading the path sets (i.e. adding disk states, paths, timestamp), a complete mapping of UUIDs to paths will not be available, and CheckIntegrity() will fail. - The main path set's disk states are updated to reflect the failure of the instances. Since data on a failed disk cannot be trusted, disks that are successfully read from but are already marked FAILED by the main path set are not marked HEALTHY. This patch is a part of a series of patches to handle disk failures. To see how this fits, see 2.6 in: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f --- M src/kudu/fs/block_manager_util-test.cc M src/kudu/fs/block_manager_util.cc M src/kudu/fs/block_manager_util.h M src/kudu/fs/data_dirs-test.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/fs.proto M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc 12 files changed, 1,202 insertions(+), 276 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/7270/18 -- To view, visit http://gerrit.cloudera.org:8080/7270 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f Gerrit-PatchSet: 18 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] consensus: Tablet copy should clear last-logged opid from superblock
Todd Lipcon has posted comments on this change. Change subject: consensus: Tablet copy should clear last-logged opid from superblock .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7718/3//COMMIT_MSG Commit Message: Line 9: Keeping around irrelevant information is bad hygiene. Mike and I discussed offline... it seems like this might not be a great idea after all as implemented here. We want to preserve the ability to vote in a case like: - a new replica is being added - tablet copy starts - either the tablet copy source or destination crashes - cluster restarts, and one of the other two replicas doesn't come back up Now we have the following replicas: 1) good state 2) crashed / not coming back up 3) failed tablet copy We want to make sure that replica #3 retains the ability to vote so it can elect #1 and repair itself. -- To view, visit http://gerrit.cloudera.org:8080/7718 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaa84d59c63222e9ddb05dca492f9ecd47b5c63ea Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [iwyu] first pass
Dan Burkert has posted comments on this change. Change subject: [iwyu] first pass .. Patch Set 19: (2 comments) http://gerrit.cloudera.org:8080/#/c/4738/19/src/kudu/client/shared_ptr.h File src/kudu/client/shared_ptr.h: Line 55: #include // IWYU pragma: export remove this one as well http://gerrit.cloudera.org:8080/#/c/4738/18/src/kudu/experiments/rwlock-perf.cc File src/kudu/experiments/rwlock-perf.cc: Line 29: #include > That's because the boost mappings put the smart_ptr/detail/spinlock.hpp hea Sounds good. -- To view, visit http://gerrit.cloudera.org:8080/4738 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6923091be36a1c03fac34dd73e6b17e17ac9ceaa Gerrit-PatchSet: 19 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2101 Include a table summary at the bottom
Todd Lipcon has posted comments on this change. Change subject: KUDU-2101 Include a table summary at the bottom .. Patch Set 4: I think a flag for machine-parseable (eg json) output would be nice if we have a need to parse it. Otherwise what's the purpose of the no-summary flag? -- To view, visit http://gerrit.cloudera.org:8080/7707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1913352e3a1f91b4bb07e2f5001c8cc94d5155d4 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-2032 (part 2): propagate master hostnames into client
Todd Lipcon has posted comments on this change. Change subject: KUDU-2032 (part 2): propagate master hostnames into client .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/7692/5/src/kudu/client/client-internal.cc File src/kudu/client/client-internal.cc: Line 631: const pair& leader_addr_and_name, > Why not two separate params? Then you could even move the leader_hostname I actually started with that, but because the addr_and_name pairs are fed into the ConnectToCluster stuff in a vector > it seemed cleaner to keep the same pair type on its callback. -- To view, visit http://gerrit.cloudera.org:8080/7692 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie5838f22c96f757124112b505825a53740468ce1 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [java] Fix gradle wrapper with 4.1
Todd Lipcon has posted comments on this change. Change subject: [java] Fix gradle wrapper with 4.1 .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7726 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I12daa60faf3376cbb8662710bde3856472bb Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [java] Fix gradle wrapper with 4.1
Todd Lipcon has submitted this change and it was merged. Change subject: [java] Fix gradle wrapper with 4.1 .. [java] Fix gradle wrapper with 4.1 Gradle doesn’t use trailing zeros with new minor versions which broke the wrapper logic. This fixes the version to remove the trailing zero and adjusts the logic to generate the gradle-wrapper.jar url (which does use a trailing zero). Change-Id: I12daa60faf3376cbb8662710bde3856472bb Reviewed-on: http://gerrit.cloudera.org:8080/7726 Tested-by: Kudu Jenkins Reviewed-by: Todd Lipcon--- M java/gradle/dependencies.gradle M java/gradle/wrapper.gradle M java/gradle/wrapper/gradle-wrapper.properties 3 files changed, 6 insertions(+), 4 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7726 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I12daa60faf3376cbb8662710bde3856472bb Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] disk failure: add persistent disk states
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7270 to look at the new patch set (#17). Change subject: disk failure: add persistent disk states .. disk failure: add persistent disk states This patch adds a new version of the path set so failed disks are not used the next time the server is run. A previously failed disk may continue to produce IOErrors after restart, but we may still like to start the server without the disk. To accomplish this, disk states, path names, and a timestamp are added to the path set. Additionally, if any disks are failed when loading instances from disk, an 'unhealthy' instance with no path set metadata will be created instead of failing the startup process. CheckIntegrity() is updated to accomodate this. Rather than comparing all instances against an agreed-upon set of UUIDs, the single path set with the largest timestamp is used as the main one to compare against (if only old path sets are available, the first healthy one is used). If no instances are healthy, CheckIntegrity() will fail, as all disks are failed. Additionally, the notion of a unhealthy instance is added to allow startup in the presence of disk failures, e.g. in failing to canonicalize or in failing to read a path instance. The main path set is checked to ensure its integrity (e.g. no duplicates), after which it is upgraded with the extra metadata if needed. It is then checked to ensure it is consistent with the healthy instances (e.g. having the right UUIDs and paths). Testing is done in a new iteration of CheckIntegrity(). Further testing is done in data_dirs-test to ensure the directory manager can be opened with failed disks. Testing is also added to fs_manager-test to ensure the FS layout can be loaded with a failed directory. Some notes: - Disk failures during FS layout creation are not tolerated. In these cases, there is presumably no data on the server anyway, so operators should easily be able to fix their cluster. - In the case of a server restart where all healthy disks fail to start up and some known failed disks start working again, the server will successfully start up with the bad disks (may have partially-written data). - If there are any unhealthy instances when upgrading the path sets (i.e. adding disk states, paths, timestamp), a complete mapping of UUIDs to paths will not be available, and CheckIntegrity() will fail. - The main path set's disk states are updated to reflect the failure of the instances. Since data on a failed disk cannot be trusted, disks that are successfully read from but are already marked FAILED by the main path set are not marked HEALTHY. This patch is a part of a series of patches to handle disk failures. To see how this fits, see 2.6 in: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f --- M src/kudu/fs/block_manager_util-test.cc M src/kudu/fs/block_manager_util.cc M src/kudu/fs/block_manager_util.h M src/kudu/fs/data_dirs-test.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/fs.proto M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc 12 files changed, 1,198 insertions(+), 272 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/7270/17 -- To view, visit http://gerrit.cloudera.org:8080/7270 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f Gerrit-PatchSet: 17 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] disk failure: add persistent disk states
Andrew Wong has posted comments on this change. Change subject: disk failure: add persistent disk states .. Patch Set 15: (30 comments) http://gerrit.cloudera.org:8080/#/c/7270/15//COMMIT_MSG Commit Message: Line 11: failed disk may be partially written and thus should not be used. > I dont quite follow this logic. If we crashed in the middle of writing a bl Right, so the Status.IsDiskFailure() doesn't account for "out of space" errors. In those cases, we'll still crash. There was a bit of discussion about this a couple months ago and it seems like the "out of space" behavior _should_ be different. For now, we just crash, but I don't think it'd be terribly unreasonable to drop some tablets on the out-of-space disk. In any case, yeah, I've updated this. http://gerrit.cloudera.org:8080/#/c/7270/15/src/kudu/fs/block_manager_util-test.cc File src/kudu/fs/block_manager_util-test.cc: Line 151: TEST_F(KuduTest, CheckIntegrity) { > There's a lot to process here. In total, do you get full coverage of every All paths out of CheckIntegrity(), yes. "Update these instance files" is covered in DataDirManagerTest::TestOpenWithFailedDirs. http://gerrit.cloudera.org:8080/#/c/7270/6/src/kudu/fs/block_manager_util.cc File src/kudu/fs/block_manager_util.cc: Line 88: > I don't see this change in PS15. Done. http://gerrit.cloudera.org:8080/#/c/7270/15/src/kudu/fs/block_manager_util.cc File src/kudu/fs/block_manager_util.cc: PS15, Line 54: CloneAndPrepend(msg) > CloneAndPrepend() is work; can we defer it to if !s.ok() and we want to ret Done PS15, Line 69: Status::OK() > This is the default value anyway; you can omit health_status_ from the list Done PS15, Line 121: // If the write fails, don't update the timestamp. : uint64_t original_timestamp = pb->path_set().timestamp_us(); : auto reset_timestamp = MakeScopedCleanup([&] { : pb->mutable_path_set()->set_timestamp_us(original_timestamp); : }); > This doesn't seem that important, but OK. Sure, removed. PS15, Line 137: // Some instances on failed disks (e.g. those that failed to canonicalize : // their path) are prefixed with illegal whitespaces to signify failure. > As we discussed offline, rather than modifying the path to indicate a failu Done PS15, Line 197: // Identify the instance that is the most up-to-date and check that there : // are no duplicate UUIDs among the healthy input instances. > Should add that the old invariant was that all instances have to have ident Since that and the TODO are more function-wide comments, I pushed it to the top. PS15, Line 205: max_timestamp > Nit: add the units (max_timestamp_us?). Done Line 208: const PathSetPB path_set = instances[i]->metadata()->path_set(); > Store a cref; no need to make a copy. Done Line 218: updated_instances->insert(instances[i]); > Can we modify a temporary set first, so that if we return an error we avoid Done Line 231: int path_set_uuids_size; > Isn't this always the same as main_uuids.size() after L232-244? Done Line 232: if (main_path_set->all_paths_size() == 0) { > It's rather easy to forget that this means "legacy". Perhaps store it in an Done PS15, Line 234: main_uuids.insert(main_path_set->deprecated_all_uuids().begin(), : main_path_set->deprecated_all_uuids().end()); : path_set_uuids_size = main_path_set->deprecated_all_uuids_size(); > Nit: flip the order so it's the same as L239-243. Done Line 240: DCHECK_GT(path_set_uuids_size, 0); > I think this is trivially correct (by virtue of being in this else statemen Done Line 242: main_uuids.insert(path.uuid()); > Should this InsertOrDie? Or can we expect duplicates? No, this entire if statement is meant to deduplicate the UUIDs. I've added a comment since it seems that was the source of confusion of one of the above comments. PS15, Line 265: depcreated > deprecated Done PS15, Line 267: main_path_updated > main_path_set_updated? Done PS15, Line 302: main_path_set->mutable_all_paths(i) > How about using pb_at_i here (though make it a pointer, not a cref). Done PS15, Line 325: path > path_set Done http://gerrit.cloudera.org:8080/#/c/7270/15/src/kudu/fs/data_dirs-test.cc File src/kudu/fs/data_dirs-test.cc: PS15, Line 61: // Use log block mode. This should not have much effect on the directory : // manager itself, but log blocks are expected to be more common in production. : static const char* kBlockType = "log"; > This is persisted (and compared) in PathInstanceMetadataFile, so it doesn't Done PS15, Line 94: vector GetDirNames(int num_dirs) { : vector ret; : for (int i = 0; i < num_dirs; i++) { : string dir_name = Substitute("$0-$1", kDirNamePrefix, i);
[kudu-CR] disk failure: add persistent disk states
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7270 to look at the new patch set (#16). Change subject: disk failure: add persistent disk states .. disk failure: add persistent disk states This patch adds a new version of the path set so failed disks are not used the next time the server is run. A previously failed disk may continue to produce IOErrors after restart, but we may still like to start the server without the disk. To accomplish this, disk states, path names, and a timestamp are added to the path set. Additionally, if any disks are failed when loading instances from disk, an 'unhealthy' instance with no path set metadata will be created instead of failing the startup process. CheckIntegrity() is updated to accomodate this. Rather than comparing all instances against an agreed-upon set of UUIDs, the single path set with the largest timestamp is used as the main one to compare against (if only old path sets are available, the first healthy one is used). If no instances are healthy, CheckIntegrity() will fail, as all disks are failed. Additionally, the notion of a unhealthy instance is added to allow startup in the presence of disk failures, e.g. in failing to canonicalize or in failing to read a path instance. The main path set is checked to ensure its integrity (e.g. no duplicates), after which it is upgraded with the extra metadata if needed. It is then checked to ensure it is consistent with the healthy instances (e.g. having the right UUIDs and paths). Testing is done in a new iteration of CheckIntegrity(). Further testing is done in data_dirs-test to ensure the directory manager can be opened with failed disks. Testing is also added to fs_manager-test to ensure the FS layout can be loaded with a failed directory. Some notes: - Disk failures during FS layout creation are not tolerated. In these cases, there is presumably no data on the server anyway, so operators should easily be able to fix their cluster. - In the case of a server restart where all healthy disks fail to start up and some known failed disks start working again, the server will successfully start up with the bad disks (may have partially-written data). - If there are any unhealthy instances when upgrading the path sets (i.e. adding disk states, paths, timestamp), a complete mapping of UUIDs to paths will not be available, and CheckIntegrity() will fail. - The main path set's disk states are updated to reflect the failure of the instances. Since data on a failed disk cannot be trusted, disks that are successfully read from but are already marked FAILED by the main path set are not marked HEALTHY. This patch is a part of a series of patches to handle disk failures. To see how this fits, see 2.6 in: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f --- M src/kudu/fs/block_manager_util-test.cc M src/kudu/fs/block_manager_util.cc M src/kudu/fs/block_manager_util.h M src/kudu/fs/data_dirs-test.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/fs.proto M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc 12 files changed, 1,200 insertions(+), 272 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/7270/16 -- To view, visit http://gerrit.cloudera.org:8080/7270 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f Gerrit-PatchSet: 16 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [iwyu] first pass
Alexey Serbin has posted comments on this change. Change subject: [iwyu] first pass .. Patch Set 18: (41 comments) Thank you for the thorough review! http://gerrit.cloudera.org:8080/#/c/4738/18/src/kudu/client/client-internal.cc File src/kudu/client/client-internal.cc: Line 31: #include // IWYU pragma: keep > I think function could be forward declared. Done http://gerrit.cloudera.org:8080/#/c/4738/18/src/kudu/client/client-unittest.cc File src/kudu/client/client-unittest.cc: Line 36: > extra space Done http://gerrit.cloudera.org:8080/#/c/4738/18/src/kudu/client/error-internal.cc File src/kudu/client/error-internal.cc: Line 23: #include "kudu/client/write_op.h" // IWYU pragma: keep > KuduWriteOperation could perhaps be forward declared? I thought so as well, but it failed to compile: /data/8/aserbin/Projects/kudu/src/kudu/gutil/gscoped_ptr.h:144:36: error: invalid application of 'sizeof' to an incomplete type 'kudu::client::KuduWriteOperation' enum { type_must_be_complete = sizeof(T) }; That's why I added the pragma. I think it would be OK with the unique_ptr, but gscoped_ptr has a special compile-time check in its deleter. Maybe, the newer code of gutil library has this corrected? http://gerrit.cloudera.org:8080/#/c/4738/18/src/kudu/client/shared_ptr.h File src/kudu/client/shared_ptr.h: Line 42: #include // IWYU pragma: export > I don't think this pragma is correct (or the one below): this file is only Good catch! It didn't hurt in that case -- IWYU is intelligent enough to track that down and I didn't see it suggesting this header instead of in that case, but sure -- that's not a good idea to export symbols from here. I'll remove this. http://gerrit.cloudera.org:8080/#/c/4738/18/src/kudu/codegen/jit_wrapper.cc File src/kudu/codegen/jit_wrapper.cc: Line 22: #include "kudu/codegen/jit_wrapper.h" > I think it was correct to list this one first. Yep, you are right. I mistakenly moved it down. Will fix. http://gerrit.cloudera.org:8080/#/c/4738/18/src/kudu/common/key_encoder.cc File src/kudu/common/key_encoder.cc: Line 25: #include "kudu/util/faststring.h" // IWYU pragma: keep > forward declare? Huh, it seemed to me I added that because compilation was broken if doing forward declaration -- that's why I added the pragma. But now it passes. http://gerrit.cloudera.org:8080/#/c/4738/18/src/kudu/common/schema.h File src/kudu/common/schema.h: Line 1: // or more contributor license agreements. See the NOTICE file > Typo ^ good catch -- indeed, i unintentionally removed that. it's back now. http://gerrit.cloudera.org:8080/#/c/4738/18/src/kudu/common/wire_protocol.h File src/kudu/common/wire_protocol.h: Line 29: class optional; > I'm curious - is this preferred to optional_fwd.hpp? Yes, for some reason IWYU prefers this to optional_fwd.hpp. Probably, once I'm more familiar with IWYU code I can figure out why and try to fix it. But as of now, to keep the number of pragmas as less as possible, I decided just to please IWYU with adding what it asked for. http://gerrit.cloudera.org:8080/#/c/4738/18/src/kudu/consensus/mt-log-test.cc File src/kudu/consensus/mt-log-test.cc: Line 33: > extra space Done http://gerrit.cloudera.org:8080/#/c/4738/18/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 74: > extra space Done http://gerrit.cloudera.org:8080/#/c/4738/18/src/kudu/experiments/rwlock-perf.cc File src/kudu/experiments/rwlock-perf.cc: Line 29: #include > This one's suspicious - I think we've gotten rid of all boost shared_ptr us That's because the boost mappings put the smart_ptr/detail/spinlock.hpp header into the 'private' set, recommending to add the 'public' boost/smart_ptr/shared_ptr.hpp I think we could do one of the following: 1) return detail/spinlock.hpp back and add the file into the 'mute' list 2) return detail/spinlock.hpp back and update the boost mappings. 3) leave it as is I think option 1 is the best in the short run. http://gerrit.cloudera.org:8080/#/c/4738/18/src/kudu/fs/block_manager.cc File src/kudu/fs/block_manager.cc: Line 18: > extra space Done http://gerrit.cloudera.org:8080/#/c/4738/18/src/kudu/fs/data_dirs.h File src/kudu/fs/data_dirs.h: Line 33: #include "kudu/gutil/callback_forward.h" // IWYU pragma: keep > I don't think it needs callback.h and callback_forward.h Done http://gerrit.cloudera.org:8080/#/c/4738/18/src/kudu/gutil/strings/join.h File src/kudu/gutil/strings/join.h: Line 19: // > extra line Done http://gerrit.cloudera.org:8080/#/c/4738/18/src/kudu/integration-tests/cluster_itest_util.h File src/kudu/integration-tests/cluster_itest_util.h: Line 49: class optional; > same - consider optional_fwd.hpp Yep: as already mentioned, I tried that but IWYU prefers this way. I suggest to leave it as is for now, and then we can clarify on this a little bit later in the background.
[kudu-CR] [iwyu] first pass
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4738 to look at the new patch set (#19). Change subject: [iwyu] first pass .. [iwyu] first pass Updated C++ source files in accordance with some of the recommendations from include-what-you-use (IWYU) tool: * remove unused header files * add missing header files * use forward declarations when possible As a result, time of compilation reduced ~10% if building with GNU make in parallel with 48 jobs (make -j48) at 48-core machine with dual Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz processors and HDD drives. prior: real1m46.782s user47m1.633s sys 3m10.678s after: real1m36.867s user42m17.340s sys 2m54.117s The next step is automating the checks, so IWYU check would run automatically (like the LINT build). Note: As of now, not all recommendations from the tool are applied yet, especially for the test-related code. That's because some recommendations produced by IWYU are incorrect. Basically, every suggestion required manual inspection and judgment, at least it was necessary to verify the result code at least compiles. As a result, the process of applying those recommendations is tedious and time consuming. Hopefully, IWYU will get better in the future. Meanwhile, we can address the rest of the recommendations file-by-file or so in the short run. References: [1] IWYU pragmas: https://github.com/include-what-you-use/include-what-you-use/blob/master/docs/IWYUPragmas.md [2] IWYU mappings: https://github.com/include-what-you-use/include-what-you-use/blob/master/docs/IWYUMappings.md Change-Id: I6923091be36a1c03fac34dd73e6b17e17ac9ceaa --- M src/kudu/benchmarks/rle.cc M src/kudu/benchmarks/tpch/line_item_tsv_importer.h M src/kudu/benchmarks/tpch/rpc_line_item_dao-test.cc M src/kudu/benchmarks/tpch/rpc_line_item_dao.cc M src/kudu/benchmarks/tpch/rpc_line_item_dao.h M src/kudu/benchmarks/tpch/tpch1.cc M src/kudu/benchmarks/tpch/tpch_real_world.cc M src/kudu/benchmarks/wal_hiccup.cc M src/kudu/cfile/binary_dict_block.cc M src/kudu/cfile/binary_dict_block.h M src/kudu/cfile/binary_plain_block.cc M src/kudu/cfile/binary_plain_block.h M src/kudu/cfile/binary_prefix_block.cc M src/kudu/cfile/binary_prefix_block.h M src/kudu/cfile/block_cache-test.cc M src/kudu/cfile/block_cache.cc M src/kudu/cfile/block_cache.h M src/kudu/cfile/block_compression.cc M src/kudu/cfile/block_compression.h M src/kudu/cfile/bloomfile-test-base.h M src/kudu/cfile/bloomfile-test.cc M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/bshuf_block.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_util.cc M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/cfile/encoding-test.cc M src/kudu/cfile/index-test.cc M src/kudu/cfile/index_block.cc M src/kudu/cfile/index_block.h M src/kudu/cfile/index_btree.cc M src/kudu/cfile/index_btree.h M src/kudu/cfile/mt-bloomfile-test.cc M src/kudu/cfile/type_encodings.cc M src/kudu/cfile/type_encodings.h M src/kudu/client/batcher.cc M src/kudu/client/batcher.h M src/kudu/client/client-internal.cc M src/kudu/client/client-internal.h M src/kudu/client/client-test-util.cc M src/kudu/client/client-test-util.h M src/kudu/client/client-test.cc M src/kudu/client/client-unittest.cc M src/kudu/client/client.cc M src/kudu/client/client.h M src/kudu/client/client_builder-internal.h M src/kudu/client/error-internal.cc M src/kudu/client/error-internal.h M src/kudu/client/error_collector.cc M src/kudu/client/error_collector.h M src/kudu/client/master_rpc.cc M src/kudu/client/master_rpc.h M src/kudu/client/meta_cache.cc M src/kudu/client/meta_cache.h M src/kudu/client/partitioner-internal.cc M src/kudu/client/partitioner-internal.h M src/kudu/client/predicate-test.cc M src/kudu/client/replica-internal.cc M src/kudu/client/replica-internal.h M src/kudu/client/resource_metrics.cc M src/kudu/client/resource_metrics.h M src/kudu/client/samples/sample.cc M src/kudu/client/scan_batch.cc M src/kudu/client/scan_batch.h M src/kudu/client/scan_configuration.cc M src/kudu/client/scan_configuration.h M src/kudu/client/scan_predicate.cc M src/kudu/client/scan_predicate.h M src/kudu/client/scan_token-internal.cc M src/kudu/client/scan_token-internal.h M src/kudu/client/scan_token-test.cc M src/kudu/client/scanner-internal.cc M src/kudu/client/scanner-internal.h M src/kudu/client/schema-internal.h M src/kudu/client/schema.cc M src/kudu/client/schema.h M src/kudu/client/session-internal.cc M src/kudu/client/session-internal.h M src/kudu/client/shared_ptr.h M src/kudu/client/table-internal.cc M src/kudu/client/table-internal.h M src/kudu/client/table_alterer-internal.cc M src/kudu/client/table_alterer-internal.h M
[kudu-CR] KUDU-1407: reassign failed tablets
Andrew Wong has posted comments on this change. Change subject: KUDU-1407: reassign failed tablets .. Patch Set 13: (1 comment) @dralves Opted out of doing the enums thing we discussed offline. I think there are few enough cases for now for it to be ok just add case handling as needed. http://gerrit.cloudera.org:8080/#/c/7440/12/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: PS12, Line 753: replica->SetError(s); : replica->Shutdown(); > I don't think you have to lose the error messages: Ah, fair enough. Done. -- To view, visit http://gerrit.cloudera.org:8080/7440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1407: reassign failed tablets
Hello David Ribeiro Alves, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7440 to look at the new patch set (#13). Change subject: KUDU-1407: reassign failed tablets .. KUDU-1407: reassign failed tablets Tablets put into the state tablet::FAILED are left until they are manually deleted; they are not evicted and reassigned. If a tablet fails to bootstrap, it will sit, responding to heartbeats, doing nothing else. This patch ensures failed tablets will be reassigned. As the tablets are not used, rather than directly setting replicas to FAILED, an error is first recorded and the TabletReplica::Shutdown(), leaving the final state as FAILED. A replica can no longer leave the FAILED state (calls to Shutdown() leave it FAILED). The tserver response generated by FAILED tablets is now TABLET_FAILED. Upon receiving this, a leader will immediately evict the peer. Prior to this patch, a tablet was marked FAILED if its WAL or metadata failed to delete (after already shutting down). If this occurs, there may be an inconsistency on-disk. This has been made fatal. Testing is done in a few places: - raft_consensus-itest is updated to ensure that tablets that fail to bootstrap are evicted and replaced. - tablet_server-test is also updated to ensure that, instead of TABLET_NOT_RUNNING, TABLET_FAILED is returned by failed tablets. - a test is added to ts_tablet_manager-itest is added to test that failed tablets while running are evicted and replaced. This patch is a part of a series of patches to handle disk failure. See section 2.5 in this doc: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 --- M src/kudu/client/scanner-internal.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/tserver.proto 14 files changed, 162 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/40/7440/13 -- To view, visit http://gerrit.cloudera.org:8080/7440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1407: reassign failed tablets
Adar Dembo has posted comments on this change. Change subject: KUDU-1407: reassign failed tablets .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/7440/12/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: PS12, Line 753: replica->SetError(s); : replica->Shutdown(); > I'm going to keep it this way since it preserves some useful error messages I don't think you have to lose the error messages: Status s; auto cleanup = MakeScopedCleanup([&]() { replica->SetError(s); replica->Shutdown(); } s = cmeta_manager_->Load(...); if (!s.ok()) { LOG(ERROR) << ...; return; } This way you get a customized log message per error site, but you centralize the cleanup in one place. -- To view, visit http://gerrit.cloudera.org:8080/7440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1407: reassign failed tablets
Andrew Wong has posted comments on this change. Change subject: KUDU-1407: reassign failed tablets .. Patch Set 12: Code-Review-1 (1 comment) I'm a little suspicious of the build failure; I'm currently running a dist-test to make sure this won't flake delete_table-itest http://gerrit.cloudera.org:8080/#/c/7440/12/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: PS12, Line 753: replica->SetError(s); : replica->Shutdown(); > Should put this in a ScopedCleanup (and cancel it at the end of the functio I'm going to keep it this way since it preserves some useful error messages. Scoped cleanup might make it a bit trickier. -- To view, visit http://gerrit.cloudera.org:8080/7440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1407: reassign failed tablets
Adar Dembo has posted comments on this change. Change subject: KUDU-1407: reassign failed tablets .. Patch Set 12: (1 comment) Feel free to ignore my feedback if you weren't going to generate a new patch since it's minor. http://gerrit.cloudera.org:8080/#/c/7440/12/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: PS12, Line 753: replica->SetError(s); : replica->Shutdown(); Should put this in a ScopedCleanup (and cancel it at the end of the function) so you don't have to repeat it over and over. -- To view, visit http://gerrit.cloudera.org:8080/7440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1407: reassign failed tablets
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-1407: reassign failed tablets .. Patch Set 12: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] consensus: Tablet copy should clear last-logged opid from superblock
Hello Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7718 to look at the new patch set (#3). Change subject: consensus: Tablet copy should clear last-logged opid from superblock .. consensus: Tablet copy should clear last-logged opid from superblock Keeping around irrelevant information is bad hygiene. Change-Id: Iaa84d59c63222e9ddb05dca492f9ecd47b5c63ea --- M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h 3 files changed, 66 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/18/7718/3 -- To view, visit http://gerrit.cloudera.org:8080/7718 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iaa84d59c63222e9ddb05dca492f9ecd47b5c63ea Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] consensus: Tablet copy should clear last-logged opid from superblock
Mike Percy has posted comments on this change. Change subject: consensus: Tablet copy should clear last-logged opid from superblock .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7718/2//COMMIT_MSG Commit Message: PS2, Line 10: want to disk tombstoned voting > ? s/disk/risk/ but I'm not sure it adds much. I'll just remove the last sentence. -- To view, visit http://gerrit.cloudera.org:8080/7718 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaa84d59c63222e9ddb05dca492f9ecd47b5c63ea Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2032 (part 1): pass pre-resolution hostname into RPC proxies
Sailesh Mukil has posted comments on this change. Change subject: KUDU-2032 (part 1): pass pre-resolution hostname into RPC proxies .. Patch Set 5: Code-Review+1 (4 comments) http://gerrit.cloudera.org:8080/#/c/7687/5/src/kudu/client/client-internal.cc File src/kudu/client/client-internal.cc: PS5, Line 675: master_proxy_.reset(new MasterServiceProxy( : messenger_, leader_addr, leader_addr.host())); nit: single line. http://gerrit.cloudera.org:8080/#/c/7687/5/src/kudu/integration-tests/create-table-stress-test.cc File src/kudu/integration-tests/create-table-stress-test.cc: PS5, Line 105: master_proxy_.reset(new MasterServiceProxy( : messenger_, addr, addr.host())); nit: single line. http://gerrit.cloudera.org:8080/#/c/7687/5/src/kudu/integration-tests/master-stress-test.cc File src/kudu/integration-tests/master-stress-test.cc: PS5, Line 245: unique_ptr proxy( : new MasterServiceProxy(messenger, addr, addr.host())); nit: single line http://gerrit.cloudera.org:8080/#/c/7687/4/src/kudu/master/ts_descriptor.cc File src/kudu/master/ts_descriptor.cc: Line 246: RETURN_NOT_OK(ResolveSockaddr(, )); > addr.ToStringWithoutPort is always the dotted-decimal string '1.2.3.4' rath Thanks for clarifying. -- To view, visit http://gerrit.cloudera.org:8080/7687 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I96fb3c73382f0be6e30e29ae2e7176be42f3bb98 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2104. Upgrade to Spark 2.2.0
Dan Burkert has posted comments on this change. Change subject: KUDU-2104. Upgrade to Spark 2.2.0 .. Patch Set 2: I guess that begs the question: are we still compatible with Spark 2.0 and 2.1? I think we are at this point, but would we intentionally forgo using a newly introduced API just to remain backwards compat? -- To view, visit http://gerrit.cloudera.org:8080/7720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If3c179816f02fa8cc3e942e803851881dda0a014 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] [java] Fix gradle wrapper with 4.1
Grant Henke has uploaded a new change for review. http://gerrit.cloudera.org:8080/7726 Change subject: [java] Fix gradle wrapper with 4.1 .. [java] Fix gradle wrapper with 4.1 Gradle doesn’t use trailing zeros with new minor versions which broke the wrapper logic. This fixes the version to remove the trailing zero and adjusts the logic to generate the gradle-wrapper.jar url (which does use a trailing zero). Change-Id: I12daa60faf3376cbb8662710bde3856472bb --- M java/gradle/dependencies.gradle M java/gradle/wrapper.gradle M java/gradle/wrapper/gradle-wrapper.properties 3 files changed, 6 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/26/7726/1 -- To view, visit http://gerrit.cloudera.org:8080/7726 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I12daa60faf3376cbb8662710bde3856472bb Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant Henke
[kudu-CR] KUDU-2101 Include a table summary at the bottom
Jean-Daniel Cryans has posted comments on this change. Change subject: KUDU-2101 Include a table summary at the bottom .. Patch Set 4: Wondering... should we had a flag to disable the summary at the end? Maybe a way to have a less verbose output. Or maybe the reverse, have a flag to not show all the details and only have the summary? -- To view, visit http://gerrit.cloudera.org:8080/7707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1913352e3a1f91b4bb07e2f5001c8cc94d5155d4 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] [docs] Deprecate Java 7 and Spark 1
Jean-Daniel Cryans has posted comments on this change. Change subject: [docs] Deprecate Java 7 and Spark 1 .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7699 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I050df790ca51729cc99866b9cd75933c3c907a4c Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [docs] Deprecate Java 7 and Spark 1
Jean-Daniel Cryans has submitted this change and it was merged. Change subject: [docs] Deprecate Java 7 and Spark 1 .. [docs] Deprecate Java 7 and Spark 1 Starts the release notes for Kudu 1.5.0 and adds entries for Java 7 and Spark 1 deprecation. Change-Id: I050df790ca51729cc99866b9cd75933c3c907a4c Reviewed-on: http://gerrit.cloudera.org:8080/7699 Tested-by: Kudu Jenkins Reviewed-by: Dan BurkertReviewed-by: Jean-Daniel Cryans --- M docs/installation.adoc M docs/prior_release_notes.adoc M docs/release_notes.adoc M java/README.md 4 files changed, 271 insertions(+), 197 deletions(-) Approvals: Dan Burkert: Looks good to me, approved Jean-Daniel Cryans: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7699 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I050df790ca51729cc99866b9cd75933c3c907a4c Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1942. Kerberos fails to log in on hostnames with capital letters
Dan Burkert has posted comments on this change. Change subject: KUDU-1942. Kerberos fails to log in on hostnames with capital letters .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/7693/4/src/kudu/security/init.cc File src/kudu/security/init.cc: Line 398: std::transform(hostname.begin(), hostname.end(), hostname.begin(), tolower); > Does this handle multi-byte characters correctly? Is that necessary for ho Nevermind, hostnames can only be ASCII. -- To view, visit http://gerrit.cloudera.org:8080/7693 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5ef65dd827459476a2d225d8e3f7c80ff2fdf627 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [docs] Deprecate Java 7 and Spark 1
Dan Burkert has posted comments on this change. Change subject: [docs] Deprecate Java 7 and Spark 1 .. Patch Set 3: Code-Review+2 Wanted to give you a change to weigh in on the fixes. But if you insist SHIP IT! -- To view, visit http://gerrit.cloudera.org:8080/7699 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I050df790ca51729cc99866b9cd75933c3c907a4c Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-2104. Upgrade to Spark 2.2.0
Dan Burkert has posted comments on this change. Change subject: KUDU-2104. Upgrade to Spark 2.2.0 .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7720/2/docs/developing.adoc File docs/developing.adoc: Line 162: - The spark 2.x integration requires Java 8 at runtime as of Kudu 1.5.0. > It is, but if this patch is accepted Spark 2.2.0 will be the build default But if your 'spark2-submit' binary is Spark 2.1, you should still be able to run jobs including kudu-spark2_2.11 with JRE 7, right? -- To view, visit http://gerrit.cloudera.org:8080/7720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If3c179816f02fa8cc3e942e803851881dda0a014 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] KUDU-1942. Kerberos fails to log in on hostnames with capital letters
Dan Burkert has posted comments on this change. Change subject: KUDU-1942. Kerberos fails to log in on hostnames with capital letters .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/7693/4/src/kudu/security/init.cc File src/kudu/security/init.cc: Line 398: std::transform(hostname.begin(), hostname.end(), hostname.begin(), tolower); Does this handle multi-byte characters correctly? Is that necessary for hostnames? -- To view, visit http://gerrit.cloudera.org:8080/7693 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5ef65dd827459476a2d225d8e3f7c80ff2fdf627 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2032 (part 2): propagate master hostnames into client
Dan Burkert has posted comments on this change. Change subject: KUDU-2032 (part 2): propagate master hostnames into client .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/7692/5/src/kudu/client/client-internal.cc File src/kudu/client/client-internal.cc: Line 631: const pair& leader_addr_and_name, Why not two separate params? Then you could even move the leader_hostname into the new MasterServiceProxy call. -- To view, visit http://gerrit.cloudera.org:8080/7692 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie5838f22c96f757124112b505825a53740468ce1 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2032 (part 1): pass pre-resolution hostname into RPC proxies
Dan Burkert has posted comments on this change. Change subject: KUDU-2032 (part 1): pass pre-resolution hostname into RPC proxies .. Patch Set 5: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/7687 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I96fb3c73382f0be6e30e29ae2e7176be42f3bb98 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-2101 Include a table summary at the bottom
Todd Lipcon has posted comments on this change. Change subject: KUDU-2101 Include a table summary at the bottom .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1913352e3a1f91b4bb07e2f5001c8cc94d5155d4 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] consensus: Tablet copy should clear last-logged opid from superblock
Todd Lipcon has posted comments on this change. Change subject: consensus: Tablet copy should clear last-logged opid from superblock .. Patch Set 2: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/7718/2//COMMIT_MSG Commit Message: PS2, Line 10: want to disk tombstoned voting ? -- To view, visit http://gerrit.cloudera.org:8080/7718 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaa84d59c63222e9ddb05dca492f9ecd47b5c63ea Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] consensus: Improve contract for API to fetch last-logged op id
Todd Lipcon has posted comments on this change. Change subject: consensus: Improve contract for API to fetch last-logged op id .. Patch Set 2: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/7717/1/src/kudu/tserver/tablet_copy_client.cc File src/kudu/tserver/tablet_copy_client.cc: Line 134: if (last_logged_opid && last_logged_opid->term() > caller_term) { > I'm not sure what else we can do. If we have tombstoned a failed replica, w worth a log message indicating that it is allowing replacement of a failed replica? I seem to remember we convinced ourselves that this is probably slightly unsafe technically but that we should do it, but I think it would be good to dig up some of our thinking and put it in a comment and have some log message which indicates that we are "bending the rules" in some way here. -- To view, visit http://gerrit.cloudera.org:8080/7717 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia4e4501a61cd40fdee0dc918b77675a0bc2515e7 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2101 Include a table summary at the bottom
Will Berkeley has posted comments on this change. Change subject: KUDU-2101 Include a table summary at the bottom .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7707/3/src/kudu/tools/ksck.cc File src/kudu/tools/ksck.cc: PS3, Line 235: == > On second thought, I think I might have misled you on this, maybe it should Whoops. -- To view, visit http://gerrit.cloudera.org:8080/7707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1913352e3a1f91b4bb07e2f5001c8cc94d5155d4 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-2101 Include a table summary at the bottom
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7707 to look at the new patch set (#4). Change subject: KUDU-2101 Include a table summary at the bottom .. KUDU-2101 Include a table summary at the bottom This add a table summary to the bottom of ksck. For each table, the table contains a status and information about the total number of tablets and the number of healthy, under-replicated, and unhealthy tablets. A tablet with consensus mismatch is counted as unhealthy, to make the table easier for less experienced users to understand. See ksck-test for sample output. Change-Id: I1913352e3a1f91b4bb07e2f5001c8cc94d5155d4 --- M src/kudu/tools/ksck-test.cc M src/kudu/tools/ksck.cc M src/kudu/tools/ksck.h 3 files changed, 152 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7707/4 -- To view, visit http://gerrit.cloudera.org:8080/7707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1913352e3a1f91b4bb07e2f5001c8cc94d5155d4 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] security: only lookup hostname if HOST substitution is required
Alexey Serbin has posted comments on this change. Change subject: security: only lookup hostname if _HOST substitution is required .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7694 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5de8647d6cf63ea70d880fa530fa289e8bae24fe Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-HasComments: No
[kudu-CR] security: only lookup hostname if HOST substitution is required
Alexey Serbin has submitted this change and it was merged. Change subject: security: only lookup hostname if _HOST substitution is required .. security: only lookup hostname if _HOST substitution is required The Kerberos principal configuration uses the special token '_HOST' to indicate that the FQDN of the host should be specified. Previously we would always lookup the FQDN even if the substitution was not required, which might mean that startup would fail if there was no FQDN available, even if no _HOST substitution was required. Now, we only lookup the FQDN if FLAGS_principal contains the substitution token. This provides the possibility of a workaround of explicit principal configuration on machines with no FQDN. Change-Id: I5de8647d6cf63ea70d880fa530fa289e8bae24fe Reviewed-on: http://gerrit.cloudera.org:8080/7694 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin--- M src/kudu/security/init.cc 1 file changed, 10 insertions(+), 7 deletions(-) Approvals: Alexey Serbin: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7694 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I5de8647d6cf63ea70d880fa530fa289e8bae24fe Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil
[kudu-CR] KUDU-2032 (part 2): propagate master hostnames into client
Hello Alexey Serbin, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7692 to look at the new patch set (#5). Change subject: KUDU-2032 (part 2): propagate master hostnames into client .. KUDU-2032 (part 2): propagate master hostnames into client This changes the client code to remember the user-specified master addresses and propagate them into the creation of master proxies. It's not possible to reproduce the necessary DNS configurations in a minicluster test, but with this patch I am now able to use 'kudu perf loadgen' against a Kerberized cluster even when my local krb5.conf has rdns=false. Change-Id: Ie5838f22c96f757124112b505825a53740468ce1 --- M src/kudu/client/client-internal.cc M src/kudu/client/client-internal.h M src/kudu/client/master_rpc.cc M src/kudu/client/master_rpc.h M src/kudu/integration-tests/external_mini_cluster.cc 5 files changed, 56 insertions(+), 40 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/7692/5 -- To view, visit http://gerrit.cloudera.org:8080/7692 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie5838f22c96f757124112b505825a53740468ce1 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2032 (part 1): pass pre-resolution hostname into RPC proxies
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7687 to look at the new patch set (#5). Change subject: KUDU-2032 (part 1): pass pre-resolution hostname into RPC proxies .. KUDU-2032 (part 1): pass pre-resolution hostname into RPC proxies This modifies the constructor of RPC proxies (generated and otherwise) to take the remote hostname in addition to the existing resolved Sockaddr parameter. The hostname is then passed into the ConnectionId object, and plumbed through to the SASL client in place of the IP address that was used previously. The patch changes all of the construction sites of Proxy to fit the new interface. In most of the test cases, we don't have real hostnames, so we just use the dotted-decimal string form of the remote Sockaddr, which matches the existing behavior. In the real call sites, we have actual host names typically specified by the user, and in those cases we'll need to pass those into the proxy. In a few cases, they were conveniently available in the same function that creates the proxy. In others, they are relatively far away, so this patch just uses the dotted-decimal string and leaves TODOs. In the case that Kerberos is not configured, this change should have no effect since the hostname is ignored by SASL "plain". In the case that Kerberos is configured with 'rdns=true', they also have no effect, because the krb5 library will resolve and reverse the hostname from the IP as it did before. In the case that 'rdns=false', this moves us one step closer to fixing KUDU-2032 by getting a hostname into the SASL library. I verified that, if I set 'rdns = false' on a Kerberized client, I'm now able to run 'kudu master status ' successfully where it would not before. This tool uses a direct proxy instantiation where the hostname was easy to plumb in. 'kudu table list ' still does not work because it uses the client, which wasn't convenient to plumb quite yet. Given that this makes incremental improvement towards fixing the issue without any regression, and is already a fairly wide patch, I hope to commit this and then address the remaining plumbing in a separate patch. Change-Id: I96fb3c73382f0be6e30e29ae2e7176be42f3bb98 --- M src/kudu/client/client-internal.cc M src/kudu/client/master_rpc.cc M src/kudu/client/meta_cache.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/integration-tests/create-table-stress-test.cc M src/kudu/integration-tests/exactly_once_writes-itest.cc M src/kudu/integration-tests/external_mini_cluster.cc M src/kudu/integration-tests/internal_mini_cluster.cc M src/kudu/integration-tests/master-stress-test.cc M src/kudu/integration-tests/master_cert_authority-itest.cc M src/kudu/integration-tests/master_replication-itest.cc M src/kudu/integration-tests/security-itest.cc M src/kudu/integration-tests/table_locations-itest.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/master/master-test.cc M src/kudu/master/master.cc M src/kudu/master/sys_catalog-test.cc M src/kudu/master/ts_descriptor.cc M src/kudu/master/ts_descriptor.h M src/kudu/rpc/connection.h M src/kudu/rpc/connection_id.cc M src/kudu/rpc/connection_id.h M src/kudu/rpc/exactly_once_rpc-test.cc M src/kudu/rpc/mt-rpc-test.cc M src/kudu/rpc/negotiation.cc M src/kudu/rpc/protoc-gen-krpc.cc M src/kudu/rpc/proxy.cc M src/kudu/rpc/proxy.h M src/kudu/rpc/reactor.cc M src/kudu/rpc/rpc-bench.cc M src/kudu/rpc/rpc-test.cc M src/kudu/rpc/rpc_stub-test.cc M src/kudu/tools/ksck_remote.cc M src/kudu/tools/tool_action_common.cc M src/kudu/tserver/heartbeater.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_service-test.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/tablet_server_test_util.cc M src/kudu/util/net/net_util-test.cc M src/kudu/util/net/sockaddr.cc M src/kudu/util/net/sockaddr.h 42 files changed, 222 insertions(+), 138 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/7687/5 -- To view, visit http://gerrit.cloudera.org:8080/7687 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I96fb3c73382f0be6e30e29ae2e7176be42f3bb98 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2032 (part 1): pass pre-resolution hostname into RPC proxies
Todd Lipcon has posted comments on this change. Change subject: KUDU-2032 (part 1): pass pre-resolution hostname into RPC proxies .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/7687/4/src/kudu/master/ts_descriptor.cc File src/kudu/master/ts_descriptor.cc: Line 246: RETURN_NOT_OK(ResolveSockaddr(, )); > In cases like this, wouldn't addr.ToStringWithoutPort() be the same as 'hos addr.ToStringWithoutPort is always the dotted-decimal string '1.2.3.4' rather than the original hostname. The idea of trying to maintain the user-specified host is also to deal with the case where the user is using a hostname that doesn't quite match the canonical name. krb5 will often canonicalize it for us using reverse DNS (with the default config) but we should leave that up to krb5.conf, and respect rdns=false if they have configured it that way. http://gerrit.cloudera.org:8080/#/c/7687/4/src/kudu/util/net/sockaddr.cc File src/kudu/util/net/sockaddr.cc: PS4, Line 102: char str[INET_ADDRSTRLEN]; : ::inet_ntop(AF_INET, _.sin_addr, str, INET_ADDRSTRLEN); : return StringPrintf("%s:%d", str, port()); > return StringPrintf("%s:%d", ToStringWithoutPort(), port()); actually just realized that we already have a .host() which is equivalent to the new method I added. will just use that. -- To view, visit http://gerrit.cloudera.org:8080/7687 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I96fb3c73382f0be6e30e29ae2e7176be42f3bb98 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2104. Upgrade to Spark 2.2.0
Grant Henke has posted comments on this change. Change subject: KUDU-2104. Upgrade to Spark 2.2.0 .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7720/2/docs/developing.adoc File docs/developing.adoc: Line 162: - The spark 2.x integration requires Java 8 at runtime as of Kudu 1.5.0. > I thought we determined that the jar should be compatible with Java 7 when It is, but if this patch is accepted Spark 2.2.0 will be the build default which requires Java 8. -- To view, visit http://gerrit.cloudera.org:8080/7720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If3c179816f02fa8cc3e942e803851881dda0a014 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] [java] Include Spark/Scala base version in kudu-spark-tools
Dan Burkert has posted comments on this change. Change subject: [java] Include Spark/Scala base version in kudu-spark-tools .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/7723 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2e25b0a0d560d5c9504002fbe3bbcfdc79a83d2d Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] KUDU-2104. Upgrade to Spark 2.2.0
Dan Burkert has posted comments on this change. Change subject: KUDU-2104. Upgrade to Spark 2.2.0 .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7720/2/docs/developing.adoc File docs/developing.adoc: Line 162: - The spark 2.x integration requires Java 8 at runtime as of Kudu 1.5.0. I thought we determined that the jar should be compatible with Java 7 when used with Spark 2.1? -- To view, visit http://gerrit.cloudera.org:8080/7720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If3c179816f02fa8cc3e942e803851881dda0a014 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] KUDU-2032 (part 2): propagate master hostnames into client
Alexey Serbin has posted comments on this change. Change subject: KUDU-2032 (part 2): propagate master hostnames into client .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7692 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie5838f22c96f757124112b505825a53740468ce1 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [docs] Deprecate Java 7 and Spark 1
Dan Burkert has posted comments on this change. Change subject: [docs] Deprecate Java 7 and Spark 1 .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/7699 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I050df790ca51729cc99866b9cd75933c3c907a4c Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [java] Include Spark/Scala base version in kudu-spark-tools
Grant Henke has uploaded a new patch set (#2). Change subject: [java] Include Spark/Scala base version in kudu-spark-tools .. [java] Include Spark/Scala base version in kudu-spark-tools Renames the kudu-spark-tools artifact to include the spark and scala base version. Including the scala version is standard practice and helps users avoid binary compatability issues. Adding the Spark base version matches the pattern used in the kudu-spark module and prevents users from trying to use kudu-spark-tools with Spark 1. This change will also make upgrades to Scala 2.12 or Spark 3 in the future more seamless. Before: kudu-spark-tools After:kudu-spark2-tools_2.11 Change-Id: I2e25b0a0d560d5c9504002fbe3bbcfdc79a83d2d --- M docs/release_notes.adoc M java/kudu-spark-tools/build.gradle M java/kudu-spark-tools/pom.xml 3 files changed, 10 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/7723/2 -- To view, visit http://gerrit.cloudera.org:8080/7723 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2e25b0a0d560d5c9504002fbe3bbcfdc79a83d2d Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] [java] Include spark/scala base version in spark-tools
Grant Henke has uploaded a new change for review. http://gerrit.cloudera.org:8080/7723 Change subject: [java] Include spark/scala base version in spark-tools .. [java] Include spark/scala base version in spark-tools Renames the kudu-spark-tools artifact to include the spark and scala base version. Including the scala version is standard practice and help users avoid binary compatability issues. Adding the Spark base version matches the pattern used in the kudu-spark module and prevents users from trying to use kudu-spark-tools with Spark 1. This change will also make upgrades to Scala 2.12 or Spark 3 in the future more seamless. Change-Id: I2e25b0a0d560d5c9504002fbe3bbcfdc79a83d2d Before: kudu-spark-tools After:kudu-spark2-tools_2.11 --- M docs/release_notes.adoc M java/kudu-spark-tools/build.gradle M java/kudu-spark-tools/pom.xml 3 files changed, 10 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/7723/1 -- To view, visit http://gerrit.cloudera.org:8080/7723 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I2e25b0a0d560d5c9504002fbe3bbcfdc79a83d2d Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant Henke
[kudu-CR] KUDU-2101 Include a table summary at the bottom
Will Berkeley has posted comments on this change. Change subject: KUDU-2101 Include a table summary at the bottom .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/7707/2/src/kudu/tools/ksck.cc File src/kudu/tools/ksck.cc: PS2, Line 235: if (left.TableStatus() == CheckResult::OK && right.TableStatus() == CheckResult::OK) { : return left.name < right.name; : } : if (left.TableStatus() == CheckResult::OK && right.TableStatus() != CheckResult::OK) { : return true; : } : if (left.TableStatus() != CheckResult::OK && right.TableStatus() == CheckResult::OK) { : return false; : } : if (left.TableStatus() != CheckResult::OK && right.TableStatus() != CheckResult::OK) { : return left.name < right.name; : } > slightly simpler option: Done PS2, Line 249: col_names > do you need this variable? I would guess you could just pass the brace list No, don't need it, but I thought it was a bit more readable. Will change. PS2, Line 265: row > same - need the var? Ditto -- To view, visit http://gerrit.cloudera.org:8080/7707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1913352e3a1f91b4bb07e2f5001c8cc94d5155d4 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-2101 Include a table summary at the bottom
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7707 to look at the new patch set (#3). Change subject: KUDU-2101 Include a table summary at the bottom .. KUDU-2101 Include a table summary at the bottom This add a table summary to the bottom of ksck. For each table, the table contains a status and information about the total number of tablets and the number of healthy, under-replicated, and unhealthy tablets. A tablet with consensus mismatch is counted as unhealthy, to make the table easier for less experienced users to understand. See ksck-test for sample output. Change-Id: I1913352e3a1f91b4bb07e2f5001c8cc94d5155d4 --- M src/kudu/tools/ksck-test.cc M src/kudu/tools/ksck.cc M src/kudu/tools/ksck.h 3 files changed, 152 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7707/3 -- To view, visit http://gerrit.cloudera.org:8080/7707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1913352e3a1f91b4bb07e2f5001c8cc94d5155d4 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1942. Kerberos fails to log in on hostnames with capital letters
Alexey Serbin has submitted this change and it was merged. Change subject: KUDU-1942. Kerberos fails to log in on hostnames with capital letters .. KUDU-1942. Kerberos fails to log in on hostnames with capital letters This ensures that servers canonicalize their FQDNs to lower-case before generating Kerberos principal names. With this change I was able to set up a working cluster on my laptop with a capitalized hostname, where before it would fail as described in the JIRA. I also verified that I was able to connect from both C++ and Java clients. Change-Id: I5ef65dd827459476a2d225d8e3f7c80ff2fdf627 Reviewed-on: http://gerrit.cloudera.org:8080/7693 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin--- M src/kudu/security/init.cc 1 file changed, 3 insertions(+), 0 deletions(-) Approvals: Alexey Serbin: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7693 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I5ef65dd827459476a2d225d8e3f7c80ff2fdf627 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1942. Kerberos fails to log in on hostnames with capital letters
Alexey Serbin has posted comments on this change. Change subject: KUDU-1942. Kerberos fails to log in on hostnames with capital letters .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7693 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5ef65dd827459476a2d225d8e3f7c80ff2fdf627 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-2033 (part 2). Add test for Java client failover support.
Edward Fancher has uploaded a new change for review. http://gerrit.cloudera.org:8080/7722 Change subject: KUDU-2033 (part 2). Add test for Java client failover support. .. KUDU-2033 (part 2). Add test for Java client failover support. New test which validates that that the java client correctly gets metadata updates after the Master leader is killed. Change-Id: I228e08429d952f0f5b2657e0b8481366c5c572a4 --- A java/kudu-client/src/test/java/org/apache/kudu/client/TestClientFailoverSupport.java 1 file changed, 87 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/22/7722/1 -- To view, visit http://gerrit.cloudera.org:8080/7722 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I228e08429d952f0f5b2657e0b8481366c5c572a4 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Edward Fancher
[kudu-CR] KUDU-2101 Include a table summary at the bottom
Todd Lipcon has posted comments on this change. Change subject: KUDU-2101 Include a table summary at the bottom .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/7707/2/src/kudu/tools/ksck.cc File src/kudu/tools/ksck.cc: PS2, Line 235: if (left.TableStatus() == CheckResult::OK && right.TableStatus() == CheckResult::OK) { : return left.name < right.name; : } : if (left.TableStatus() == CheckResult::OK && right.TableStatus() != CheckResult::OK) { : return true; : } : if (left.TableStatus() != CheckResult::OK && right.TableStatus() == CheckResult::OK) { : return false; : } : if (left.TableStatus() != CheckResult::OK && right.TableStatus() != CheckResult::OK) { : return left.name < right.name; : } slightly simpler option: return std::make_pair(left.TableStatus() == CheckResult::OK, left.name) < std::make_pair(right.TableStatus() == CheckResult::OK, right.name); does that work? PS2, Line 249: col_names do you need this variable? I would guess you could just pass the brace list below PS2, Line 265: row same - need the var? -- To view, visit http://gerrit.cloudera.org:8080/7707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1913352e3a1f91b4bb07e2f5001c8cc94d5155d4 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-2104. Upgrade to Spark 2.2.0
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7720 to look at the new patch set (#2). Change subject: KUDU-2104. Upgrade to Spark 2.2.0 .. KUDU-2104. Upgrade to Spark 2.2.0 Upgrades to Spark 2.2.0 and documents the Java 8 runtime requirement. Also adjusts the Jenkins scripts to use Java 8 when building and running. Change-Id: If3c179816f02fa8cc3e942e803851881dda0a014 --- M build-support/jenkins/build-and-test.sh M docs/developing.adoc M docs/release_notes.adoc M java/README.md M java/gradle/dependencies.gradle M java/pom.xml 6 files changed, 10 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/20/7720/2 -- To view, visit http://gerrit.cloudera.org:8080/7720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: If3c179816f02fa8cc3e942e803851881dda0a014 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Kudu Jenkins
[kudu-CR] [java] Upgrade to Spark 2.2.0
Grant Henke has uploaded a new change for review. http://gerrit.cloudera.org:8080/7720 Change subject: [java] Upgrade to Spark 2.2.0 .. [java] Upgrade to Spark 2.2.0 Upgrades to Spark 2.2.0 and documents the Java 8 runtime requirement. Also adjusts the Jenkins scripts to use Java 8 when building and running. Change-Id: If3c179816f02fa8cc3e942e803851881dda0a014 --- M build-support/jenkins/build-and-test.sh M docs/developing.adoc M docs/release_notes.adoc M java/README.md M java/gradle/dependencies.gradle M java/pom.xml 6 files changed, 10 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/20/7720/1 -- To view, visit http://gerrit.cloudera.org:8080/7720 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: If3c179816f02fa8cc3e942e803851881dda0a014 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant Henke
[kudu-CR] [docs] Deprecate Java 7 and Spark 1
Hello Dan Burkert, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7699 to look at the new patch set (#3). Change subject: [docs] Deprecate Java 7 and Spark 1 .. [docs] Deprecate Java 7 and Spark 1 Starts the release notes for Kudu 1.5.0 and adds entries for Java 7 and Spark 1 deprecation. Change-Id: I050df790ca51729cc99866b9cd75933c3c907a4c --- M docs/installation.adoc M docs/prior_release_notes.adoc M docs/release_notes.adoc M java/README.md 4 files changed, 271 insertions(+), 197 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/7699/3 -- To view, visit http://gerrit.cloudera.org:8080/7699 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I050df790ca51729cc99866b9cd75933c3c907a4c Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2101 Include a table summary at the bottom
Will Berkeley has posted comments on this change. Change subject: KUDU-2101 Include a table summary at the bottom .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/7707/1/src/kudu/tools/ksck.cc File src/kudu/tools/ksck.cc: PS1, Line 222: push_back > emplace_back(std::move(ts)) Done Line 231: // The table summary is in alphabetical order by name. > is this best or should we put the unhealthy tables at the bottom, so that w Done PS1, Line 239: vector names; : vector statuses; : vector total_tablets; : vector num_healthy; : vector num_underreplicated; : vector num_unhealthy; > is using the 'AddRow' interface not fewer lines of code / easier to follow Done PS1, Line 255: UNHEALTHY > I think unavailable is better but the current unhealthy is better aligned w Unavailable is probably better. An under-replicated tablet is available, but it's not healthy. I think it's straightforward without an explanation to a casual Hadoop admin but I can add some words if you disagree. PS1, Line 255: UNHEALTHY > I find "UNHEALTHY" to be a little confusing. Should we say "unavailable" or Done -- To view, visit http://gerrit.cloudera.org:8080/7707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1913352e3a1f91b4bb07e2f5001c8cc94d5155d4 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-2101 Include a table summary at the bottom
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7707 to look at the new patch set (#2). Change subject: KUDU-2101 Include a table summary at the bottom .. KUDU-2101 Include a table summary at the bottom This add a table summary to the bottom of ksck. For each table, the table contains a status and information about the total number of tablets and the number of healthy, under-replicated, and unhealthy tablets. A tablet with consensus mismatch is counted as unhealthy, to make the table easier for less experienced users to understand. See ksck-test for sample output. Change-Id: I1913352e3a1f91b4bb07e2f5001c8cc94d5155d4 --- M src/kudu/tools/ksck-test.cc M src/kudu/tools/ksck.cc M src/kudu/tools/ksck.h 3 files changed, 164 insertions(+), 34 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7707/2 -- To view, visit http://gerrit.cloudera.org:8080/7707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1913352e3a1f91b4bb07e2f5001c8cc94d5155d4 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] java: prohibit use of a KuduTable from an unassociated KuduClient
Todd Lipcon has posted comments on this change. Change subject: java: prohibit use of a KuduTable from an unassociated KuduClient .. Patch Set 2: sure, will do. -- To view, visit http://gerrit.cloudera.org:8080/7362 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib4a977f36b9c7ba3758322a2216ce90208b5d014 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [docs] Deprecate Java 7 and Spark 1
Hello Dan Burkert, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7699 to look at the new patch set (#2). Change subject: [docs] Deprecate Java 7 and Spark 1 .. [docs] Deprecate Java 7 and Spark 1 Starts the release notes for Kudu 1.5.0 and adds entries for Java 7 and Spark 1 deprecation. Change-Id: I050df790ca51729cc99866b9cd75933c3c907a4c --- M docs/installation.adoc M docs/prior_release_notes.adoc M docs/release_notes.adoc M java/README.md 4 files changed, 262 insertions(+), 191 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/99/7699/2 -- To view, visit http://gerrit.cloudera.org:8080/7699 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I050df790ca51729cc99866b9cd75933c3c907a4c Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] java: prohibit use of a KuduTable from an unassociated KuduClient
Jean-Daniel Cryans has posted comments on this change. Change subject: java: prohibit use of a KuduTable from an unassociated KuduClient .. Patch Set 2: Todd, is it possible to rewrite this patch without the change to request tracking? -- To view, visit http://gerrit.cloudera.org:8080/7362 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib4a977f36b9c7ba3758322a2216ce90208b5d014 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-2101 Include a table summary at the bottom
Jean-Daniel Cryans has posted comments on this change. Change subject: KUDU-2101 Include a table summary at the bottom .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7707/1/src/kudu/tools/ksck.cc File src/kudu/tools/ksck.cc: PS1, Line 255: UNHEALTHY > I find "UNHEALTHY" to be a little confusing. Should we say "unavailable" or I think unavailable is better but the current unhealthy is better aligned with OK state of being healthy. Maybe we just need to print out an explanation? -- To view, visit http://gerrit.cloudera.org:8080/7707 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1913352e3a1f91b4bb07e2f5001c8cc94d5155d4 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-871. Support tombstoned voting
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6960 to look at the new patch set (#8). Change subject: KUDU-871. Support tombstoned voting .. KUDU-871. Support tombstoned voting This patch makes it possible for tombstoned tablet replicas to vote in Raft elections. Changes: * Add Stop() method to TabletReplica + Consensus lifecycle. * Includes new STOPPED state. * Tombstoning a replica should call Stop(). * Deleting a replica should call Shutdown(). * Add positive and negative tests for tombstoned voting. * Add a stress test that induces lots of tombstoned voting while running TabletCopy, TabletBootstrap, and DeleteTablet. * Fix DeleteTableITest.TestMergeConsensusMetadata after tombstoned voting changed its assumption that tombstoned tablets would not vote. * Fix several tests that expected tombstoned tablets to be SHUTDOWN when now they are STOPPED. * Get rid of tablet::FAILED state, instead go to SHUTDOWN with an error. Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/delete_table-itest.cc M src/kudu/integration-tests/delete_tablet-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc M src/kudu/integration-tests/raft_consensus-itest.cc A src/kudu/integration-tests/tombstoned_voting-itest.cc A src/kudu/integration-tests/tombstoned_voting-stress-test.cc M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tools/ksck-test.cc M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-ts-cli-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/tserver/tserver-path-handlers.cc M src/kudu/util/make_shared.h 33 files changed, 1,393 insertions(+), 341 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/6960/8 -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] consensus: Improve contract for API to fetch last-logged op id
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7717 to look at the new patch set (#2). Change subject: consensus: Improve contract for API to fetch last-logged op id .. consensus: Improve contract for API to fetch last-logged op id It's important that we differentiate between when we have a known last-logged op and when we don't actually know what it is or whether we have ever appended something to the local WAL. This applies both to the TABLET_DATA_READY case, where this information is stored in the WAL, and the TABLET_DATA_TOMBSTONED case, where this information is stored in the superblock. Cases where we are unable to determine the last-logged OpId from the WAL when a replica is in TABLET_DATA_READY state: * Early in the tablet replica lifecycle (before Raft is started). * When a replica encounters an error during initialization. Cases where we are unable to determine the last-logged OpId from the TabletMetadata when a replica is in TABLET_DATA_TOMBSTONED state: * If the replica was tombstoned while in a failed state. Included in this patch are the following API improvements: 1. Delete Log::GetLatestEntryOpId(). Previously, this method would only return something other than MinimumOpId() if a log entry had been appended during the object's lifetime. It is abandoned in favor of RaftConsensus::GetLastOpId(RECEIVED_OPID) which delegates to PeerMessageQueue::GetLastOpIdInLog(). 2. Merge PeerMessageQueue::Init() into the PeerMessageQueue constructor. This allows us to remove one lifecycle state and allows us to guarantee that, once the queue is constructed, we can always get a valid last-logged opid from it (see #1). 3. Make TabletMetadata::tombstone_last_logged_opid() return a boost::optional. We need to clearly differentiate between when we know the last-logged opid and when we don't. We also consider MinimumOpId() to be equal to boost::none at superblock load time, since previous versions of Kudu may have written (0,0) into the TabletMetadata 'tombstone_last_logged_opid' field. Change-Id: Ia4e4501a61cd40fdee0dc918b77675a0bc2515e7 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h M src/kudu/consensus/log.cc M src/kudu/consensus/log.h M src/kudu/consensus/mt-log-test.cc M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_source_session.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc 19 files changed, 227 insertions(+), 242 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/7717/2 -- To view, visit http://gerrit.cloudera.org:8080/7717 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia4e4501a61cd40fdee0dc918b77675a0bc2515e7 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] consensus: Improve contract for API to fetch last-logged op id
Mike Percy has posted comments on this change. Change subject: consensus: Improve contract for API to fetch last-logged op id .. Patch Set 1: (15 comments) http://gerrit.cloudera.org:8080/#/c/7717/1/src/kudu/consensus/consensus_peers-test.cc File src/kudu/consensus/consensus_peers-test.cc: Line 210: ASSERT_OPID_EQ(first, message_queue_->GetLastOpIdInLog()); > warning: local copy '_left210' of the variable 'first' is never modified; c Done http://gerrit.cloudera.org:8080/#/c/7717/1/src/kudu/consensus/consensus_queue.cc File src/kudu/consensus/consensus_queue.cc: Line 125: log_cache_(metric_entity, std::move(log), local_peer_pb_.permanent_uuid(), tablet_id_), > warning: passing result of std::move() as a const reference argument; no mo Done Line 126: metrics_(std::move(metric_entity)), > warning: passing result of std::move() as a const reference argument; no mo Done Line 143: log_cache_.Init(queue_state_.last_appended); > mind adding a TODO to merge LogCache::Init into the LogCache ctor, given we Done http://gerrit.cloudera.org:8080/#/c/7717/1/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 245: queue.get(), > in the case that one of the RETURN_NOT_OK calls below fails, then you'll en Done Line 250: pending->SetInitialCommittedOpId(info.last_committed_id); > I believe it was on purpose that this was called after the appending of the Ah, thank you for the catch. This caused a bug that I was having trouble diagnosing. PS1, Line 1492: opt_local_last_logged_opid) ? > you can use .value_or(MinimumOpId()) Actually, in this patch I can bypass this because we know RaftConsensus is running so we can use queue_->GetLastOpIdInLog() Line 1493: : MinimumOpId(); > is this right? I thought, if we don't know our own local op id, then it wou Not sure what I was thinking. Removed. Line 2226: if (!queue_ || !pending_) return boost::none; > think it's worth a comment here explaining the cases where we could hit thi Done PS1, Line 2234: default: : return boost::none; > is there another valid value to pass here? not FATAL? Hrm, yeah. It's a protobuf enum that is exposed to a remote read API via RPC, so for forward compatibility I'll make it a DFATAL and return boost::none. http://gerrit.cloudera.org:8080/#/c/7717/1/src/kudu/tablet/tablet_metadata.h File src/kudu/tablet/tablet_metadata.h: Line 216: void SetPreFlushCallback(StatusClosure callback) { pre_flush_callback_ = callback; } > warning: parameter 'callback' is passed by value and only copied once; cons Done http://gerrit.cloudera.org:8080/#/c/7717/1/src/kudu/tserver/tablet_copy_client.cc File src/kudu/tserver/tablet_copy_client.cc: Line 134: if (last_logged_opid && last_logged_opid->term() > caller_term) { > same question as in the Vote case -- is it correct to allow replacement if I'm not sure what else we can do. If we have tombstoned a failed replica, we will not know the last-logged opid, and this covers that case. Line 237: if (last_logged_opid && last_logged_opid->term() > remote_cstate_->current_term()) { > same see above http://gerrit.cloudera.org:8080/#/c/7717/1/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: Line 460: if (last_logged_opid) { > same question see my response in the in other file Line 474: int64_t last_logged_term = opt_last_logged_opid->term(); > is this guaranteed to be non-none? Ah, indeed no. I'll give it the same treatment as the others for now. -- To view, visit http://gerrit.cloudera.org:8080/7717 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia4e4501a61cd40fdee0dc918b77675a0bc2515e7 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-871. Support tombstoned voting
Mike Percy has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/6960/7/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: Line 115: using kudu::consensus::MinimumOpId; > warning: using decl 'MinimumOpId' is unused [misc-unused-using-decls] Done Line 117: using kudu::consensus::OpIdBiggerThan; > warning: using decl 'OpIdBiggerThan' is unused [misc-unused-using-decls] Done -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2089: Failed java tests can orphan test-tmp files
Jun He has uploaded a new change for review. http://gerrit.cloudera.org:8080/7719 Change subject: KUDU-2089: Failed java tests can orphan test-tmp files .. KUDU-2089: Failed java tests can orphan test-tmp files This changes the MiniKuduCluster code to add the paths to pathsToDelete before calling configureAndStartProcess. If configureAndStartProcess throws an exception, the test-tmp files will be still deleted during shutdown. Change-Id: I53c04987b6683dc959c319d384657c28b4671168 --- M java/kudu-client/src/test/java/org/apache/kudu/client/MiniKuduCluster.java 1 file changed, 6 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/19/7719/1 -- To view, visit http://gerrit.cloudera.org:8080/7719 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I53c04987b6683dc959c319d384657c28b4671168 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Jun He