[kudu-CR] KUDU-871. Support tombstoned voting
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6960 to look at the new patch set (#6). Change subject: KUDU-871. Support tombstoned voting .. KUDU-871. Support tombstoned voting This patch makes it possible for tombstoned tablet replicas to vote in Raft elections. Changes: * Add Stop() method to TabletReplica + Consensus lifecycle. * Includes new STOPPED state. * Tombstoning a replica should call Stop(). * Deleting a replica should call Shutdown(). * Add positive and negative tests for tombstoned voting. * Add a stress test that induces lots of tombstoned voting while running TabletCopy, TabletBootstrap, and DeleteTablet. * Protect TabletMetadata::tombstone_last_logged_opid_ with a lock. * Fix DeleteTableITest.TestMergeConsensusMetadata after tombstoned voting changed its assumption that tombstoned tablets would not vote. * Fix several tests that expected tombstoned tablets to be SHUTDOWN when now they are STOPPED. Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/delete_table-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc A src/kudu/integration-tests/tombstoned_voting-itest.cc A src/kudu/integration-tests/tombstoned_voting-stress-test.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-ts-cli-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/tserver/tserver-path-handlers.cc 27 files changed, 1,145 insertions(+), 239 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/6960/6 -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-871. Support tombstoned voting
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6960 to look at the new patch set (#5). Change subject: KUDU-871. Support tombstoned voting .. KUDU-871. Support tombstoned voting This patch makes it possible for tombstoned tablet replicas to vote in Raft elections. Changes: * Add Stop() method to TabletReplica + Consensus lifecycle. * Includes new STOPPED state. * Tombstoning a replica should call Stop(). * Deleting a replica should call Shutdown(). * Add positive and negative tests for tombstoned voting. * Add a stress test that induces lots of tombstoned voting while running TabletCopy, TabletBootstrap, and DeleteTablet. * Protect TabletMetadata::tombstone_last_logged_opid_ with a lock. * Fix DeleteTableITest.TestMergeConsensusMetadata after tombstoned voting changed its assumption that tombstoned tablets would not vote. * Fix several tests that expected tombstoned tablets to be SHUTDOWN when now they are STOPPED. Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/integration-tests/CMakeLists.txt M src/kudu/integration-tests/cluster_itest_util.cc M src/kudu/integration-tests/cluster_itest_util.h M src/kudu/integration-tests/delete_table-itest.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.cc A src/kudu/integration-tests/tombstoned_voting-itest.cc A src/kudu/integration-tests/tombstoned_voting-stress-test.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-ts-cli-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/tablet_copy_source_session-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/tserver/tserver-path-handlers.cc 27 files changed, 1,149 insertions(+), 242 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/6960/5 -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-871. Support tombstoned voting
Mike Percy has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 4: (32 comments) http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 188: Status RaftConsensus::Init() { > I think the DCHECK on state here was useful, even if it needs to be broaden Done PS4, Line 1493: tombstone_last_logged_opid ? *tombstone_last_logged_opid : : GetLatestOpIdFromLog(); > could we simplify by always using 'max()' of the entry from the metadata an I don't think so; see my response to the same question in tablet_service.cc However, I do think we should use the latest opid in the log if Consensus is currently running, since it is guaranteed to be correct in that case. Line 1497: LOG_WITH_PREFIX_UNLOCKED(INFO) << "voting while tombstoned"; > any way to improve this with a bit more info? I'll add the tombstone_last_logged_opid here but I can't think of a reason to log more than that because we already log the details whenever we vote. Line 2004: state_ = new_state; > given this is in every case, move it down below the switch? Done Line 2513: Status RaftConsensus::CheckSafeToVoteUnlocked(optional tombstone_last_logged_opid) const { > warning: the parameter 'tombstone_last_logged_opid' is copied for each invo Done PS4, Line 2515: (!tombstone_last_logged_opid > not 100% following why this portion of the condition is necessary if we are voting while tombstoned then we don't have to be running PS4, Line 2520: state_ == kShutdown > should we instead be checking the expected state (ie kStopped)? aren't ther We can tombstoned vote in kInit and kStopped state, even in kRunning actually due to a race between the check in TabletService and execution in RaftConsensus. It's not possible to expose RaftConsensus in kNew state due to code in TabletReplica however it would be better to encapsulate that guarantee within this class so I'll add a Create() factory method. Line 2634: return kMinimumTerm; > under what circumstances do we hit this case? It seems like cmeta_ should b Good catch; this was left over from a previous attempt on this patch and is not possible with the current rev. However I agree with you that a factory method for RaftConsensus would be preferable here. Line 2685: if (cmeta_) { > same removed http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/integration-tests/cluster_itest_util.h File src/kudu/integration-tests/cluster_itest_util.h: Line 237:const int64_t candidate_term, > warning: parameter 'candidate_term' is const-qualified in the function decl Done PS4, Line 239:boost::optional ignore_live_leader, :boost::optional is_pre_election, > what's the point of making these optional? don't they have well-defined def I thought it was nice to pass them as optional so that if you don't specify them you get the default behavior as specified in the proto file. http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/integration-tests/tombstoned_voting-stress-test.cc File src/kudu/integration-tests/tombstoned_voting-stress-test.cc: Line 38: using kudu::consensus::MakeOpId; > warning: using decl 'MakeOpId' is unused [misc-unused-using-decls] Done Line 39: using kudu::consensus::LeaderStepDownResponsePB; > warning: using decl 'LeaderStepDownResponsePB' is unused [misc-unused-using Done Line 41: using kudu::consensus::RaftConsensus; > warning: using decl 'RaftConsensus' is unused [misc-unused-using-decls] Done Line 42: using kudu::consensus::RaftPeerPB; > warning: using decl 'RaftPeerPB' is unused [misc-unused-using-decls] Done Line 47: using kudu::tablet::TabletStatePB; > warning: using decl 'TabletStatePB' is unused [misc-unused-using-decls] Done Line 48: using kudu::tserver::TabletServerErrorPB; > warning: using decl 'TabletServerErrorPB' is unused [misc-unused-using-decl Done http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tablet/metadata.proto File src/kudu/tablet/metadata.proto: Line 135: // Tablet states are sent in TabletReports and kept in TabletReplica. > what effect will these changes have if there is a rolling upgrade with an o Based on a grep through the code, the master only knows about tablet::RUNNING and tablet::FAILED so it shouldn't even notice. http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tablet/tablet_metadata.h File src/kudu/tablet/tablet_metadata.h: Line 216: void SetPreFlushCallback(StatusClosure callback) { pre_flush_callback_ = callback; } > warning: parameter 'callback' is passed by value and only copied once; cons Done http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: Line 133: set_state(INITIALIZED); > maybe moot based
[kudu-CR] Give more context on errors reading cfiles
Grant Henke has posted comments on this change. Change subject: Give more context on errors reading cfiles .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/7620/1/src/kudu/cfile/cfile-test.cc File src/kudu/cfile/cfile-test.cc: Line 852: ASSERT_STR_MATCHES(s.ToString(), "block [0-9]+"); This test doesn't necessarily validate every error message. We corrupt each bit, but not necessarily to trigger ever "kind" of failure. I don't think a test like that is reasonable or worth it. But I thought I would mention it. http://gerrit.cloudera.org:8080/#/c/7620/1/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: Line 196: return Status::Corruption("invalid CFile header size", std::to_string(header_size)); Should the block_id be given here too? (and other similar size/parse checks) -- To view, visit http://gerrit.cloudera.org:8080/7620 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0bda5688a020032c512235ee574cb3e53c7872af Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] [maintenance manager] fix op scheduling lock contention, thread ID tweak
Todd Lipcon has posted comments on this change. Change subject: [maintenance manager] fix op scheduling lock contention, thread ID tweak .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7621/1/src/kudu/util/maintenance_manager.proto File src/kudu/util/maintenance_manager.proto: Line 35: required sfixed64 thread_id = 1; why fixed? -- To view, visit http://gerrit.cloudera.org:8080/7621 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I018ebe65c71344d8911ff66848478f75fef085af Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] consensus: Don't replay config changes
Todd Lipcon has submitted this change and it was merged. Change subject: consensus: Don't replay config changes .. consensus: Don't replay config changes We have invariants in place that make it unnecessary to "replay" a config change operation, so we will now simply check those invariants during tablet bootstrap. This patch removes plumbing of ConsensusMetadataManager into TabletBootstrap, since it is not needed there, and instead simply passes in the previously-committed raft config from the consensus metadata in order to check for correctness. Change-Id: Id62bf96b21560b2ef5838415e52aeb3720875e62 Reviewed-on: http://gerrit.cloudera.org:8080/7614 Tested-by: Kudu Jenkins Reviewed-by: Todd Lipcon--- M src/kudu/consensus/consensus_meta_manager.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tablet/tablet_bootstrap.h M src/kudu/tserver/ts_tablet_manager.cc 6 files changed, 55 insertions(+), 60 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7614 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Id62bf96b21560b2ef5838415e52aeb3720875e62 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] consensus: Don't replay config changes
Todd Lipcon has posted comments on this change. Change subject: consensus: Don't replay config changes .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7614 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id62bf96b21560b2ef5838415e52aeb3720875e62 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] disk failure: release failed txs from tracker
Todd Lipcon has posted comments on this change. Change subject: disk failure: release failed txs from tracker .. Patch Set 8: (6 comments) http://gerrit.cloudera.org:8080/#/c/7439/8/src/kudu/tablet/mvcc.h File src/kudu/tablet/mvcc.h: Line 404: void Cancel(); here's an idea: how about rather than specifically cancelling each transaction, there is an MvccManager::Shutdown() call, and then we allow AbortTransaction() of an APPLYING transaction only in the case that the MvccManager has entered shutdown state? then we don't need new per-transaction states, and it more accurately captures the idea that "you can only cancel operations once the entire tablet ahs started shutting itself down" (and we can still ensure that, on destruction, the map has been cleared out) http://gerrit.cloudera.org:8080/#/c/7439/8/src/kudu/tablet/transactions/transaction_driver.cc File src/kudu/tablet/transactions/transaction_driver.cc: Line 481: CHECK(s.IsDiskFailure()); per conversation on Slack, I think generalizing this to something like the following would separate the concerns a bit better: - on a disk error, mark the tablet as shutdown - when attempting to apply operations on a shutdown tablet, return Status::Aborted - if we get Status::Aborted here, we can CHECK that the tablet is in Shutdown() state and then go to the cancellation path -- basically this would be the same code as in HandleFailure today, except widened the scope to allow cancelling a replicating/replicated transaction in the case that the tablet is in failed state? (eg by taking a new parameter to HandleFailure indicating that it's due to the tablet shutdown?) http://gerrit.cloudera.org:8080/#/c/7439/8/src/kudu/tablet/transactions/write_transaction.cc File src/kudu/tablet/transactions/write_transaction.cc: Line 134: if (PREDICT_FALSE(tablet->IsInFailedDir())) { making ApplyRowOperations return a Status would allow centralizing this logic into Tablet a bit better and also generalizing to all "Shutdown" cases instead of just specifically the disk-failure one Line 139: tablet->ApplyRowOperations(state()); here if Tablet indicated that it has shut down then you can early-return without creating a commit message. that would be a pretty good safety net that we don't accidentally end up writing a commit message http://gerrit.cloudera.org:8080/#/c/7439/8/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: Line 1076: tablet->SetInFailedDir(); again maybe generalize to just 'MarkFailed' since the handling isn't particular to the reason for the failure. (eg if we detected some internal inconsistency like a bad checksum or something we may also want to markfailed even though it's not in a failed dir) Line 1078: LOG(FATAL) << Substitute("Data directory $0 failed. Disk failure handling not implemented", now implemented? -- To view, visit http://gerrit.cloudera.org:8080/7439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I983620f27e7226806a2cca253db7619731914d42 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] consensus: Don't replay config changes
Hello Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7614 to look at the new patch set (#4). Change subject: consensus: Don't replay config changes .. consensus: Don't replay config changes We have invariants in place that make it unnecessary to "replay" a config change operation, so we will now simply check those invariants during tablet bootstrap. This patch removes plumbing of ConsensusMetadataManager into TabletBootstrap, since it is not needed there, and instead simply passes in the previously-committed raft config from the consensus metadata in order to check for correctness. Change-Id: Id62bf96b21560b2ef5838415e52aeb3720875e62 --- M src/kudu/consensus/consensus_meta_manager.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tablet/tablet_bootstrap.h M src/kudu/tserver/ts_tablet_manager.cc 6 files changed, 55 insertions(+), 60 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/14/7614/4 -- To view, visit http://gerrit.cloudera.org:8080/7614 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id62bf96b21560b2ef5838415e52aeb3720875e62 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Dan Burkert has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 13: (19 comments) http://gerrit.cloudera.org:8080/#/c/7207/5//COMMIT_MSG Commit Message: PS5, Line 12: potentionaly typo: potentially http://gerrit.cloudera.org:8080/#/c/7207/13//COMMIT_MSG Commit Message: PS13, Line 10: creates s/creates which happen/new blocks PS13, Line 14: I FinalizeWrite() t add commas (API, FinalizeWrite(), to...) PS13, Line 15: resued reused PS13, Line 15: , no comma here Line 19: https://docs.google.com/a/cloudera.com/document/d/15zZaKeLENRGC_AdjF2N3hxtM_6_OuydXA1reWvDbbK0/edit?usp=sharing I think you may need to make this viewable from anyone (I tried in an incognito window and it asked me to log into my cloudera gmail account). http://gerrit.cloudera.org:8080/#/c/7207/13/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: PS13, Line 74: results s/results/resulting Line 88: // No more data may be written to the block, but it is not yet closed. 'not yet closed' doesn't really distinguish this state from CLOSED. I think this would be more clear as 'but it is not yet guaranteed to be durably stored on disk'. Line 281: BlockTransaction() {} prefer 'BlockTransaction() = default;' Line 283: ~BlockTransaction() { Can't this just be set to the default dtor? ~BlockTransaction() = default; PS13, Line 288: push_back( emplace_back PS13, Line 293: CommitCreation Might be more clear as 'CommitCreatedBlocks'. Line 305: Status s = bm->CloseBlocks(blocks); Would it be difficult to change BlockManager::CloseBlocks to take a 'const vector&' so you don't have to make a vector copy here? If that would require changing a bunch of other call sites, feel free to ignore. http://gerrit.cloudera.org:8080/#/c/7207/13/src/kudu/fs/file_block_manager.cc File src/kudu/fs/file_block_manager.cc: Line 727: // dirty pages if 'block_manager_flush_on_finalize' is true. I think this should be doing an async flush regardless of the block_manager_flush_on_finalize flag value. Unfortunately I think that means you will need to copy most of the body of Finalize and remove the check (maybe abstract it out?), but I think we definitely want to parallel flushing to happen here. block_coalesce_close shouldn't be tied to the flush on finalize flag. http://gerrit.cloudera.org:8080/#/c/7207/13/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 278: Status FinishBlock(const Status& s, WritableBlock* block, It looks like the only place that calls FinishBlock calls it with an OK status, so I think the s param can be removed. PS13, Line 522: unnecessarily unnecessary Line 1234: Status DoClose(CloseFlags flags); Could DoClose and CloseFlags be private? Line 1911: // dirty pages if 'block_manager_flush_on_finalize' is true. Same thing here - I think we want to do this depending only on the block_coalesce_close flag. PS13, Line 1917: belong s/belong/belonging -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao Hao Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] consensus: Don't replay config changes
Hello Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7614 to look at the new patch set (#3). Change subject: consensus: Don't replay config changes .. consensus: Don't replay config changes We have invariants in place that make it unnecessary to "replay" a config change operation, so we will now simply check those invariants during tablet bootstrap. This patch removes plumbing of ConsensusMetadataManager into TabletBootstrap, since it is not needed there, and instead simply passes in the previously-committed raft config from the consensus metadata in order to check for correctness. Change-Id: Id62bf96b21560b2ef5838415e52aeb3720875e62 --- M src/kudu/consensus/consensus_meta_manager.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tablet/tablet_bootstrap.h M src/kudu/tserver/ts_tablet_manager.cc 6 files changed, 54 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/14/7614/3 -- To view, visit http://gerrit.cloudera.org:8080/7614 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id62bf96b21560b2ef5838415e52aeb3720875e62 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Add Maintenance Manager visualizer
Dan Burkert has posted comments on this change. Change subject: Add Maintenance Manager visualizer .. Patch Set 7: (2 comments) http://gerrit.cloudera.org:8080/#/c/7570/7/www/maintenance-manager.mustache File www/maintenance-manager.mustache: Line 173: } else if (millis < 1000 * 60) { > I think a better threshold here might be 300 or something... otherwise anyw My personal preference is to always make it the largest possible unit, but keep around 3 significant digits of precision. PS7, Line 180: // Translates a thread id hex string into a counting-number identifier. : function threadString(thread_id) { : return "Thread " + (threads.indexOf(thread_id) + 1); : } > would rather figure a way to show pids here so you can correlate with trace https://gerrit.cloudera.org/#/c/7621/ switches the MM to use the system TID -- To view, visit http://gerrit.cloudera.org:8080/7570 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If2e7f18ac4834791a94d935b540930b11ea14532 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sam OkrentGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sam Okrent Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [maintenance manager] fix op scheduling lock contention, thread ID tweak
Hello Jean-Daniel Cryans, Todd Lipcon, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7621 to review the following change. Change subject: [maintenance manager] fix op scheduling lock contention, thread ID tweak .. [maintenance manager] fix op scheduling lock contention, thread ID tweak 943b1ae26e62a0c82 introduced a lock acquisition on the MM global lock when a worker thread begins running an op. This resulted in measurable lock contention with many MM threads, for instance the dense_node-itest which creates 100 MM threads. This commit introduces a finer grained lock around the currently running ops container, which reduces the contention on the global lock. I've also snuck in a change to make the thread ID correspond to the system thread ID, instead of a synthetic C++ stdlib ID. Change-Id: I018ebe65c71344d8911ff66848478f75fef085af --- M src/kudu/util/maintenance_manager.cc M src/kudu/util/maintenance_manager.h M src/kudu/util/maintenance_manager.proto 3 files changed, 30 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/7621/1 -- To view, visit http://gerrit.cloudera.org:8080/7621 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I018ebe65c71344d8911ff66848478f75fef085af Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Add tablet state summary metrics and fix KUDU-2044
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7618 to look at the new patch set (#2). Change subject: Add tablet state summary metrics and fix KUDU-2044 .. Add tablet state summary metrics and fix KUDU-2044 This patch adds metrics for the number of tablets in each state (i.e. the number of tablets for each value of TabletStatePB). The numbers are computed by the heartbeater. There's two reasons for this: 1. the heartbeater was already computing the number of RUNNING and BOOTSTRAPPING tablets by holding the appropriate lock and iterating over the tablet map 2. the alternative to having some thread periodically iterate over the tablet map is to increment and decrement the metrics when tablets transition states. This is error-prone, particularly if new states are added, and mistakes will accumulate until the metric is worse than useless. It also addresses KUDU-2044: tombstoned tablets will no longer produce metrics in /metrics output. A metric entity can now be marked as hidden, so it will not print to /metrics, and tablet metric entities are marked as hidden when the tablet is tombstoned, and un-hidden if and when the tablet is revived. Change-Id: I8c82987ffe4f37e8af95972bde97841e44c521d9 --- M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tablet/tablet.cc M src/kudu/tserver/heartbeater.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/metrics.cc M src/kudu/util/metrics.h 8 files changed, 250 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/18/7618/2 -- To view, visit http://gerrit.cloudera.org:8080/7618 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8c82987ffe4f37e8af95972bde97841e44c521d9 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins
[kudu-CR] Give more context on errors reading cfiles
Hello David Ribeiro Alves, Grant Henke, Andrew Wong, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7620 to review the following change. Change subject: Give more context on errors reading cfiles .. Give more context on errors reading cfiles Recently, a user reported an error loading a tablet in which the only reported message was "bad magic". This wasn't useful for pinpointing the id of the bad block or what might have happened to cause the problem. This patch adds more context info in such circumstances: we now include DebugString output for "bad magic" errors as well as the block ID in all cases. The unit test is updated to check that block IDs show up in all Corruption status messages. Change-Id: I0bda5688a020032c512235ee574cb3e53c7872af --- M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/index_btree.cc 3 files changed, 37 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/20/7620/1 -- To view, visit http://gerrit.cloudera.org:8080/7620 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I0bda5688a020032c512235ee574cb3e53c7872af Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke
[kudu-CR] consensus: Don't replay config changes
Todd Lipcon has posted comments on this change. Change subject: consensus: Don't replay config changes .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7614 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id62bf96b21560b2ef5838415e52aeb3720875e62 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] consensus: Don't replay config changes
Mike Percy has posted comments on this change. Change subject: consensus: Don't replay config changes .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/7614/1/src/kudu/tablet/tablet_bootstrap.cc File src/kudu/tablet/tablet_bootstrap.cc: Line 81: using consensus::ChangeConfigRecordPB; > yea Done PS1, Line 1418: IllegalState > maybe Corruption is more appropriate Done -- To view, visit http://gerrit.cloudera.org:8080/7614 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id62bf96b21560b2ef5838415e52aeb3720875e62 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] consensus: Don't replay config changes
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7614 to look at the new patch set (#2). Change subject: consensus: Don't replay config changes .. consensus: Don't replay config changes We have invariants in place that make it unnecessary to "replay" a config change operation, so we will now simply check those invariants during tablet bootstrap. This patch removes plumbing of ConsensusMetadataManager into TabletBootstrap, since it is not needed there, and instead simply passes in the previously-committed raft config from the consensus metadata in order to check for correctness. Change-Id: Id62bf96b21560b2ef5838415e52aeb3720875e62 --- M src/kudu/consensus/consensus_meta_manager.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_bootstrap.cc M src/kudu/tablet/tablet_bootstrap.h M src/kudu/tserver/ts_tablet_manager.cc 6 files changed, 54 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/14/7614/2 -- To view, visit http://gerrit.cloudera.org:8080/7614 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id62bf96b21560b2ef5838415e52aeb3720875e62 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [thirdparty]: added include-what-you-use
Todd Lipcon has posted comments on this change. Change subject: [thirdparty]: added include-what-you-use .. Patch Set 3: (2 comments) any idea what the incremental compilation time cost is for thirdparty on this? http://gerrit.cloudera.org:8080/#/c/7593/3/thirdparty/download-thirdparty.sh File thirdparty/download-thirdparty.sh: PS3, Line 55: local URL_PREFIX=$3 : if [ -z "$URL_PREFIX" ]; then : URL_PREFIX="${CLOUDFRONT_URL_PREFIX}" : fi is this used? http://gerrit.cloudera.org:8080/#/c/7593/3/thirdparty/vars.sh File thirdparty/vars.sh: PS3, Line 148: IWYU_NAME=llvm-$LLVM_VERSION.src : IWYU_SOURCE=$TP_SOURCE_DIR/$IWYU_NAME are these actually used, then? -- To view, visit http://gerrit.cloudera.org:8080/7593 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I041ad9e4bbff410f1760c7abd8cd173e5e9cc564 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2060: Show primary keys in the master's table web UI page
Todd Lipcon has submitted this change and it was merged. Change subject: KUDU-2060: Show primary keys in the master's table web UI page .. KUDU-2060: Show primary keys in the master's table web UI page Change-Id: I5563f2bbe31e0b61e1f0f222327c2a4837250fb7 Reviewed-on: http://gerrit.cloudera.org:8080/7569 Tested-by: Kudu Jenkins Reviewed-by: Todd Lipcon--- M src/kudu/server/webui_util.cc M src/kudu/tserver/tablet_server-test.cc 2 files changed, 6 insertions(+), 2 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I5563f2bbe31e0b61e1f0f222327c2a4837250fb7 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: srisaikumarravip...@inspur.com Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2060: Show primary keys in the master's table web UI page
Todd Lipcon has posted comments on this change. Change subject: KUDU-2060: Show primary keys in the master's table web UI page .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5563f2bbe31e0b61e1f0f222327c2a4837250fb7 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: srisaikumarravip...@inspur.com Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd LipconGerrit-HasComments: No
[kudu-CR] disk failure: release failed txs from tracker
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7439 to look at the new patch set (#8). Change subject: disk failure: release failed txs from tracker .. disk failure: release failed txs from tracker Currently, if a TransactionState object is deleted while it is Applying, Kudu will crash as a way to prevent the failed transaction from persisting incorrect state onto disk. However, this is not always necessary: transactions that fail due to disk failure may want to keep the server alive by marking the tablet as unusable. In this case, there needs to be a way to release transactions from the TransactionTracker without crashing Kudu, regardless of state. This patch adds the ability to Cancel transactions. Unlike Aborts, which ensure the transaction is not in the APPLYING state when destructed, a Canceled transaction has no such constraint. This is only "safe" because the TransactionTracker's tablet is expected to shut down and no longer be used. Additionally, a new flag is added to Tablet to indicate that it resides on a failed disk. If set, the tablet's transactions will end early. Currently, this codepath will not have a visible effect on Kudu as disk failures are still fatal, but it is tested for. Testing is done in a couple ways: - A test is added in mvcc-test to Cancel an Applying ScopedTransaction and ensure that there are no errors when it leaves scope. - A test is added to tablet_replica-test to register a WriteTransaction, set the tablet's "in-failed-dir" flag, and begin Applying. The transaction exits early and releases itself from MVCC. This is part of a series of patches to address disk failure. To see how this patch fits in, see section 2.3 of: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit Change-Id: I983620f27e7226806a2cca253db7619731914d42 --- M src/kudu/tablet/mvcc-test.cc M src/kudu/tablet/mvcc.cc M src/kudu/tablet/mvcc.h M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tablet/transactions/alter_schema_transaction.cc M src/kudu/tablet/transactions/alter_schema_transaction.h M src/kudu/tablet/transactions/transaction.h M src/kudu/tablet/transactions/transaction_driver.cc M src/kudu/tablet/transactions/transaction_driver.h M src/kudu/tablet/transactions/transaction_tracker-test.cc M src/kudu/tablet/transactions/write_transaction.cc M src/kudu/tablet/transactions/write_transaction.h M src/kudu/tserver/ts_tablet_manager.cc 15 files changed, 133 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/39/7439/8 -- To view, visit http://gerrit.cloudera.org:8080/7439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I983620f27e7226806a2cca253db7619731914d42 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] disk failure: release failed txs from tracker
Andrew Wong has posted comments on this change. Change subject: disk failure: release failed txs from tracker .. Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/7439/6/src/kudu/tablet/transactions/transaction.h File src/kudu/tablet/transactions/transaction.h: Line 20: #include > warning: #includes are not sorted properly [llvm-include-order] Done http://gerrit.cloudera.org:8080/#/c/7439/6/src/kudu/tablet/transactions/transaction_driver.cc File src/kudu/tablet/transactions/transaction_driver.cc: Line 45: using std::shared_ptr; > warning: using decl 'shared_ptr' is unused [misc-unused-using-decls] Done http://gerrit.cloudera.org:8080/#/c/7439/6/src/kudu/tablet/transactions/transaction_tracker-test.cc File src/kudu/tablet/transactions/transaction_tracker-test.cc: Line 24: #include "kudu/tablet/transactions/transaction_driver.h" > warning: #includes are not sorted properly [llvm-include-order] Done -- To view, visit http://gerrit.cloudera.org:8080/7439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I983620f27e7226806a2cca253db7619731914d42 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] disk failure: release failed txs from tracker
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7439 to look at the new patch set (#7). Change subject: disk failure: release failed txs from tracker .. disk failure: release failed txs from tracker Currently, if a TransactionState object is deleted while it is Applying, Kudu will crash as a way to prevent the failed transaction from persisting incorrect state onto disk. However, this is not always necessary: transactions that fail due to disk failure may want to keep the server alive by marking the tablet as unusable. In this case, there needs to be a way to release transactions from the TransactionTracker without crashing Kudu, regardless of state. This patch adds the ability to Cancel transactions. Unlike Aborts, which ensure the transaction is not in the APPLYING state when destructed, a Canceled transaction has no such constraint. This is only "safe" because the TransactionTracker's tablet is expected to shut down and no longer be used. Additionally, a new flag is added to Tablet to indicate that it resides on a failed disk. If set, the tablet's transactions will end early. Currently, this codepath will not have a visible effect on Kudu as disk failures are still fatal, but it is tested for. Testing is done in a couple ways: - A test is added in mvcc-test to Cancel an Applying ScopedTransaction and ensure that there are no errors when it leaves scope. - A test is added to tablet_replica-test to register a WriteTransaction, set the tablet's "in-failed-dir" flag, and being Applying. The transaction exits early and releases itself from MVCC. This is part of a series of patches to address disk failure. To see how this patch fits in, see section 2.3 of: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit Change-Id: I983620f27e7226806a2cca253db7619731914d42 --- M src/kudu/tablet/mvcc-test.cc M src/kudu/tablet/mvcc.cc M src/kudu/tablet/mvcc.h M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tablet/transactions/alter_schema_transaction.cc M src/kudu/tablet/transactions/alter_schema_transaction.h M src/kudu/tablet/transactions/transaction.h M src/kudu/tablet/transactions/transaction_driver.cc M src/kudu/tablet/transactions/transaction_driver.h M src/kudu/tablet/transactions/transaction_tracker-test.cc M src/kudu/tablet/transactions/write_transaction.cc M src/kudu/tablet/transactions/write_transaction.h M src/kudu/tserver/ts_tablet_manager.cc 15 files changed, 133 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/39/7439/7 -- To view, visit http://gerrit.cloudera.org:8080/7439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I983620f27e7226806a2cca253db7619731914d42 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2060: Show primary keys in the master's table web UI page
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7569 to look at the new patch set (#2). Change subject: KUDU-2060: Show primary keys in the master's table web UI page .. KUDU-2060: Show primary keys in the master's table web UI page Change-Id: I5563f2bbe31e0b61e1f0f222327c2a4837250fb7 --- M src/kudu/server/webui_util.cc M src/kudu/tserver/tablet_server-test.cc 2 files changed, 6 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/7569/2 -- To view, visit http://gerrit.cloudera.org:8080/7569 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5563f2bbe31e0b61e1f0f222327c2a4837250fb7 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: srisaikumarravip...@inspur.com Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] Add tablet state summary metrics and fix KUDU-2044
Will Berkeley has uploaded a new change for review. http://gerrit.cloudera.org:8080/7618 Change subject: Add tablet state summary metrics and fix KUDU-2044 .. Add tablet state summary metrics and fix KUDU-2044 This patch adds metrics for the number of tablets in each state (i.e. the number of tablets for each value of TabletStatePB). The numbers are computed by the heartbeater. There's two reasons for this: 1. the heartbeater was already computing the number of RUNNING and BOOTSTRAPPING tablets by holding the appropriate lock and iterating over the tablet map 2. the alternative to having some thread periodically iterate over the tablet map is to increment and decrement the metrics when tablets transition states. This is error-prone, particularly if new states are added, and mistakes will accumulate until the metric is worse than useless. It also addresses KUDU-2044: tombstoned tablets will no longer produce metrics in /metrics output. A metric entity can now be marked as hidden, so it will not print to /metrics, and tablet metric entities are marked as hidden when the tablet is tombstoned, and un-hidden if and when the tablet is revived. Change-Id: I8c82987ffe4f37e8af95972bde97841e44c521d9 --- M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tablet/tablet.cc M src/kudu/tserver/heartbeater.cc M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/metrics.cc M src/kudu/util/metrics.h 8 files changed, 249 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/18/7618/1 -- To view, visit http://gerrit.cloudera.org:8080/7618 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I8c82987ffe4f37e8af95972bde97841e44c521d9 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley
[kudu-CR] WIP KUDU-1489: move tablet metadata
Andrew Wong has uploaded a new change for review. http://gerrit.cloudera.org:8080/7617 Change subject: WIP KUDU-1489: move tablet metadata .. WIP KUDU-1489: move tablet metadata WIP: a few areas still need cleaning (e.g. data_dirs.cc) Some terminology: - Root: a top-level directory specified by the user via fs_data_dirs. They are canonicalized by the dir manager. - Dir: a suffix used by subdirectories of the roots. Generally should not be used alone, but rather as a suffix (e.g. DataDir). - DataDir: a */data subdirectory of a root - each DataDir has a path instance file - TabletMetadataDir: a */tablet-metadata subdirectory of a root - ConsensusMetadataDir: a */consensus-metadata subdirectory of a root Changes to the tablet lifecycle with striping: - During DataDirManager::Open(), scan through all the directories and map all the tablets' current (c)metadata directories - Additionally, during calls to ListTabletIds(), scan through all directories and update the tablets' current directories - If there are duplicates (not an expected case, but will likely happen), try to delete those that aren't in the disk group - When writing the metas (TabletMetadata::ReplaceSuperBlockUnlocked()), check to see whether the metadata dir is appropriate, if not, write it to a different dir. The crux of this patch is aimed at moving metadata when initializing the TSTabletManager. Rather than always getting metadata from the first data directory, all known directories (both WAL and data dirs) are scanned for metadata instances. Any metadata found in a directory it shouldn't be in is moved to one appropriate for it. An appropriate directory for a tablet's metadata is: 1. any directory in the tablet's directory group, or 2. the WAL dir (if tablet metadata striping is disabled) If striping is enabled, the metadata directory for each tablet will be decided when the tablet is created (or copied). Change-Id: Ia5308f8d4b4b738f353f511e2f1a2634b675a60a --- M src/kudu/consensus/consensus_meta-test.cc M src/kudu/consensus/consensus_meta.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/path_util.h 14 files changed, 557 insertions(+), 136 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/7617/1 -- To view, visit http://gerrit.cloudera.org:8080/7617 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ia5308f8d4b4b738f353f511e2f1a2634b675a60a Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong
[kudu-CR] KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover
Mike Percy has submitted this change and it was merged. Change subject: KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover .. KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover TestLeaderFailover is made more robust by adding a write loop in which the tablet leader is killed and restarted. The main idea is to check that the writes complete and can be read after the leader is killed. Done in a loop so we can be sure that we stay stable through multiple stops and restarts of the leader tablet. Change-Id: Ie93a15084a4c4c0748dc74c005b8313f443a5ba9 Reviewed-on: http://gerrit.cloudera.org:8080/7456 Reviewed-by: Mike PercyTested-by: Mike Percy --- M java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java A java/kudu-client/src/test/java/org/apache/kudu/client/TestMultipleLeaderFailover.java A java/kudu-client/src/test/java/org/apache/kudu/util/AssertHelpers.java 3 files changed, 137 insertions(+), 1 deletion(-) Approvals: Mike Percy: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/7456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ie93a15084a4c4c0748dc74c005b8313f443a5ba9 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Edward Fancher Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover
Mike Percy has posted comments on this change. Change subject: KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover .. Patch Set 8: Apparently raft_consensus-itest is flaky under TSAN. Overriding unrelated test failure. -- To view, visit http://gerrit.cloudera.org:8080/7456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie93a15084a4c4c0748dc74c005b8313f443a5ba9 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Edward FancherGerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover
Mike Percy has posted comments on this change. Change subject: KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover .. Patch Set 8: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/7456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie93a15084a4c4c0748dc74c005b8313f443a5ba9 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Edward FancherGerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover
Mike Percy has posted comments on this change. Change subject: KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie93a15084a4c4c0748dc74c005b8313f443a5ba9 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Edward FancherGerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] disk failure: reassign failed tablets
Andrew Wong has posted comments on this change. Change subject: disk failure: reassign failed tablets .. Patch Set 8: (5 comments) http://gerrit.cloudera.org:8080/#/c/7440/7/src/kudu/client/scanner-internal.cc File src/kudu/client/scanner-internal.cc: PS7, Line 232: case tserver::TabletServerErrorPB::TABLET_FAILED: // fall-through > would it make more sense to have this be like: TABLET_NOT_FOUND? How do we Hrm, maybe, but I'm keeping this as is for now. Reasoning here was that before when a tablet was in the FAILED state, we would treat it as TABLET_NOT_RUNNING. I'm looking in client/scanner-internal.cc and it seems like we blacklist the location for TNR (if there's somewhere else I should be looking, please let me know). I'm not sure it makes sense to retry on TNR. I suppose it could retry if the tablet were NOT_STARTED or BOOTSTRAPPING, but tablets in QUIESCING and SHUTDOWN are also considered NOT_RUNNING. http://gerrit.cloudera.org:8080/#/c/7440/7/src/kudu/consensus/consensus_peers.cc File src/kudu/consensus/consensus_peers.cc: PS7, Line 284: sponse_.error().code() == TabletServerErrorPB::TABLET_FAILED) > maybe in this case we should directly call: NotifyObserversOfFailedFollower Done. http://gerrit.cloudera.org:8080/#/c/7440/7/src/kudu/consensus/consensus_queue.cc File src/kudu/consensus/consensus_queue.cc: PS7, Line 638: // Initiate Tablet Copy on the peer if the tablet is not found. : if (response.has_error()) { : CHECK_EQ(tserver::TabletServerErrorPB::TABLET_NOT_FOUND, response.error().code()); : peer->needs_tablet_copy = true; : VLOG_WITH_PREFIX_UNLOCKED(1) << "Marked peer as needing tablet copy: " : << peer->ToString(); : *more_pending = true; : return; : } : : // Sanity checks. : // Some of these can be eventually removed, but they are handy for now. : DCHECK(response.status().IsInitialized()) : << "Error: Uninitialized: " << response.InitializationErrorString() : << ". Response: "<< SecureShortDebugString(response); : // TODO: Include uuid in error messages as well. : DCHECK(response.has_responder_uuid() && !response.responder_uuid().empty()) : > see my comment on the call site Done http://gerrit.cloudera.org:8080/#/c/7440/7/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: PS7, Line 170: DEFINE_bool(master_tombstone_failed_tablet_replicas, true, : "Whether the master should tombstone (delete) tablet replicas that " : "are reporting a failed state. Only for testing!"); : TAG_FLAG(master_tombstone_failed_tablet_replicas, hidden); > is this a test only thing? As of now, yes. Will update to make that clear. http://gerrit.cloudera.org:8080/#/c/7440/7/src/kudu/tablet/metadata.proto File src/kudu/tablet/metadata.proto: PS7, Line 161: the tablet will be evicted and > ?? Should be evicted and replaced. -- To view, visit http://gerrit.cloudera.org:8080/7440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] disk failure: reassign failed tablets
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7440 to look at the new patch set (#8). Change subject: disk failure: reassign failed tablets .. disk failure: reassign failed tablets Tablets put into the state tablet::FAILED are left until they are manually deleted. This is an issue because failed tablets don't get evicted and reassigned (e.g. if a tablet fails to bootstrap, it will sit, responding to heartbeats, doing nothing else). To remediate this, this patch changes the tserver response generated by FAILED tablets to a new TABLET_FAILED state, on which a leader will immediately evict the peer. Additionally, a new tablet state is added: FAILED_AND_SHUTDOWN. Like QUIESCING and SHUTDOWN, TabletReplica::Shutdown() can wait on FAILED_AND_SHUTDOWN. This is useful if a failed tablet needs to be shut down and still needs to be reassigned. Calling normal Shutdown() cannot leave the replica in the FAILED state, and the SHUTDOWN state cannot itself indicate the need for eviction. Prior to this patch, tablets were set to FAILED when they failed to delete metadata. This is no longer the case. Since error statuses during deletion are only returned during IO to the metadata directory, and because the metadata directory is a single point of failure, failures on this codepath are made fatal for now. Once this is no longer the case, these failures should be made benign, as proper error handling should make files in the failed metadata directory unreachable. This ensures the tablets that were meant to be deleted are not reassigned. The test raft_consensus-itest is updated to ensure that failed tablets are evicted and replaced. The test tablet_server-test is also updated to ensure that, instead of TABLET_NOT_RUNNING, TABLET_FAILED is returned by failed tablets. This patch is a part of a series of patches to handle disk failure. See section 2.5 in this doc: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 --- M src/kudu/client/scanner-internal.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/tablet/metadata.proto M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/tserver.proto 14 files changed, 131 insertions(+), 65 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/40/7440/8 -- To view, visit http://gerrit.cloudera.org:8080/7440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-871. Support tombstoned voting
Todd Lipcon has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 4: (21 comments) did a first pass, probably will have more comments after a second pass but figured I'd send these along http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 188: Status RaftConsensus::Init() { I think the DCHECK on state here was useful, even if it needs to be broadened a bit now PS4, Line 1493: tombstone_last_logged_opid ? *tombstone_last_logged_opid : : GetLatestOpIdFromLog(); could we simplify by always using 'max()' of the entry from the metadata and the entry from our log (if we have a log) Line 1497: LOG_WITH_PREFIX_UNLOCKED(INFO) << "voting while tombstoned"; any way to improve this with a bit more info? Line 2004: state_ = new_state; given this is in every case, move it down below the switch? PS4, Line 2515: (!tombstone_last_logged_opid not 100% following why this portion of the condition is necessary PS4, Line 2520: state_ == kShutdown should we instead be checking the expected state (ie kStopped)? aren't there some other states we might be in where we don't want to vote? Line 2634: return kMinimumTerm; under what circumstances do we hit this case? It seems like cmeta_ should be set in Init() and we shouldn't ever be calling functions on RaftConsensus until after a successful Init() (see other comment about changing to a factory function so it's more clear that we never expose a constructed-by-not-initialized instance) Line 2685: if (cmeta_) { same http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/integration-tests/cluster_itest_util.h File src/kudu/integration-tests/cluster_itest_util.h: PS4, Line 239:boost::optional ignore_live_leader, :boost::optional is_pre_election, what's the point of making these optional? don't they have well-defined defaults (false) anyway? http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tablet/metadata.proto File src/kudu/tablet/metadata.proto: Line 135: // Tablet states are sent in TabletReports and kept in TabletReplica. what effect will these changes have if there is a rolling upgrade with an old master and new tablet servers? do we need to force an incompatibility so that people upgrade masters first? or is it safe to have these show up as 'UNKNOWN' on an old-version master? http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: Line 133: set_state(INITIALIZED); maybe moot based on the suggestion of a factory method, but if a method like this remains, the state change should wait until the method is successful Line 272: // Release mem tracker resources. is this comment stale? PS4, Line 407: LOG_WITH_PREFIX(INFO) << "Fetched status for addr " << _ : << ": " << SecureDebugString(*status_pb_out); leftover debug print? Line 427: CHECK_EQ(INITIALIZED, state_); redundant with the DCHECK in set_state http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tablet/tablet_replica.h File src/kudu/tablet/tablet_replica.h: PS4, Line 75: TabletReplica(scoped_refptr meta, : scoped_refptr cmeta_manager, : consensus::RaftPeerPB local_peer_pb, : ThreadPool* apply_pool, : Callbackmark_dirty_clbk); : : // Initializes RaftConsensus. : Status Init(ThreadPool* raft_pool); instead of having this split "construct and then init" how about a factory method which would have a similar signature to the original constructor? eg: static Status Create(scoped_refptr meta, ..., scoped_refptr* replica); which does the construction and Init()? That way there is less externally-visible lifecycle for callers to think about (and maybe could reduce one of the enum values too)? PS4, Line 96: tombstoned voting in general, right? ie what would happen if you called Stop() but the data wasn't tombstoned, and you asked to vote? PS4, Line 169: // If any peers in the consensus configuration lack permanent uuids, get them via an : // RPC call and update. : // TODO: move this to raft_consensus.h. : Status UpdatePermanentUuids(); can you remove this while you're here? seems to be dead Line 277: // Wait until the TabletReplica is fully in STOPPED state. or SHUTDOWN http://gerrit.cloudera.org:8080/#/c/6960/4/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: Line 947: tablet::TabletDataState data_state =
[kudu-CR] WIP: [iwyu] first pass
Alexey Serbin has posted comments on this change. Change subject: WIP: [iwyu] first pass .. Patch Set 11: -Verified > (19 comments) > > Thanks for banging on this some more, will be great to have with > the automated checking. I only got through about 40%, but I saw a > lot of repeated patterns that are kind of suspicious. Basically > most uses of the pragmas surprised me. Are these pragmas > indicating issues with IWYU, or with our includes? In particular > logging.h, port.h, and boost headers keep reappearing. Thank you for taking a look at this. The automated checking is coming -- I added the script to mute warnings for some files in a separate changelist and I'm planning to include it into a pre-commit build. I just was hoping to remove as many files from the mute list as possible for now. Yes, those pragmas manifest issues with IWYU -- it's still in alpha quality, if not lower. The pattern with is a re-occurring one, and there are many more like that. And there are even worse ones, which breaks compilation. The problem is that I don't know how to automate the process of straighten it up, and it requires a lot of manual intervention (that's why so much time is spent on that already). The only hope is to try mappings for the boost headers -- that type of suggestions might be straighten up as documented. As for -- I tried a mapping with previous version of the IWYU tool, but no luck. I'll add a link about the pragmas into the commit message, sure. -- To view, visit http://gerrit.cloudera.org:8080/4738 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6923091be36a1c03fac34dd73e6b17e17ac9ceaa Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR](branch-1.2.x) KUDU-2085. Fix crash when seeking past end of prefix-encoded blocks
Todd Lipcon has submitted this change and it was merged. Change subject: KUDU-2085. Fix crash when seeking past end of prefix-encoded blocks .. KUDU-2085. Fix crash when seeking past end of prefix-encoded blocks This fixes a bug when seeking past the last element of a prefix-encoded binary block. In very specific circumstances, described in the JIRA, this would cause a Status::Corruption() to be returned. Prior to the fix, the updated test would fail pretty reliably when looped 50-100 times. With the patch, I looped it 1000 times with no failure. Change-Id: I1670b2244d586e4920f0603c48f65ed993a3b12b Reviewed-on: http://gerrit.cloudera.org:8080/7545 Tested-by: Kudu Jenkins Reviewed-by: David Ribeiro AlvesReviewed-by: Andrew Wong (cherry picked from commit d1f53cc323d5d07e79304765fe171f535c1d548a) Reviewed-on: http://gerrit.cloudera.org:8080/7613 --- M src/kudu/cfile/binary_prefix_block.cc M src/kudu/cfile/encoding-test.cc 2 files changed, 51 insertions(+), 10 deletions(-) Approvals: David Ribeiro Alves: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7613 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I1670b2244d586e4920f0603c48f65ed993a3b12b Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: branch-1.2.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR](branch-1.2.x) KUDU-2085. Fix crash when seeking past end of prefix-encoded blocks
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-2085. Fix crash when seeking past end of prefix-encoded blocks .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7613 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1670b2244d586e4920f0603c48f65ed993a3b12b Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: branch-1.2.x Gerrit-Owner: Todd LipconGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover
Edward Fancher has posted comments on this change. Change subject: KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/7456/7/java/kudu-client/src/test/java/org/apache/kudu/util/AssertHelpers.java File java/kudu-client/src/test/java/org/apache/kudu/util/AssertHelpers.java: PS7, Line 30: long deadlineNanos = System.nanoTime() + timeoutMillis * 100; : boolean success; : : do { : success = expression.get(); : if (success) break; : > How about using a do-while loop so you only have to call .get() once in the Done -- To view, visit http://gerrit.cloudera.org:8080/7456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie93a15084a4c4c0748dc74c005b8313f443a5ba9 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Edward FancherGerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7456 to look at the new patch set (#8). Change subject: KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover .. KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover TestLeaderFailover is made more robust by adding a write loop in which the tablet leader is killed and restarted. The main idea is to check that the writes complete and can be read after the leader is killed. Done in a loop so we can be sure that we stay stable through multiple stops and restarts of the leader tablet. Change-Id: Ie93a15084a4c4c0748dc74c005b8313f443a5ba9 --- M java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java A java/kudu-client/src/test/java/org/apache/kudu/client/TestMultipleLeaderFailover.java A java/kudu-client/src/test/java/org/apache/kudu/util/AssertHelpers.java 3 files changed, 137 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/56/7456/8 -- To view, visit http://gerrit.cloudera.org:8080/7456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie93a15084a4c4c0748dc74c005b8313f443a5ba9 Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Edward FancherGerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP: [iwyu] first pass
Dan Burkert has posted comments on this change. Change subject: WIP: [iwyu] first pass .. Patch Set 11: (19 comments) Thanks for banging on this some more, will be great to have with the automated checking. I only got through about 40%, but I saw a lot of repeated patterns that are kind of suspicious. Basically most uses of the pragmas surprised me. Are these pragmas indicating issues with IWYU, or with our includes? In particular logging.h, port.h, and boost headers keep reappearing. http://gerrit.cloudera.org:8080/#/c/4738/11/src/kudu/benchmarks/rle.cc File src/kudu/benchmarks/rle.cc: Line 28: #include // IWYU pragma: keep Might be good to make a comment in the commit message about these different pragma types, e.g. what are they doing, and is it a long term or temporary solution? http://gerrit.cloudera.org:8080/#/c/4738/11/src/kudu/benchmarks/tpch/line_item_tsv_importer.h File src/kudu/benchmarks/tpch/line_item_tsv_importer.h: Line 21: //#include leftover? http://gerrit.cloudera.org:8080/#/c/4738/11/src/kudu/benchmarks/tpch/rpc_line_item_dao-test.cc File src/kudu/benchmarks/tpch/rpc_line_item_dao-test.cc: Line 18: #include sys/types.h and gutil/port.h seem suspicious to me. I don't see anything in this file which might need those. http://gerrit.cloudera.org:8080/#/c/4738/11/src/kudu/benchmarks/tpch/rpc_line_item_dao.h File src/kudu/benchmarks/tpch/rpc_line_item_dao.h: Line 24: #include // IWYU pragma: keep Could you forward declare boost::function instead? http://gerrit.cloudera.org:8080/#/c/4738/11/src/kudu/benchmarks/tpch/tpch-schemas.h File src/kudu/benchmarks/tpch/tpch-schemas.h: Line 25: #include // for CHECK_OK to be picked up in status.h Perhaps we could add this include inside the '#ifdef KUDU_HEADERS_NO_STUBS' in status.h instead of including it in all the call sites? E.g. right here: https://github.com/apache/kudu/blob/d1f53cc323d5d07e79304765fe171f535c1d548a/src/kudu/util/status.h#L22 http://gerrit.cloudera.org:8080/#/c/4738/11/src/kudu/benchmarks/tpch/tpch1.cc File src/kudu/benchmarks/tpch/tpch1.cc: Line 60: #include Another un-obvious one. http://gerrit.cloudera.org:8080/#/c/4738/11/src/kudu/client/scan_predicate.cc File src/kudu/client/scan_predicate.cc: Line 48: extra http://gerrit.cloudera.org:8080/#/c/4738/11/src/kudu/client/table_alterer-internal.h File src/kudu/client/table_alterer-internal.h: Line 24: #include Hm, did plain boost/optional.hpp result in a warning? http://gerrit.cloudera.org:8080/#/c/4738/11/src/kudu/client/table_creator-internal.h File src/kudu/client/table_creator-internal.h: Line 24: #include // IWYU pragma: keep I think this could be optional_fwd.hpp http://gerrit.cloudera.org:8080/#/c/4738/11/src/kudu/client/write_op.h File src/kudu/client/write_op.h: Line 20: // NOTE: using stdint.h instead of cstdint because this file is supposed client_samples-test covers this, right? Probably no need to comment this and client.h if we do. These headers are meant to be consumed externally and I think it may be a bit confusing without context. http://gerrit.cloudera.org:8080/#/c/4738/11/src/kudu/clock/logical_clock.cc File src/kudu/clock/logical_clock.cc: Line 73: const MonoTime&) { Don't we usually do something like /* deadline */? http://gerrit.cloudera.org:8080/#/c/4738/11/src/kudu/codegen/row_projector.h File src/kudu/codegen/row_projector.h: Line 35: // IWYU pragma: no_include "kudu/gutil/int128.h" These keep showing up over and over, does that indicate some kind of issue with our organization? http://gerrit.cloudera.org:8080/#/c/4738/11/src/kudu/common/column_predicate.cc File src/kudu/common/column_predicate.cc: Line 34: extra http://gerrit.cloudera.org:8080/#/c/4738/11/src/kudu/common/partition_pruner.h File src/kudu/common/partition_pruner.h: Line 31: class Partition;// IWYU pragma: keep I think these forward declares, or perhaps the include of partition.h could be removed. http://gerrit.cloudera.org:8080/#/c/4738/11/src/kudu/common/row_changelist.cc File src/kudu/common/row_changelist.cc: Line 31: #include "kudu/gutil/port.h" This one keeps showing up, not sure what it's being used for. http://gerrit.cloudera.org:8080/#/c/4738/11/src/kudu/common/row_changelist.h File src/kudu/common/row_changelist.h: Line 41: class faststring; // IWYU pragma: keep I Don't see why the pragma would be necessary with the fwd declare? http://gerrit.cloudera.org:8080/#/c/4738/11/src/kudu/common/scan_spec.cc File src/kudu/common/scan_spec.cc: Line 41: // IWYU pragma: no_include These are the only two uses of boost in the file, that's suspicious. http://gerrit.cloudera.org:8080/#/c/4738/11/src/kudu/common/schema-test.cc File src/kudu/common/schema-test.cc: Line 18: I think it's in our style to include the tested header first.
[kudu-CR] separate DataDirManager from BlockManagers
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7602 to look at the new patch set (#4). Change subject: separate DataDirManager from BlockManagers .. separate DataDirManager from BlockManagers Currently, the DataDirManager is owned by the BlockManagers. Since only blocks are placed in 'data dirs' (i.e. subdirectories named 'data' under the roots specified by 'fs_data_dirs'), this hierarchy made sense. However, it would be nice to track other files that fall within 'fs_data_dirs' (e.g. tablet-metadata, consensus-metadata). Splitting the directory manager from the block manager makes this more feasible. Some logic previously in the FsManager and BlockManagers is moved into the DataDirManager: - canonicalization of data roots now occurs in DataDirManager, ensuring that the DataDirManager will know about all data directories, even those that fail to open/canonicalize - BlockManagers no longer have a Create() function - BlockManager's dtor will now just wait for DataDir closures to finish instead of shutting down the DataDirManager as to not directory affect the DataDirManager To clarify some vocabulary: - Root: a top-level directory specified by 'fs_data_dirs' - Data (root) dir: a subdirectory of a data root, named 'data'. Blocks are placed in this directory. Change-Id: I387e4e88bd65298970a1e4201879de08fac07ace --- M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h M src/kudu/fs/data_dirs-test.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/util/path_util.cc M src/kudu/util/path_util.h 16 files changed, 525 insertions(+), 330 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/02/7602/4 -- To view, visit http://gerrit.cloudera.org:8080/7602 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I387e4e88bd65298970a1e4201879de08fac07ace Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] disk failure: release failed txs from tracker
Andrew Wong has posted comments on this change. Change subject: disk failure: release failed txs from tracker .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/7439/4//COMMIT_MSG Commit Message: PS4, Line 15: without crashing Kudu, regardless of state. : : T > currently I think we directly enter the 'APPLYING' state when we submit a t Realized I never clicked respond. Right, if the transaction is waiting in the queue, there's not much we can do. When the transaction starts applying, however, we can check if the tablet has been failed and return early. Yeah, a flag to check whether the tablet is on a failed disk (not shut down quite yet) should do the trick (I have essentially that in a separate patch). I think it's fair game to end the transaction mid-apply (e.g. in ApplyRowOperations), lest we fail some checks if the apply is the thing that's failing. I agree that the distinction between "Abort" and "Cancel" is a bit (maybe too) subtle. The necessity comes from the fact that, for write transactions, the TransactionDrivers have ScopedTransaction objects that call MvccManager::AbortTransaction when they're deleted (e.g. when the driver is released). The idea with "Cancel" is that we should be able to safely eject the transaction from the TransactionTracker (deleting the ScopedTransaction, and effectively "completing" the Apply, without ever officially committing the tx, and not have to worry about the constraints of the MvccManager that crash Kudu if aborting during an Apply. Maybe a cleaner API would be to have some TransactionTracker::Abort(TransactionDriver) that would release the driver regardless of state, but that'd be more of a cosmetic change. The underlying behavior would be the same. -- To view, visit http://gerrit.cloudera.org:8080/7439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I983620f27e7226806a2cca253db7619731914d42 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] kudu client tools for hadoop and spark import/export(csv,parquet,avro)
Sandish Kumar HN has posted comments on this change. Change subject: kudu client tools for hadoop and spark import/export(csv,parquet,avro) .. Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/7421/6//COMMIT_MSG Commit Message: PS6, Line 7: avro > Did you mean to include the avro part? Avro is available in spark client tools. people didn't want me to add much to MapReduce client tools. http://gerrit.cloudera.org:8080/#/c/7421/6/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java: Line 32: import org.apache.parquet.column.ColumnDescriptor; > nit: remove Done PS6, Line 103: > nit: Parquet is always stylized with an upper case P, please fix here and b Done PS6, Line 103: > nit: exist, same below Done -- To view, visit http://gerrit.cloudera.org:8080/7421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If462af948651f3869b444e82151c3559fde19142 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sandish Kumar HNGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sandish Kumar HN Gerrit-HasComments: Yes
[kudu-CR] consensus: Don't replay config changes
Todd Lipcon has posted comments on this change. Change subject: consensus: Don't replay config changes .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/7614/1/src/kudu/tablet/tablet_bootstrap.cc File src/kudu/tablet/tablet_bootstrap.cc: Line 81: using consensus::ChangeConfigRecordPB; > warning: using decl 'ChangeConfigRecordPB' is unused [misc-unused-using-dec yea PS1, Line 1418: IllegalState maybe Corruption is more appropriate -- To view, visit http://gerrit.cloudera.org:8080/7614 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id62bf96b21560b2ef5838415e52aeb3720875e62 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Add Maintenance Manager visualizer
Todd Lipcon has posted comments on this change. Change subject: Add Maintenance Manager visualizer .. Patch Set 7: (10 comments) http://gerrit.cloudera.org:8080/#/c/7570/7/www/maintenance-manager.mustache File www/maintenance-manager.mustache: PS7, Line 106: CompactRowSetsOp { : fill: #095; : } : : .FlushDeltaMemStoresOp { : fill: #ec1; : } : : .FlushMRSOp { : fill: #16a; : } : : .LogGCOp { : fill: #2ae; : } : : .MajorDeltaCompactionOp { : fill: #c23; : } : : .MinorDeltaCompactionOp { : fill: #92c; : } : : .UndoDeltaBlockGCOp { maybe we can fill these in from the template? Line 173: } else if (millis < 1000 * 60) { I think a better threshold here might be 300 or something... otherwise anywhere from 61 to 119 seconds will be reported the same, which doesn't seem that great PS7, Line 180: // Translates a thread id hex string into a counting-number identifier. : function threadString(thread_id) { : return "Thread " + (threads.indexOf(thread_id) + 1); : } would rather figure a way to show pids here so you can correlate with trace results, pstacks, etc PS7, Line 192: html shoudl be .text() instead of .html() Line 192: .append($("").html(op.name)) why not use jQuery(popup).css(...) to set the above styles instead of the native DOM manipulation? also could just use jQuery("#popup") to find the element PS7, Line 201: var popup = document.getElementById("popup"); : popup.innerHTML = ""; : popup.style.display = "none"; jQuery("#popup").empty().hide(); PS7, Line 217: (i var i PS7, Line 219: []; an empty array here seems unexpected since it's used as an array index below PS7, Line 246: . nit: inconsistent indentation Line 328: // and a corresponding css class (as above) should be added. can this be passed in from the code? -- To view, visit http://gerrit.cloudera.org:8080/7570 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If2e7f18ac4834791a94d935b540930b11ea14532 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sam OkrentGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sam Okrent Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Web UI: switch /maintenance-manager endpoint to mustache
Todd Lipcon has posted comments on this change. Change subject: Web UI: switch /maintenance-manager endpoint to mustache .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/7607 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icf331f5d789893171608bac4d970db7680e0fcc4 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No