[kudu-CR] [cmake] introduce 'pb-gen' and 'krpc-gen' targets
Alexey Serbin has uploaded a new patch set (#2). Change subject: [cmake] introduce 'pb-gen' and 'krpc-gen' targets .. [cmake] introduce 'pb-gen' and 'krpc-gen' targets Introduced a new target to generate protobuf stubs and KRPC service and proxy files. The 'iwyu' target now depends on the newly introduced 'pb-gen' and 'krpc-gen' targets. That's because IWYU needs to have all protobuf and KRPC headers available when running. Change-Id: I48833098553ef7d3f4fae8b88079cb354b52bea1 --- M CMakeLists.txt M src/kudu/cfile/CMakeLists.txt M src/kudu/client/CMakeLists.txt M src/kudu/common/CMakeLists.txt M src/kudu/consensus/CMakeLists.txt M src/kudu/fs/CMakeLists.txt M src/kudu/master/CMakeLists.txt M src/kudu/rpc/CMakeLists.txt M src/kudu/security/CMakeLists.txt M src/kudu/server/CMakeLists.txt M src/kudu/tablet/CMakeLists.txt M src/kudu/tserver/CMakeLists.txt M src/kudu/util/CMakeLists.txt 13 files changed, 99 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77//2 -- To view, visit http://gerrit.cloudera.org:8080/ To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I48833098553ef7d3f4fae8b88079cb354b52bea1 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [cmake] introduce 'pb-gen' and 'krpc-gen' target
Alexey Serbin has uploaded a new change for review. http://gerrit.cloudera.org:8080/ Change subject: [cmake] introduce 'pb-gen' and 'krpc-gen' target .. [cmake] introduce 'pb-gen' and 'krpc-gen' target Introduced a new target to generate protobuf stubs and KRPC service and proxy files. The 'iwyu' target now depends on the newly introduced 'pb-gen' and 'krpc-gen' targets. That's because IWYU needs to have all protobuf and KRPC headers available when running. Change-Id: I48833098553ef7d3f4fae8b88079cb354b52bea1 --- M CMakeLists.txt M src/kudu/cfile/CMakeLists.txt M src/kudu/client/CMakeLists.txt M src/kudu/common/CMakeLists.txt M src/kudu/consensus/CMakeLists.txt M src/kudu/fs/CMakeLists.txt M src/kudu/master/CMakeLists.txt M src/kudu/rpc/CMakeLists.txt M src/kudu/security/CMakeLists.txt M src/kudu/server/CMakeLists.txt M src/kudu/tablet/CMakeLists.txt M src/kudu/tserver/CMakeLists.txt M src/kudu/util/CMakeLists.txt 13 files changed, 99 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77//1 -- To view, visit http://gerrit.cloudera.org:8080/ To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I48833098553ef7d3f4fae8b88079cb354b52bea1 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin
[kudu-CR] KUDU-1407: reassign failed tablets
Mike Percy has posted comments on this change. Change subject: KUDU-1407: reassign failed tablets .. Patch Set 19: (1 comment) http://gerrit.cloudera.org:8080/#/c/7440/19/src/kudu/consensus/consensus_queue.cc File src/kudu/consensus/consensus_queue.cc: Line 627: void PeerMessageQueue::NotifyPeerHasFailed(const string& peer_uuid, const string& reason) { consider: std::unique_lock l(queue_lock_); TrackedPeer* peer = FindPtrOrNull(peers_map_, peer_uuid); if (peer) { // Use the current term to ensure the peer will be evicted, otherwise this // notification may be ignored. int64_t current_term = queue_state_.current_term; l.unlock(); NotifyObserversOfFailedFollower(peer_uuid, current_term, reason); } -- To view, visit http://gerrit.cloudera.org:8080/7440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 Gerrit-PatchSet: 19 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1407: reassign failed tablets
Mike Percy has posted comments on this change. Change subject: KUDU-1407: reassign failed tablets .. Patch Set 19: (6 comments) http://gerrit.cloudera.org:8080/#/c/7440/19/src/kudu/consensus/consensus_queue.cc File src/kudu/consensus/consensus_queue.cc: PS19, Line 628: peers_map_ This should be protected by queue_lock_ PS19, Line 632: queue_state_.current_term This should be protected by queue_lock_ PS19, Line 633: NotifyObserversOfFailedFollower No need to hold queue_lock_ while calling NotifyObserversOfFailedFollower() because the thread pool it uses is set in the constructor http://gerrit.cloudera.org:8080/#/c/7440/19/src/kudu/integration-tests/ts_recovery-itest.cc File src/kudu/integration-tests/ts_recovery-itest.cc: Line 310: { "--master_tombstone_evicted_tablet_replicas=false" })); nit: add a comment for why we are specifying this flag http://gerrit.cloudera.org:8080/#/c/7440/17/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: PS17, Line 655: > Failures when writing the data directory are passed off as WARN_NOT_OK() (s ok PS17, Line 658: > Ah, I see. I'll update the comment. ok. -- To view, visit http://gerrit.cloudera.org:8080/7440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 Gerrit-PatchSet: 19 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1407: reassign failed tablets
Andrew Wong has posted comments on this change. Change subject: KUDU-1407: reassign failed tablets .. Patch Set 19: The jenkins failure doesn't seem to be related to these changes. This patch is still good for review. -- To view, visit http://gerrit.cloudera.org:8080/7440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 Gerrit-PatchSet: 19 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1407: reassign failed tablets
Hello David Ribeiro Alves, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7440 to look at the new patch set (#19). Change subject: KUDU-1407: reassign failed tablets .. KUDU-1407: reassign failed tablets Tablets put into the state tablet::FAILED are left until they are manually deleted; they are not evicted and reassigned. If a tablet fails to bootstrap, it will sit, responding to heartbeats, doing nothing else. This patch ensures failed tablets will be reassigned. As the tablets are not used, rather than directly setting replicas to FAILED, an error is first recorded and the TabletReplica::Shutdown(), leaving the final state as FAILED. A replica can no longer leave the FAILED state (calls to Shutdown() leave it FAILED). The tserver response generated by FAILED tablets is now TABLET_FAILED. Upon receiving this, a leader will immediately evict the peer. Prior to this patch, a tablet was marked FAILED if its WAL or metadata failed to delete (after already shutting down). If this occurs, there is no guarantee that the tablet's metadata reflects the deleted state. This has been made fatal. Testing is done in a few places: - raft_consensus-itest is updated to ensure that tablets that fail to bootstrap are evicted and replaced. - tablet_server-test is also updated to ensure that, instead of TABLET_NOT_RUNNING, TABLET_FAILED is returned by failed tablets. - a test is added to ts_tablet_manager-itest to test that a tablet that is manually failed while running is evicted and replaced. This patch is a part of a series of patches to handle disk failure. See section 2.5 in this doc: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 --- M src/kudu/client/scanner-internal.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/tserver.proto 13 files changed, 151 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/40/7440/19 -- To view, visit http://gerrit.cloudera.org:8080/7440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 Gerrit-PatchSet: 19 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2095 - Add scanner `keepAlive` RPC call to Java API
t...@phdata.io has posted comments on this change. Change subject: KUDU-2095 - Add scanner `keepAlive` RPC call to Java API .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/7749/5/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java: Line 845: final ReplicaSelection replicaSelection = scanner.getReplicaSelection(); I think this is better. A scanner would fail with ReplicaSelection.CLOSEST_REPLICA. Now that passes but since CLOSEST_REPLICA could return a random replica I don't know that this is exactly correct. http://gerrit.cloudera.org:8080/#/c/7749/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java: Line 843: final KuduRpc keepAliveRequest = scanner.getKeepAliveRequest(); I think this is better, but since ReplicaSelection.CLOSEST_REPLICA can return a random replica if they're equidistant I'm not sure how I'd know which exact replica has the scanner. It seems to rely on the order a java.util.Map iterates values in RemoteTablet. http://gerrit.cloudera.org:8080/#/c/7749/6/java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java File java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java: Line 62:* Keep the current remote scanner alive. I took the long docs from C++ for the public methods http://gerrit.cloudera.org:8080/#/c/7749/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java: Line 60: @BeforeClass Is this a good idea to change the timeouts for all tests? Doesn't seem to cause problems for me but I don't want to cause any flakiness -- To view, visit http://gerrit.cloudera.org:8080/7749 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iba647e7b5ed4ee8d223d387b6656d3fb02c41713 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: t...@phdata.io Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd LipconGerrit-Reviewer: t...@phdata.io Gerrit-HasComments: Yes
[kudu-CR] disk failure: don't open tablets on failed disks
Adar Dembo has posted comments on this change. Change subject: disk failure: don't open tablets on failed disks .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/fs/data_dirs.cc File src/kudu/fs/data_dirs.cc: Line 986: if (PREDICT_FALSE(!failed_dirs.empty())) { Why do we need this condition? Won't the for loop no-op if failed_dirs is empty? http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/fs/file_block_manager.cc File src/kudu/fs/file_block_manager.cc: PS3, Line 581: // Don't report failed directories. > shoudln't we _report_ failed directories? ok if you want to do it in a foll There's some confusion about this, I think. The FsReport is intended for the block manager to report fine grained details about the filesystem; if a data directory has "failed", the block manager can't open its on-disk state on that directory and thus can't include it in the report. This is more of an issue for the log block manager, where we actually report something meaningful. http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 1694: // Ignore attempts to delete blocks in failed directories. Shouldn't this precede the RemoveLogBlockUnlocked? This is deleting the block from in-memory maps, but not from disk. That makes LogBlockManager::DeleteBlock differ from FileBlockManager::DeleteBlock in semantics. Line 1702: return Status::OK(); > also: if these are the semantics we want (not sure), shouldn't we have a un Agreed with Mike: if return an error, don't callers handle it appropriately (i.e. WARN or whatever since we don't care that much if block deletion fails)? -- To view, visit http://gerrit.cloudera.org:8080/7766 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] dense node-itest: add extra log output
Todd Lipcon has submitted this change and it was merged. Change subject: dense_node-itest: add extra log output .. dense_node-itest: add extra log output For yet-unexplained reasons, this itest still sometimes skips outputting the expected 'Time spent bootstrapping tablets' log statement. This commit simply adds a few extra log messages to help diagnose this next time it occurs. Change-Id: Ic967558e34a8db673a518c37afdb68d86262a443 Reviewed-on: http://gerrit.cloudera.org:8080/7775 Reviewed-by: Adar DemboTested-by: Kudu Jenkins --- M src/kudu/integration-tests/dense_node-itest.cc 1 file changed, 3 insertions(+), 0 deletions(-) Approvals: Adar Dembo: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ic967558e34a8db673a518c37afdb68d86262a443 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] disk failure: add persistent disk states
Adar Dembo has posted comments on this change. Change subject: disk failure: add persistent disk states .. Patch Set 26: (17 comments) http://gerrit.cloudera.org:8080/#/c/7270/26/src/kudu/fs/block_manager_util.cc File src/kudu/fs/block_manager_util.cc: Line 36 How come we don't need this? http://gerrit.cloudera.org:8080/#/c/7270/26/src/kudu/fs/data_dirs.cc File src/kudu/fs/data_dirs.cc: Line 422: Status DataDirManager::Open() { This could stand to be decomposed. It's so long and there's no clear description of what it's trying to accomplish, why in this particular order, etc. PS26, Line 540: vectorinstances; : instances.resize(path_set.all_paths_size()); I think there's a constructor that resizes. Line 543: vector failed_dirs; This is confusing w.r.t. failed_data_dirs. PS26, Line 587: path_set.all_paths(uuid_idx) Capture this in a local cref? PS26, Line 605: path_set_.Swap(_set); : auto path_set_reset = MakeScopedCleanup([&] { : path_set_.Swap(_set); : }); This is strange. Can't you parameterize the path set for UpdateInstanceFiles, then do the swap where all the other swaps go? PS26, Line 633: // Initialize the 'fullness' status of the data directories. : for (const auto& dd : data_dirs_) { : uint16_t uuid_idx; : CHECK(FindUuidIndexByDataDir(dd.get(), _idx)); : if (ContainsKey(failed_data_dirs_, uuid_idx)) { : continue; : } : Status refresh_status = dd->RefreshIsFull(DataDir::RefreshMode::ALWAYS); : if (PREDICT_FALSE(!refresh_status.ok())) { : if (refresh_status.IsDiskFailure()) { : MarkDataDirFailed(uuid_idx, refresh_status.ToString()); : continue; : } : return refresh_status; : } : } Why was this moved all the way down here? You could use instance health to determine if a data directory is healthy, no? Or is the issue that MarkDataDirFailed() won't work until all that stuff above is initialized? Line 662: string healthy_path = path_set_.all_paths(uuid_idx).path(); Not used? Line 670: set indices_to_update(std::move(updated_uuid_indices)); Why not just operate on updated_uuid_indices directly? It was passed by value so it's a copy. PS26, Line 689: indices_to_update.erase(idx_to_update); Seems like you could do this right at .begin() and avoid duplicating it. Line 701: return Status::IOError("Could not write disk states, all disks failed"); Is there a test case for this? Looks like an interesting edge case. Line 744: // A size of 0 would indicate no healthy disks, which should crash the server. Why crash and not return an IOError, letting a caller higher in the stack crash? Line 908: CHECK(!read_only_) << "Cannot handle disk failures; filesystem is " Hmm, what if we just skipped WritePathHealthStates? Would that be incorrect? Line 909: "opened in read-only mode"; Nit: funky indentation here. Line 917: CHECK_NE(failed_data_dirs_.size(), data_dirs_.size()) << "All data dirs have failed"; Also surprised to see this here; why not let the higher levels (maintenance manager, etc.) discover this and decide to crash or not? Line 953: std::lock_guard lock(write_instance_mutex_); Can you add a comment explaining what this is serializing and why? http://gerrit.cloudera.org:8080/#/c/7270/26/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS26, Line 2183: // Note: disk failures when opening the block manager need only mark the : // directory as failed, rather than running callbacks to shut down tablets. At : // this point, the tablet manager is not initialized, and no such callback : // exists. This is pretty fragile; what if Repair() is refactored and some of the logic is used at runtime? How will we know that we need to change the macros to call the error manager's callback? Perhaps at startup the block managers should install a callback that fails data dirs, and later on, the tablet manager will replace it with one that fails entire tablets? -- To view, visit http://gerrit.cloudera.org:8080/7270 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f Gerrit-PatchSet: 26 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Adar Dembo has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 23: (10 comments) http://gerrit.cloudera.org:8080/#/c/7207/23/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS23, Line 325: enum SyncMode { : SYNC, : NO_SYNC : }; No longer needed? PS23, Line 1219: Do Doing PS23, Line 1219: it in Line 1220: //unmanaged blocks which leaves gaps in data files. But the unmanaged "unmanaged" isn't really an existing concept. How about "Doing nothing can result in data-consuming gaps in containers, but these gaps can be cleaned up by hole repunching at start up"? Line 1223: //orphan blocks if the metadata is durable. But orphan blocks can be orphaned PS23, Line 1226: abortion abort PS23, Line 1227: abortion abort Line 1228: // avoid large chunk of useless consumed space and unmanaged blocks. A "chunks" and "data-consuming gaps". PS23, Line 1230: abortion abort PS23, Line 1232: abortion abort -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 23 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-2095 - Add scanner `keepAlive` RPC call to Java API
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7749 to look at the new patch set (#6). Change subject: KUDU-2095 - Add scanner `keepAlive` RPC call to Java API .. KUDU-2095 - Add scanner `keepAlive` RPC call to Java API - Adds keepAlive methods to sync and async clients - Methods return a (Deferred) Status object Where possible the behavior mimics the C++ client. Publicly accessible methods are available in AsyncKuduScanner and KuduScanner. A call to KeepAlive will return a Status object which will be OK if - The scanner has been initialized - The scanner is active and has rows left to return - The scanner is between tablets It will return a non-ok status when - The scanner has not yet been initialized - The scanner has no more rows Change-Id: Iba647e7b5ed4ee8d223d387b6656d3fb02c41713 --- M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java 4 files changed, 206 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/49/7749/6 -- To view, visit http://gerrit.cloudera.org:8080/7749 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iba647e7b5ed4ee8d223d387b6656d3fb02c41713 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: t...@phdata.io Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] dense node-itest: add extra log output
Adar Dembo has posted comments on this change. Change subject: dense_node-itest: add extra log output .. Patch Set 1: Code-Review+2 Sure, why not. -- To view, visit http://gerrit.cloudera.org:8080/7775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic967558e34a8db673a518c37afdb68d86262a443 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 22: (10 comments) > (1 comment) > > Looks good! I'm very pleased how this came together, net-net I > think the code is easier to understand with this patch. Thank you! Really appreciate yours and Adar's thorough code reviews! http://gerrit.cloudera.org:8080/#/c/7207/22//COMMIT_MSG Commit Message: PS22, Line 15: for > an Done http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/block_manager.cc File src/kudu/fs/block_manager.cc: PS22, Line 34: blocks > block Done Line 45: DEFINE_string(block_manager_flush_control, "finalize", > It sounds scarier than it really is. We should at least clarify that this f Done PS22, Line 46: when to flush > More like when to pre-flush? Done http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: PS22, Line 70: is > is set to Done PS22, Line 127: is > is set to Done PS22, Line 128: asynchronous flush of dirty block data to disk > Sounds good for performance. What is the crash consistency contract? Can th Yes, in the worst case we could have 'gaps' in the data file, but this should be handled by hole repunching. Also, we only flush data in this case, and always append metadata after data is durable. This ensures metadata never points to garbage data. http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 276: // This is called after synchronization of dirty data and metadata > doesn't finalization have to happen before the data is synced to disk? DoClose() is called by DoCloseBlocks(). If closing a group of blocks, the blocks should be finalized. But for a single block it can be called without being finalized. There is a DCHECK at L904. PS22, Line 458: rounds up > "rounds up"? Done PS22, Line 459: it > clarify: is "it" the block or the container? Done -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#23). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that an LBM container can be reused without flushing in-flight writable blocks to disk. This can bring down the log block container number and disk space consumption for first few flush/compaction operations when using log block manager. I performed manual test which creates a table with two tablets, 'kudu perf loadgen localhost --num-rows-per-thread=2000 --num-threads=10 --keep-auto-table --table_num_buckets=2'. With this patch, 'log_block_manager_containers' dropped from 56 to 7. A design doc can be found here: https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 574 insertions(+), 440 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/23 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 23 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] KUDU-2095 - Add `keepAlive` method to Java API
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7749 to look at the new patch set (#5). Change subject: KUDU-2095 - Add `keepAlive` method to Java API .. KUDU-2095 - Add `keepAlive` method to Java API - Adds keepAlive methods to sync and async clients - Methods return a (Deferred) Status object Where possible the behavior mimics the C++ client. Publicly accessible methods are available in AsyncKuduScanner and KuduScanner. A call to KeepAlive will return a Status object which will be OK if - The scanner has been initialized - The scanner is active and has rows left to return - The scanner is between tablets It will return a non-ok status when - The scanner has not yet been initialized - The scanner has no more rows Change-Id: Iba647e7b5ed4ee8d223d387b6656d3fb02c41713 --- M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java 4 files changed, 203 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/49/7749/5 -- To view, visit http://gerrit.cloudera.org:8080/7749 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iba647e7b5ed4ee8d223d387b6656d3fb02c41713 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: t...@phdata.io Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2095 - Add `keepAlive` method to Java API
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7749 to look at the new patch set (#4). Change subject: KUDU-2095 - Add `keepAlive` method to Java API .. KUDU-2095 - Add `keepAlive` method to Java API - Adds keepAlive methods to sync and async clients - Methods return a (Deferred) Status object Where possible the behavior mimics the C++ client. Publicly accessible methods are available in AsyncKuduScanner and KuduScanner. A call to KeepAlive will return a Status object which will be OK if - The scanner has been initialized - The scanner is active and has rows left to return - The scanner is between tablets It will return a non-ok status when - The scanner has not yet been initialized - The scanner has no more rows Change-Id: Iba647e7b5ed4ee8d223d387b6656d3fb02c41713 --- A java/kudu-client/kudu_style.xml M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java 5 files changed, 457 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/49/7749/4 -- To view, visit http://gerrit.cloudera.org:8080/7749 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iba647e7b5ed4ee8d223d387b6656d3fb02c41713 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: t...@phdata.io Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] dense node-itest: add extra log output
Hello Adar Dembo, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7775 to review the following change. Change subject: dense_node-itest: add extra log output .. dense_node-itest: add extra log output For yet-unexplained reasons, this itest still sometimes skips outputting the expected 'Time spent bootstrapping tablets' log statement. This commit simply adds a few extra log messages to help diagnose this next time it occurs. Change-Id: Ic967558e34a8db673a518c37afdb68d86262a443 --- M src/kudu/integration-tests/dense_node-itest.cc 1 file changed, 3 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/7775/1 -- To view, visit http://gerrit.cloudera.org:8080/7775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ic967558e34a8db673a518c37afdb68d86262a443 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo
[kudu-CR] KUDU-1407: reassign failed tablets
Andrew Wong has posted comments on this change. Change subject: KUDU-1407: reassign failed tablets .. Patch Set 17: (8 comments) http://gerrit.cloudera.org:8080/#/c/7440/17//COMMIT_MSG Commit Message: PS17, Line 30: is added > this is repeated Done PS17, Line 31: failed tablets while running > tablets that fail while running (due to what?) Done http://gerrit.cloudera.org:8080/#/c/7440/17/src/kudu/consensus/consensus_queue.cc File src/kudu/consensus/consensus_queue.cc: Line 629: NotifyObserversOfFailedFollower(peer_uuid, current_term, reason); > nit: No need to hold the lock while calling this method. Done http://gerrit.cloudera.org:8080/#/c/7440/17/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: Line 170: DEFINE_bool(master_tombstone_failed_tablet_replicas, true, > Should be removed per below. See master_tombstone_evicted_tablet_replica Done PS17, Line 2473: if (FLAGS_master_tombstone_failed_tablet_replicas) { : SendDeleteReplicaRequest(report.tablet_id(), TABLET_DATA_TOMBSTONED, :boost::none, :tablet->table(), ts_desc->permanent_uuid(), :Substitute("Tablet failed: $0", s.ToString())); : } > Is this required? The leader will now evict a failed follower because of th I think you're right; when the leader sees the failed tablet, it should evict and config change, and then report to the master. http://gerrit.cloudera.org:8080/#/c/7440/17/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: PS17, Line 655: metadata > Couldn't this simply happen if one of the data disks failed? Failures when writing the data directory are passed off as WARN_NOT_OK() (see TabletMetadata::DeleteOrphanedBlocks), since the blocks can always be removed in the future (eg when we next startup). PS17, Line 658: is unclear > Shouldn't the contract of DeleteTabletData() be a crash-consistent one? In Ah, I see. I'll update the comment. Line 752: auto fail_tablet = MakeScopedCleanup([&]() { > I like this approach. It is quite clean indeed! Credit to Adar for the suggestion. -- To view, visit http://gerrit.cloudera.org:8080/7440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 Gerrit-PatchSet: 17 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1407: reassign failed tablets
Hello David Ribeiro Alves, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7440 to look at the new patch set (#18). Change subject: KUDU-1407: reassign failed tablets .. KUDU-1407: reassign failed tablets Tablets put into the state tablet::FAILED are left until they are manually deleted; they are not evicted and reassigned. If a tablet fails to bootstrap, it will sit, responding to heartbeats, doing nothing else. This patch ensures failed tablets will be reassigned. As the tablets are not used, rather than directly setting replicas to FAILED, an error is first recorded and the TabletReplica::Shutdown(), leaving the final state as FAILED. A replica can no longer leave the FAILED state (calls to Shutdown() leave it FAILED). The tserver response generated by FAILED tablets is now TABLET_FAILED. Upon receiving this, a leader will immediately evict the peer. Prior to this patch, a tablet was marked FAILED if its WAL or metadata failed to delete (after already shutting down). If this occurs, there is no guarantee that the tablet's metadata reflects the deleted state. This has been made fatal. Testing is done in a few places: - raft_consensus-itest is updated to ensure that tablets that fail to bootstrap are evicted and replaced. - tablet_server-test is also updated to ensure that, instead of TABLET_NOT_RUNNING, TABLET_FAILED is returned by failed tablets. - a test is added to ts_tablet_manager-itest to test that a tablet that is manually failed while running is evicted and replaced. This patch is a part of a series of patches to handle disk failure. See section 2.5 in this doc: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 --- M src/kudu/client/scanner-internal.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tserver/tablet_server-test.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/tserver.proto 13 files changed, 151 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/40/7440/18 -- To view, visit http://gerrit.cloudera.org:8080/7440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 Gerrit-PatchSet: 18 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [java] KUDU-2103 Canonicalize hostnames in client
Todd Lipcon has posted comments on this change. Change subject: [java] KUDU-2103 Canonicalize hostnames in client .. Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/7757/3/java/kudu-client/src/test/java/org/apache/kudu/client/FakeDNS.java File java/kudu-client/src/test/java/org/apache/kudu/client/FakeDNS.java: PS3, Line 50: private MapforwardResolutions = new HashMap<>(); : : private Map reverseResolutions = new HashMap<>(); these should probably be @GuardedBy("this") and have the below suggested 'add' methods be synchronized PS3, Line 54: public Map getForwardResolutions() { : return forwardResolutions; : } : : public Map getReverseResolutions() { nit: it's better practice not to return mutable private members like this. I think you should be able to get by with just a 'addForwardResolution(String host, InetAddress addr)' and the equivalent for reverse resolution PS3, Line 93: if (forwardResolutions.containsKey(host)) { : return new InetAddress[]{forwardResolutions.get(host)}; : } nit: to avoid double lookup, restructure as: InetAddress addr = forwardResolutions.get(host); if (addr != null) { return addr; } ... throw PS3, Line 105: } else if (reverseResolutions.containsKey(InetAddress.getByAddress(addr))) { : return reverseResolutions.get(InetAddress.getByAddress(addr)); : same. Also no need for the 'else' here since the previous statement returns http://gerrit.cloudera.org:8080/#/c/7757/3/java/kudu-client/src/test/java/org/apache/kudu/client/TestServerInfo.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestServerInfo.java: PS3, Line 49: InetAddress.getByName("10.1.2.3") : ); does our maven checkstyle plugin think this is OK? seems a bit funny to me, would be good to run this patch through checkstyle before committing. -- To view, visit http://gerrit.cloudera.org:8080/7757 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I77204a185c1ab8e21cc833afb645543cbc0e340f Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Attila Bukor Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] kserver: consolidate randomized failure monitors
Adar Dembo has abandoned this change. Change subject: kserver: consolidate randomized failure monitors .. Abandoned I went in a different direction and reimplemented failure detection using timer-based scheduling: https://gerrit.cloudera.org/#/c/7735/ -- To view, visit http://gerrit.cloudera.org:8080/7330 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I096e3e89bf6e8925f6ea0382fc319d7382237787 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] util: remove old failure detector and resettable heartbeater code
Adar Dembo has submitted this change and it was merged. Change subject: util: remove old failure detector and resettable heartbeater code .. util: remove old failure detector and resettable heartbeater code With the move to periodic timers, these classes are no longer needed. Change-Id: I7a8f77a56ed66e06e182a70e18b2837a264a0a4e Reviewed-on: http://gerrit.cloudera.org:8080/7736 Reviewed-by: Dan BurkertTested-by: Kudu Jenkins --- M build-support/iwyu/iwyu-filter.awk M src/kudu/util/CMakeLists.txt D src/kudu/util/failure_detector-test.cc D src/kudu/util/failure_detector.cc D src/kudu/util/failure_detector.h D src/kudu/util/resettable_heartbeater-test.cc D src/kudu/util/resettable_heartbeater.cc D src/kudu/util/resettable_heartbeater.h 8 files changed, 0 insertions(+), 891 deletions(-) Approvals: Dan Burkert: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7736 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I7a8f77a56ed66e06e182a70e18b2837a264a0a4e Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] consensus peers: replace bespoke Raft heartbeat logic with periodic timers
Adar Dembo has submitted this change and it was merged. Change subject: consensus_peers: replace bespoke Raft heartbeat logic with periodic timers .. consensus_peers: replace bespoke Raft heartbeat logic with periodic timers Building on the generic periodic timer implementation, this patch replaces the one-off Raft heartbeating code with periodic timers. There's only one semantic change, but it's an important one: the jittering range has changed from [P/2, P] to [3P/4, 5P/4]. When I wrote commit 1070e76 I was nervous about making such a change, but since it reduces overall heartbeat load, I think it makes sense to do it. Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf Reviewed-on: http://gerrit.cloudera.org:8080/7734 Reviewed-by: Dan BurkertTested-by: Kudu Jenkins --- M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_peers.h M src/kudu/consensus/raft_consensus_quorum-test.cc 3 files changed, 24 insertions(+), 90 deletions(-) Approvals: Dan Burkert: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7734 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] consensus: use periodic timers for failure detection
Adar Dembo has submitted this change and it was merged. Change subject: consensus: use periodic timers for failure detection .. consensus: use periodic timers for failure detection This patch replaces the existing failure detection (FD) with a new approach built using periodic timers. The existing approach had a major drawback: each failure monitor required a dedicated thread, and there was a monitor for each replica. The new approach "schedules" a failure into the future using the server's reactor thread pool, "resetting" it when leader activity is detected. There's an inherent semantic mismatch between dedicated threads that periodically wake to check for failures and this new approach; I tried to provide similar semantics as best I could. Things worth noting: - Most importantly: some FD periods are now shorter. This is because the existing implementation "double counted" failure periods when adding backoff (once in LeaderElectionExpBackoffDeltaUnlocked, and once by virtue of the failure period comparison made by the failure monitor). This seemed accidental to me, so I didn't bother preserving that behavior. - It's tough to "expire" an FD using timers. Luckily, this only happens in RaftConsensus::Start, so by making PeriodicTimer::Start accept an optional delta, we can begin FD with an early delta that reflects the desired "detect a failure immediately but not too quickly" semantic, similar to how the dedicated failure monitor thread operates. - ReportFailureDetected is now run on a shared reactor thread rather than a dedicated failure monitor thread. Since StartElection performs IO, I thunked it onto the Raft thread pool. - Timer operations cannot fail, so I removed the return values from the various FD-related functions. - I also consolidated the two SnoozeFailureDetector variants; I found that this made it easier to look at all the call-sites. Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330 Reviewed-on: http://gerrit.cloudera.org:8080/7735 Reviewed-by: Dan BurkertTested-by: Adar Dembo --- M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/exactly_once_writes-itest.cc M src/kudu/integration-tests/raft_consensus-itest.cc 4 files changed, 110 insertions(+), 184 deletions(-) Approvals: Dan Burkert: Looks good to me, approved Adar Dembo: Verified -- To view, visit http://gerrit.cloudera.org:8080/7735 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] rpc: periodic timers
Adar Dembo has submitted this change and it was merged. Change subject: rpc: periodic timers .. rpc: periodic timers This patch introduces a generic periodic timer class. How does it work? 1. A timer is constructed with a user-provided task functor. 2. After Start() is called, the timer will run the functor repeatedly on a fixed (and optionally, jittered) period. 3. When Reset() is called, it'll reset the period. 4. After Stop() is called, it won't run the functor anymore. The implementation is based on Messenger::ScheduleOnReactor but it could just as easily build on libev directly. I chose to use the Messenger so that I wouldn't have to implement timer thread pooling. It's also just a generic version of the Raft heartbeat logic found in Peer (see commit 1070e76). In follow-on patches I will use this class to replace the bespoke Peer logic as well as for Raft failure detection. If you're curious, it's actually because of failure detection that both Start() and Reset() optionally accept deltas to override the default period; the snoozing code likes to provide heavily customized backoff. Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86 Reviewed-on: http://gerrit.cloudera.org:8080/7733 Reviewed-by: Dan BurkertTested-by: Adar Dembo --- M src/kudu/rpc/CMakeLists.txt A src/kudu/rpc/periodic-test.cc A src/kudu/rpc/periodic.cc A src/kudu/rpc/periodic.h 4 files changed, 465 insertions(+), 0 deletions(-) Approvals: Dan Burkert: Looks good to me, approved Adar Dembo: Verified -- To view, visit http://gerrit.cloudera.org:8080/7733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] rpc: periodic timers
Adar Dembo has posted comments on this change. Change subject: rpc: periodic timers .. Patch Set 5: Verified+1 Unrelated test failure, I filed KUDU-2109 for the issue. -- To view, visit http://gerrit.cloudera.org:8080/7733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [tools] Add summary-only mode to ksck
Alexey Serbin has posted comments on this change. Change subject: [tools] Add summary-only mode to ksck .. Patch Set 2: > (1 comment) > > I realized there's quite a bit more work to do to get > machine-readable output from ksck, since there's also messages > printed from the calling tool action code, and it'd be nice to > integrate checksum results and more info into the json. Let me work > some more on this and post a more complete change later. BTW, as for the machine-readable output there is an option to use libxo: https://github.com/Juniper/libxo It might worth taking a look at that in this context. -- To view, visit http://gerrit.cloudera.org:8080/7759 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3a961867a9e6a32ca91144843f9d1b8139f756f5 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-2033 (part 2). Add test for Java client failover support.
Alexey Serbin has posted comments on this change. Change subject: KUDU-2033 (part 2). Add test for Java client failover support. .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7722 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I228e08429d952f0f5b2657e0b8481366c5c572a4 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Edward FancherGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: No
[kudu-CR] consensus: use periodic timers for failure detection
Adar Dembo has posted comments on this change. Change subject: consensus: use periodic timers for failure detection .. Patch Set 5: Verified+1 Known flake in AdminCliTest.TestMoveTablet, overriding Jenkins. -- To view, visit http://gerrit.cloudera.org:8080/7735 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] disk failure: don't open tablets on failed disks
Mike Percy has posted comments on this change. Change subject: disk failure: don't open tablets on failed disks .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 1702: return Status::OK(); > I'm not sure why we are returning OK here. Also, new API semantics should b also: if these are the semantics we want (not sure), shouldn't we have a unit test for this? -- To view, visit http://gerrit.cloudera.org:8080/7766 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] disk failure: don't open tablets on failed disks
Mike Percy has posted comments on this change. Change subject: disk failure: don't open tablets on failed disks .. Patch Set 3: (11 comments) http://gerrit.cloudera.org:8080/#/c/7766/3//COMMIT_MSG Commit Message: PS3, Line 21: Testing is done by loading data into a cluster with multi-disk : servers, failing a single directory of one of the servers, and ensuring : that the tablets spread across the failed disk get replicated upon the : next startup. how about: Testing is done by loading data into a cluster configured to use multiple directories for data blocks, failing a single directory on one of the tablet servers, and ensuring that the tablets with blocks on the failed directory get re-replicated at startup time. http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 1702: return Status::OK(); I'm not sure why we are returning OK here. Also, new API semantics should be documented at the interface level. http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/integration-tests/disk_failure-itest.cc File src/kudu/integration-tests/disk_failure-itest.cc: PS3, Line 43: TabletServerIntegrationTestBase Would you mind inheriting from ExternalMiniClusterITestBase instead in this class? The newer tests are inheriting from that instead. PS3, Line 96: server a tablet server PS3, Line 97: server tablet server PS3, Line 98: . while it is shut down. Line 109: write_workload.Setup(); This creates a table. Why are you creating the table yourself above? Line 110: write_workload.Start(); You should call workload.stopAndJoin() at some point during the test to shut the writer thread down again. Did you want it running this whole time? PS3, Line 114: WaitForTSAndReplicas what is the purpose of calling this function? PS3, Line 124: NO_FATALS(SetServerSurvivalFlags(ext_tservers)); > why is this not set on boot? agree http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: Line 765: LOG(ERROR) << "Exiting bootstrapping early; tablet is in a failed directory"; how about: LOG(ERROR) << LogPrefix(tablet_id) << "aborting tablet bootstrap: tablet has data in a failed directory"; -- To view, visit http://gerrit.cloudera.org:8080/7766 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] util: remove old failure detector and resettable heartbeater code
Dan Burkert has posted comments on this change. Change subject: util: remove old failure detector and resettable heartbeater code .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7736 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7a8f77a56ed66e06e182a70e18b2837a264a0a4e Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] consensus: use periodic timers for failure detection
Dan Burkert has posted comments on this change. Change subject: consensus: use periodic timers for failure detection .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7735 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] consensus peers: replace bespoke Raft heartbeat logic with periodic timers
Dan Burkert has posted comments on this change. Change subject: consensus_peers: replace bespoke Raft heartbeat logic with periodic timers .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7734 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] rpc: periodic timers
Dan Burkert has posted comments on this change. Change subject: rpc: periodic timers .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] rpc: periodic timers
Dan Burkert has posted comments on this change. Change subject: rpc: periodic timers .. Patch Set 5: unrelated flake. retriggered -- To view, visit http://gerrit.cloudera.org:8080/7733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] consensus peers: replace bespoke Raft heartbeat logic with periodic timers
Dan Burkert has posted comments on this change. Change subject: consensus_peers: replace bespoke Raft heartbeat logic with periodic timers .. Patch Set 5: flake looked like KUDU-1736 so I retriggered it -- To view, visit http://gerrit.cloudera.org:8080/7734 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] rpc: periodic timers
Hello Dan Burkert, Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7733 to look at the new patch set (#5). Change subject: rpc: periodic timers .. rpc: periodic timers This patch introduces a generic periodic timer class. How does it work? 1. A timer is constructed with a user-provided task functor. 2. After Start() is called, the timer will run the functor repeatedly on a fixed (and optionally, jittered) period. 3. When Reset() is called, it'll reset the period. 4. After Stop() is called, it won't run the functor anymore. The implementation is based on Messenger::ScheduleOnReactor but it could just as easily build on libev directly. I chose to use the Messenger so that I wouldn't have to implement timer thread pooling. It's also just a generic version of the Raft heartbeat logic found in Peer (see commit 1070e76). In follow-on patches I will use this class to replace the bespoke Peer logic as well as for Raft failure detection. If you're curious, it's actually because of failure detection that both Start() and Reset() optionally accept deltas to override the default period; the snoozing code likes to provide heavily customized backoff. Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86 --- M src/kudu/rpc/CMakeLists.txt A src/kudu/rpc/periodic-test.cc A src/kudu/rpc/periodic.cc A src/kudu/rpc/periodic.h 4 files changed, 465 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/7733/5 -- To view, visit http://gerrit.cloudera.org:8080/7733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] raft consensus-itest: robust fix for asynchronous kill
Adar Dembo has posted comments on this change. Change subject: raft_consensus-itest: robust fix for asynchronous kill .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/7561/5//COMMIT_MSG Commit Message: Line 20: repeatedly ping the paused server until it stops responding. > one idea for an even more general solution that addresses this problem ever Yeah, I remember this from an earlier discussion we had. Was hoping you'd forgotten. :) But I'll bite the bullet and try it out. -- To view, visit http://gerrit.cloudera.org:8080/7561 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99d400e971d6f9b22cc7b4483db94a98ec306e10 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] separate DataDirManager from BlockManagers
Adar Dembo has submitted this change and it was merged. Change subject: separate DataDirManager from BlockManagers .. separate DataDirManager from BlockManagers Currently, the DataDirManager is owned by the BlockManagers. Since only blocks are placed in 'data dirs' (i.e. subdirectories named 'data' under the roots specified by 'fs_data_dirs'), this hierarchy made sense. However, it would be nice to track other files that fall within 'fs_data_dirs' (e.g. tablet-metadata, consensus-metadata). Splitting the directory manager from the block manager makes this more feasible. This patch introduces a number of changes to the FsManager, BlockManager, and DataDirManager with respect to initialization. - BlockManagers no longer have a Create() function - BlockManager's dtor will now just wait for DataDir closures to finish instead of shutting down the DataDirManager as to not directly affect the DataDirManager - DataDirManagers now have two static constructors: one to open an existing layout, and another to create and open a new layout - FsManagers will only create the BlockManager when opening an fs layout - FsManagers will only Open() a DataDirManager if one has not already been constructed - A validator is added to check the value of FLAGS_block_manager before any of the above initialization can occur Change-Id: I387e4e88bd65298970a1e4201879de08fac07ace Reviewed-on: http://gerrit.cloudera.org:8080/7602 Tested-by: Kudu Jenkins Reviewed-by: Adar Dembo--- M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h M src/kudu/fs/data_dirs-test.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/util/path_util.cc M src/kudu/util/path_util.h 16 files changed, 443 insertions(+), 306 deletions(-) Approvals: Adar Dembo: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7602 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I387e4e88bd65298970a1e4201879de08fac07ace Gerrit-PatchSet: 14 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] separate DataDirManager from BlockManagers
Adar Dembo has posted comments on this change. Change subject: separate DataDirManager from BlockManagers .. Patch Set 13: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7602 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I387e4e88bd65298970a1e4201879de08fac07ace Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: No
[kudu-CR] consensus: use periodic timers for failure detection
Adar Dembo has posted comments on this change. Change subject: consensus: use periodic timers for failure detection .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7735/2/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 319: rng_.Normal(GetFailureMonitorCheckMeanMs(), > Doesn't this patch already significantly alter the semantics of these flags You're right. I'll remove the flags entirely. -- To view, visit http://gerrit.cloudera.org:8080/7735 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1407: reassign failed tablets
Mike Percy has posted comments on this change. Change subject: KUDU-1407: reassign failed tablets .. Patch Set 17: (8 comments) http://gerrit.cloudera.org:8080/#/c/7440/17//COMMIT_MSG Commit Message: PS17, Line 30: is added this is repeated PS17, Line 31: failed tablets while running tablets that fail while running (due to what?) http://gerrit.cloudera.org:8080/#/c/7440/17/src/kudu/consensus/consensus_queue.cc File src/kudu/consensus/consensus_queue.cc: Line 629: NotifyObserversOfFailedFollower(peer_uuid, current_term, reason); nit: No need to hold the lock while calling this method. http://gerrit.cloudera.org:8080/#/c/7440/17/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: Line 170: DEFINE_bool(master_tombstone_failed_tablet_replicas, true, Should be removed per below. See master_tombstone_evicted_tablet_replica PS17, Line 2473: if (FLAGS_master_tombstone_failed_tablet_replicas) { : SendDeleteReplicaRequest(report.tablet_id(), TABLET_DATA_TOMBSTONED, :boost::none, :tablet->table(), ts_desc->permanent_uuid(), :Substitute("Tablet failed: $0", s.ToString())); : } Is this required? The leader will now evict a failed follower because of the changes in the queue in this patch. Once that eviction is committed as a new config change, the master should find out and automatically delete this replica that is part of a stale config (in a safe way that passes in cas_config_opid_index_less_or_equal). See FLAGS_master_tombstone_evicted_tablet_replicas usage in this file. http://gerrit.cloudera.org:8080/#/c/7440/17/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: PS17, Line 655: metadata Couldn't this simply happen if one of the data disks failed? PS17, Line 658: is unclear Shouldn't the contract of DeleteTabletData() be a crash-consistent one? In fact, I think it is (perhaps not well documented) from the perspective of the order in which we delete things. It's extensively tested in ts_recovery-itest. Line 752: auto fail_tablet = MakeScopedCleanup([&]() { I like this approach. -- To view, visit http://gerrit.cloudera.org:8080/7440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 Gerrit-PatchSet: 17 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] disk failure: don't open tablets on failed disks
David Ribeiro Alves has posted comments on this change. Change subject: disk failure: don't open tablets on failed disks .. Patch Set 3: (9 comments) http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/fs/file_block_manager.cc File src/kudu/fs/file_block_manager.cc: PS3, Line 581: // Don't report failed directories. shoudln't we _report_ failed directories? ok if you want to do it in a follow up but this is likely one of the most important things we should report PS3, Line 693: VLOG(3) << Substitute("Block $0 is in a failed directory; not deleting", : block_id.ToString()); how about using LOG_EVERY_N at INFO level, that way we'd get something in the logs but it woudn't spam http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS3, Line 1700: VLOG(3) << Substitute("Block $0 is in a failed directory; not deleting", : block_id.ToString()); same also do we have checks for the other ops? why is delete particular in this regard? http://gerrit.cloudera.org:8080/#/c/7766/3/src/kudu/integration-tests/disk_failure-itest.cc File src/kudu/integration-tests/disk_failure-itest.cc: PS3, Line 56: if (GetParam() == "file") { : flags.emplace_back("--block_manager=file"); : } else { : flags.emplace_back("--block_manager=log"); : } maybe: flags.emplace_back(Substitute("--block_manager=$0", GetParam())); PS3, Line 65: void SetupDefaultCluster(vector* ext_tservers) { : FLAGS_num_tablet_servers = 3; : vector flags = GetClusterFlags(); : NO_FATALS(CreateCluster("survivable_cluster", flags, {}, /* num_data_dirs */ 2)); : NO_FATALS(CreateClient(_)); : CHECK_EQ(FLAGS_num_tablet_servers, tablet_servers_.size()); : ext_tservers->clear(); : for (const auto& e : tablet_servers_) { : ext_tservers->push_back(cluster_->tablet_server_by_uuid(e.second->uuid())); : } : } can you add more disks? 2 is kind of special-casy because of tablet overlaps right? PS3, Line 124: NO_FATALS(SetServerSurvivalFlags(ext_tservers)); why is this not set on boot? PS3, Line 129: Substitute("--block_manager=$0", GetParam() why are you re-settting the block manager? can it change? PS3, Line 138: ASSERT_OK(cluster_->WaitForTabletsRunning(cluster_->tablet_server(0), : kNumTablets, kAgreementTimeout)); so remind me again what happened to the failed tablets? were they replicated? where? PS3, Line 137: // Ensure again that the tablets get to a running, consistent state. : ASSERT_OK(cluster_->WaitForTabletsRunning(cluster_->tablet_server(0), : kNumTablets, kAgreementTimeout)); : ASSERT_OK(WaitForServersToAgree(kAgreementTimeout, tablet_servers_, tablet_id, 1)); not sure I get this. you inject a failure, wait for it to happen and then make sure all tablets are started? and why are they agreeing on index 1, isn't that kind of a given even with the failures? -- To view, visit http://gerrit.cloudera.org:8080/7766 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id3fae98355657f6aa4b134c542f92fc07f5c0aa1 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1407: reassign failed tablets
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-1407: reassign failed tablets .. Patch Set 17: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 Gerrit-PatchSet: 17 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Mike Percy has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 22: (2 comments) http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/block_manager.cc File src/kudu/fs/block_manager.cc: Line 45: DEFINE_string(block_manager_flush_control, "finalize", > Would prefer pre-flush to pre-sync. It sounds scarier than it really is. We should at least clarify that this flag doesn't have any affect on durability, which is controlled by other flags. http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 355: // If successful, adds all blocks to the block manager's in-memory maps. > Don't think so; the blocks are just manipulated. ah, you're right. -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 22: (1 comment) http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 1228: RETURN_NOT_OK(container_->DoCloseBlocks({ this }, LogBlockContainer::SyncMode::NO_SYNC)); > As I brought up on slack I think there may be an opportunity here to simpli Discussed with Dan and Adar offline. We came to the conclusion that it is safe to do nothing in Abort() for now. Because by doing this could either result in 'gaps' in data file or orphan blocks, and both of them can be handled today by hole repunching and blocks garbage collection. I will add a TODO in Abort() so that we can add a fine-grained handing of abortion in future. -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] separate DataDirManager from BlockManagers
Andrew Wong has posted comments on this change. Change subject: separate DataDirManager from BlockManagers .. Patch Set 13: This patch hasn't changed much; mainly been rebasing it. It's been +2'd though without significant changes since. -- To view, visit http://gerrit.cloudera.org:8080/7602 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I387e4e88bd65298970a1e4201879de08fac07ace Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: No
[kudu-CR] disk failure: add persistent disk states
Andrew Wong has posted comments on this change. Change subject: disk failure: add persistent disk states .. Patch Set 26: Addressed the comments and a slip-up on my part that caused the build to fail. This is ready for review. -- To view, visit http://gerrit.cloudera.org:8080/7270 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f Gerrit-PatchSet: 26 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1407: reassign failed tablets
Andrew Wong has posted comments on this change. Change subject: KUDU-1407: reassign failed tablets .. Patch Set 17: This patch has been +2'd, but some of the logic around failing tablet replicas overlaps with the tombstoned voting patch. Overall, this patch shouldn't need to change much to make it compatible. -- To view, visit http://gerrit.cloudera.org:8080/7440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 Gerrit-PatchSet: 17 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Adar Dembo has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 22: (3 comments) http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/block_manager.cc File src/kudu/fs/block_manager.cc: Line 45: DEFINE_string(block_manager_flush_control, "finalize", > I don't like the term flush because it doesn't distinguish between a memory Would prefer pre-flush to pre-sync. Also note that this is an experimental flag that largely exists for our own benchmarking, so I think the bar for documenting it is low. http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 355: // If successful, adds all blocks to the block manager's in-memory maps. > and takes ownership of the LogWritableBlock instances, right? Don't think so; the blocks are just manipulated. PS22, Line 1119: due to KUDU-1793 > That issue is marked as RESOLVED. Is this still an issue? Yes, because KUDU-1793 existed in the wild in at least one release, so we may load on-disk deployments with misaligned blocks. -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Mike Percy has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 22: (13 comments) Looks pretty good to me, I mainly have nits. http://gerrit.cloudera.org:8080/#/c/7207/22//COMMIT_MSG Commit Message: PS22, Line 15: for an http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/block_manager.cc File src/kudu/fs/block_manager.cc: PS22, Line 34: blocks block Line 45: DEFINE_string(block_manager_flush_control, "finalize", I don't like the term flush because it doesn't distinguish between a memory buffer flush vs. an fsync. Also, from the description here it sounds like 'never' means the blocks will never be fsynced. We should clarify that this is an implicit pre-flush or pre-sync, that the global settings for syncing will still be used at close time. Perhaps we should call this flag block_manager_preflush_control instead, or block_manager_early_sync_control. PS22, Line 46: when to flush More like when to pre-flush? http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/block_manager.h File src/kudu/fs/block_manager.h: PS22, Line 70: is is set to PS22, Line 127: is is set to PS22, Line 128: asynchronous flush of dirty block data to disk Sounds good for performance. What is the crash consistency contract? Can this leak disk space? PS22, Line 283: BlockTransaction() = default; : : ~BlockTransaction() = default; nit: It seems superfluous to specify = default here, particularly since these are not something special like copy or move constructors. Am I missing something? http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 276: // This is called after synchronization of dirty data and metadata doesn't finalization have to happen before the data is synced to disk? Line 355: // If successful, adds all blocks to the block manager's in-memory maps. and takes ownership of the LogWritableBlock instances, right? PS22, Line 458: rounds up "rounds up"? PS22, Line 459: it clarify: is "it" the block or the container? PS22, Line 1119: due to KUDU-1793 That issue is marked as RESOLVED. Is this still an issue? -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] [java] KUDU-2103 Canonicalize hostnames in client
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7757 to look at the new patch set (#3). Change subject: [java] KUDU-2103 Canonicalize hostnames in client .. [java] KUDU-2103 Canonicalize hostnames in client When Kerberos is used the service principal contains the hostname of the service. This means that if a user wants to use an alias for the master_addresses in a Kerberized environment it won't be able to connect as the service principal name is not found in the Kerberos database. This patch adds canonicalization for the hostnames to make sure the principal name the service ticket is requested for matches the one used by the master servers. Change-Id: I77204a185c1ab8e21cc833afb645543cbc0e340f --- M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java M java/kudu-client/src/main/java/org/apache/kudu/client/ServerInfo.java M java/kudu-client/src/test/java/org/apache/kudu/client/FakeDNS.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestServerInfo.java 4 files changed, 86 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/57/7757/3 -- To view, visit http://gerrit.cloudera.org:8080/7757 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77204a185c1ab8e21cc833afb645543cbc0e340f Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Attila BukorGerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1726: Avoid fsync-per-block in tablet copy
Dan Burkert has posted comments on this change. Change subject: KUDU-1726: Avoid fsync-per-block in tablet copy .. Patch Set 1: lgtm -- To view, visit http://gerrit.cloudera.org:8080/7701 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7534699f589c7060ffe32d7ac67546476cf21e76 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR](gh-pages) Kudu Consistency Blog Post Pt1
Andrew Wong has posted comments on this change. Change subject: Kudu Consistency Blog Post Pt1 .. Patch Set 10: (22 comments) http://gerrit.cloudera.org:8080/#/c/7019/10/_posts/2017-08-21-kudu-consistency-pt1.md File _posts/2017-08-21-kudu-consistency-pt1.md: PS10, Line 8: some of nit: remove PS10, Line 25: single machine single-machine PS10, Line 25: e thread single-threaded PS10, Line 26: m add comma PS10, Line 32: single-threaded, single-machine storage consider removing? PS10, Line 34: s add comma PS10, Line 35: , which This sentence is getting long; consider starting a new one. PS10, Line 36: in on PS10, Line 39: ' > Is this a well-known quote? If the idea here is to emphasize, consider using italics or bold, if it's a quote, use double-quotes. If neither, remove the surrounding punctuation marks. PS10, Line 47: in unbold the "in" here and below PS10, Line 68: , i.e. reads, Consider surrounding with ()s instead of commas, same below. The commas sort of break the sentence apart making it a bit more difficult to parse. PS10, Line 71: with right set with the right PS10, Line 73: (Spanner achieves 5 9's availability) Not sure if this tidbit is important unless we're planning on bringing it up later. Or maybe link to a definition Line 73: cases (Spanner achieves 5 9's availability). For the write path, we often made similar choices in Kudu. of PS10, Line 75: informed directed? PS10, Line 79: factor remove PS10, Line 94: much higher in the priority scale a much higher priority PS10, Line 137: a remove PS10, Line 143: get/set _timestamp tokens_ from/to coordinate a set of _timestamp tokens_ with clients PS10, Line 145: __STRICT SERIALIZABILITY__ Is there a reason these are caps'ed and bolded? consider lower-case and italicized, since it seems these are concepts that are being introduced here. PS10, Line 145: __LINEARIZABILITY__ and : __SERIALIZABILITY__ perhaps link to a definition? PS10, Line 146: g the user pass messages from/to clients : with the timestamp tokens having the user coordinate the timestamp tokens across clients -- To view, visit http://gerrit.cloudera.org:8080/7019 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Dan Burkert has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 22: (1 comment) Looks good! I'm very pleased how this came together, net-net I think the code is easier to understand with this patch. http://gerrit.cloudera.org:8080/#/c/7207/22/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 1228: RETURN_NOT_OK(container_->DoCloseBlocks({ this }, LogBlockContainer::SyncMode::NO_SYNC)); As I brought up on slack I think there may be an opportunity here to simplify DoClose and make Abort more efficient by doing nothing during abort calls (instead of writing a create + delete metadata entry). -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] util: remove old failure detector and resettable heartbeater code
Adar Dembo has posted comments on this change. Change subject: util: remove old failure detector and resettable heartbeater code .. Patch Set 4: Verified+1 Overriding Jenkins, a TSAN test failed due to an isolate failure. -- To view, visit http://gerrit.cloudera.org:8080/7736 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7a8f77a56ed66e06e182a70e18b2837a264a0a4e Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Run the gradle build as a part of the gerrit tests
Adar Dembo has posted comments on this change. Change subject: Run the gradle build as a part of the gerrit tests .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/7651/6/build-support/jenkins/build-and-test.sh File build-support/jenkins/build-and-test.sh: Line 385: # Rerun the build using the Gradle build. Clean up the trailing whitespace in this block of new code. -- To view, visit http://gerrit.cloudera.org:8080/7651 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1430cbd05ff78a69f2439e3a8f90e1ddde83a8d7 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Adar Dembo has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 22: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: No
[kudu-CR] [java] KUDU-2103 Canonicalize hostnames in client
Attila Bukor has posted comments on this change. Change subject: [java] KUDU-2103 Canonicalize hostnames in client .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/7757/1/java/kudu-client/src/main/java/org/apache/kudu/client/ServerInfo.java File java/kudu-client/src/main/java/org/apache/kudu/client/ServerInfo.java: Line 74: public String getHostname() { > how about getAndCanonicalizeHostname()? that way since it has a verb in it deal http://gerrit.cloudera.org:8080/#/c/7757/1/java/kudu-client/src/test/java/org/apache/kudu/client/FakeDNS.java File java/kudu-client/src/test/java/org/apache/kudu/client/FakeDNS.java: Line 64: Method method = InetAddress.class.getDeclaredMethod("getCanonicalHostName"); > unused? yep, tried a different approach for this first, somehow forgot to delete it... PS1, Line 80: if (host.equals("master1.example.com") || host.equals("server123.example.com")) { : return new InetAddress[]{InetAddress.getByAddress(new byte[]{10, 1, 2, 3})}; : } > I think it's a bit messy to have this forward/reverse mapping hard-coded in good idea, thanks http://gerrit.cloudera.org:8080/#/c/7757/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestServerInfo.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestServerInfo.java: Line 41: public void testGetUnknownHostname() throws Exception { > can you add some comments here explaining what these test cases are asserti yep, will add Line 47: InetAddress ia = InetAddress.getByName("8.8.8.8"); > I don't understand the comment above. Can you make it a full sentence and d I think it's not actually necessary for my tests, I just copied it from the KUDU-1982 test case. Rewriting it. -- To view, visit http://gerrit.cloudera.org:8080/7757 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I77204a185c1ab8e21cc833afb645543cbc0e340f Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Attila BukorGerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [java] KUDU-2103 Canonicalize hostnames in client
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7757 to look at the new patch set (#2). Change subject: [java] KUDU-2103 Canonicalize hostnames in client .. [java] KUDU-2103 Canonicalize hostnames in client When Kerberos is used the service principal contains the hostname of the service. This means that if a user wants to use an alias for the master_addresses in a Kerberized environment it won't be able to connect as the service principal name is not found in the Kerberos database. This patch adds canonicalization for the hostnames to make sure the principal name the service ticket is requested for matches the one used by the master servers. Change-Id: I77204a185c1ab8e21cc833afb645543cbc0e340f --- M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java M java/kudu-client/src/main/java/org/apache/kudu/client/ServerInfo.java M java/kudu-client/src/test/java/org/apache/kudu/client/FakeDNS.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestServerInfo.java 4 files changed, 84 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/57/7757/2 -- To view, visit http://gerrit.cloudera.org:8080/7757 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I77204a185c1ab8e21cc833afb645543cbc0e340f Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Attila BukorGerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [iwyu] a minor clean-up after recent merges
Alexey Serbin has submitted this change and it was merged. Change subject: [iwyu] a minor clean-up after recent merges .. [iwyu] a minor clean-up after recent merges After recent updates, some file became not compliant with IWYU. This patch fixes that to have the IWYU in the clean state since the IWYU configuration is already enabled for gerrit pre-commit verification. I also sneaked in a few other updates on other files to remove them from the 'muted' list in build-support/iwyu/iwyu-filter.awk Change-Id: I052056efc32b5173c51cd3f0f939e19e604a7091 Reviewed-on: http://gerrit.cloudera.org:8080/7773 Tested-by: Alexey SerbinReviewed-by: Todd Lipcon --- M build-support/iwyu/iwyu-filter.awk M src/kudu/client/master_rpc.cc M src/kudu/client/master_rpc.h M src/kudu/consensus/log.cc M src/kudu/consensus/mt-log-test.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/create-table-stress-test.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.h M src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc M src/kudu/integration-tests/internal_mini_cluster-itest-base.h M src/kudu/integration-tests/internal_mini_cluster.cc M src/kudu/integration-tests/internal_mini_cluster.h M src/kudu/integration-tests/log_verifier.cc M src/kudu/integration-tests/master_cert_authority-itest.cc M src/kudu/integration-tests/master_replication-itest.cc M src/kudu/integration-tests/table_locations-itest.cc M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/master/sys_catalog-test.cc M src/kudu/rpc/connection.h M src/kudu/tablet/memrowset.h M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tools/ksck.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/tablet_copy_source_session.cc M src/kudu/tserver/tablet_server_test_util.cc M src/kudu/tserver/tablet_service.cc M src/kudu/util/net/sockaddr.cc 29 files changed, 38 insertions(+), 37 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved Alexey Serbin: Verified -- To view, visit http://gerrit.cloudera.org:8080/7773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I052056efc32b5173c51cd3f0f939e19e604a7091 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] raft consensus-itest: robust fix for asynchronous kill
Todd Lipcon has posted comments on this change. Change subject: raft_consensus-itest: robust fix for asynchronous kill .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/7561/5//COMMIT_MSG Commit Message: Line 20: repeatedly ping the paused server until it stops responding. one idea for an even more general solution that addresses this problem everywhere instead of at each call site: what if ExternalDaemon::Pause() polled /proc//status waiting for the process state to be 'T' (stopped)? it's not osx-compatible but I don't think we are really worried about 0.1% flakes on osx. -- To view, visit http://gerrit.cloudera.org:8080/7561 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I99d400e971d6f9b22cc7b4483db94a98ec306e10 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [java] KUDU-2103 Canonicalize hostnames in client
Todd Lipcon has posted comments on this change. Change subject: [java] KUDU-2103 Canonicalize hostnames in client .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7757/1/java/kudu-client/src/main/java/org/apache/kudu/client/ServerInfo.java File java/kudu-client/src/main/java/org/apache/kudu/client/ServerInfo.java: Line 74: public String getHostname() { > makes sense. what if I change the method's name to getCanonicalHostname() o how about getAndCanonicalizeHostname()? that way since it has a verb in it beyond just 'get' it is more clear that it's going to do some work. -- To view, visit http://gerrit.cloudera.org:8080/7757 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I77204a185c1ab8e21cc833afb645543cbc0e340f Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Attila BukorGerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [security] avoid tickets with NULL 'renew till'
Alexey Serbin has posted comments on this change. Change subject: [security] avoid tickets with NULL 'renew_till' .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/7770/2/src/kudu/security/init.cc File src/kudu/security/init.cc: PS2, Line 295: > If we just do renew_deadline = (renew_till - 30), tickets with renew lifet I might be missing some crucial point here, but if renew_deadline is negative, that's fine with the if (ticket_expiry < now || (now + ticket_lifetime) > renew_deadline) condition, right? So, a ticket with lifetime of less than 30 seconds would be re-acquired here instead of renewing. Would be that a problem? -- To view, visit http://gerrit.cloudera.org:8080/7770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I59194af94838f680df4ce121a8dee526a876e369 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [java] KUDU-2103 Canonicalize hostnames in client
Attila Bukor has posted comments on this change. Change subject: [java] KUDU-2103 Canonicalize hostnames in client .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7757/1/java/kudu-client/src/main/java/org/apache/kudu/client/ServerInfo.java File java/kudu-client/src/main/java/org/apache/kudu/client/ServerInfo.java: Line 74: public String getHostname() { > I think it's bad form to hide a potentially-expensive DNS request in what l makes sense. what if I change the method's name to getCanonicalHostname() or add a new method called that? that should imply there's a hostname resolution happening and this looks like an internal API so a change like that shouldn't break anything. Or should I just put it in Connection? -- To view, visit http://gerrit.cloudera.org:8080/7757 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I77204a185c1ab8e21cc833afb645543cbc0e340f Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Attila BukorGerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [iwyu] a minor clean-up after recent merges
Todd Lipcon has posted comments on this change. Change subject: [iwyu] a minor clean-up after recent merges .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I052056efc32b5173c51cd3f0f939e19e604a7091 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [iwyu] a minor clean-up after recent merges
Alexey Serbin has posted comments on this change. Change subject: [iwyu] a minor clean-up after recent merges .. Patch Set 2: Verified+1 unrelated flake in delete_table-itest.0 -- To view, visit http://gerrit.cloudera.org:8080/7773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I052056efc32b5173c51cd3f0f939e19e604a7091 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: No
[kudu-CR] KUDU-871. Support tombstoned voting
Alexey Serbin has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 10: (12 comments) http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/cluster_itest_util.h File src/kudu/integration-tests/cluster_itest_util.h: PS10, Line 236: // Request the given replica to vote. nit: could you document at least first 5 parameters of this method? http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/tombstoned_voting-itest.cc File src/kudu/integration-tests/tombstoned_voting-itest.cc: Line 18: #include "kudu/consensus/metadata.pb.h" nit: from IWYU src/kudu/integration-tests/tombstoned_voting-itest.cc should add these lines: #include #include #include #include #include #include #include #include #include #include #include "kudu/consensus/consensus.pb.h" #include "kudu/consensus/opid.pb.h" #include "kudu/consensus/opid_util.h" #include "kudu/fs/fs_manager.h" #include "kudu/gutil/ref_counted.h" #include "kudu/integration-tests/internal_mini_cluster.h" #include "kudu/tablet/metadata.pb.h" #include "kudu/tserver/tserver.pb.h" #include "kudu/util/env.h" #include "kudu/util/monotime.h" #include "kudu/util/status.h" #include "kudu/util/test_macros.h" #include "kudu/util/test_util.h" PS10, Line 167: TS1 nit: the tablet? Or it's possible to tombstone a tablet server now? PS10, Line 172: TServerDetails* ts1_ets nit: consider moving this closer to the call site where it's used. Line 173: scoped_refptr replica; nit: does it make sense to verify the scenario below works without errors even after restarting the tablet server TS1? PS10, Line 181: "A" Could you add a comment explaining that the actual candidate UUID is not crucial in this test? PS10, Line 249: ts0_replica->consensus()->StepDown() nit: should it be some selective error handling based on the code set in the 'resp'? What if it failed due to some other reason -- would the test handle that properly? PS10, Line 253: ts0 nit: 'TS0' since it has been referred as that already. PS10, Line 256: FOLLOWER Could it change to something like 'LEARNER' in the middle and bring some flakiness? http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/tombstoned_voting-stress-test.cc File src/kudu/integration-tests/tombstoned_voting-stress-test.cc: Line 18: #include nit: from IWYU report: src/kudu/integration-tests/tombstoned_voting-stress-test.cc should add these lines: #include #include #include #include #include #include #include #include #include #include "kudu/common/wire_protocol.pb.h" #include "kudu/consensus/consensus.pb.h" #include "kudu/consensus/opid.pb.h" #include "kudu/gutil/macros.h" #include "kudu/integration-tests/external_mini_cluster.h" #include "kudu/integration-tests/external_mini_cluster_fs_inspector.h" #include "kudu/tablet/metadata.pb.h" #include "kudu/util/monotime.h" #include "kudu/util/net/net_util.h" #include "kudu/util/status.h" #include "kudu/util/test_macros.h" #include "kudu/util/test_util.h" src/kudu/integration-tests/tombstoned_voting-stress-test.cc should remove these lines: - #include "kudu/consensus/metadata.pb.h" // lines 25-25 - #include "kudu/consensus/raft_consensus.h" // lines 26-26 - #include "kudu/util/random.h" // lines 32-32 http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: Line 2489: LOG(WARNING) << "Tablet " << tablet->ToString() << " has failed on TS " wouldn't it be natural to have tablet::SHUTDOWN check here instead? Or it's not guaranteed to be SHUTDOWN at this point? http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: PS10, Line 304: LOG(DFATAL) The rest of the CHECKs would work for non-debug builds as well. What is the reason of having LOG(DFATAL), not LOG(FATAL) here? If there is any, please add a comment. -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-871. Support tombstoned voting
Todd Lipcon has posted comments on this change. Change subject: KUDU-871. Support tombstoned voting .. Patch Set 10: (6 comments) http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 1856: CHECK_EQ(kStopping, state_) << State_Name(state_); this is already CHECKed in SetStateUnlocked, no? Line 1863: cmeta_->set_leader_uuid(""); ClearLeaderUnlocked() ? http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/delete_tablet-itest.cc File src/kudu/integration-tests/delete_tablet-itest.cc: Line 95: // Tablet bootstrap failure should result in a SHUTDOWN tablet with an error per discussion on slack, let's revert this part of this patch since it isn't directly related and has external effects http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/tombstoned_voting-itest.cc File src/kudu/integration-tests/tombstoned_voting-itest.cc: PS10, Line 263: error (nee failed per other comment, back out this change PS10, Line 294: Shut each TS down http://gerrit.cloudera.org:8080/#/c/6960/10/src/kudu/integration-tests/tombstoned_voting-stress-test.cc File src/kudu/integration-tests/tombstoned_voting-stress-test.cc: PS10, Line 57: : num_workers_(1), this test seems to be implemented for multiple workers but only has one worker. Should this be >1? or should the code be simplified? -- To view, visit http://gerrit.cloudera.org:8080/6960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia19d75b185299443b27f41e468bbae20065e7570 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] disk failure: add persistent disk states
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7270 to look at the new patch set (#26). Change subject: disk failure: add persistent disk states .. disk failure: add persistent disk states This patch adds a new version of the path set so failed disks are not used when a server is run. A previously failed disk may continue to produce IOErrors after restart, but we may still like to start the server without the disk. To accomplish this, disk states, path names, and a timestamp are added to the path set. If any disks are failed when opening the initial FS layout, an 'unhealthy' instance with no path set metadata will be created instead of failing the startup process. CheckIntegrity() is updated to accomodate this. Rather than comparing all instances against an agreed-upon set of UUIDs, the single path set with the largest timestamp is used as the main one to compare against (if only old path sets are available, the first healthy one is used). If no instances are healthy, CheckIntegrity() will fail, as all disks are failed. Testing is done in a new iteration of CheckIntegrity(). Further testing is done in data_dirs-test, fs_manager-test, and log_block_manager-test to ensure the FS layout can be loaded with a failed directory. Some notes: - Disk failures during FS layout creation are not tolerated. In these cases, there is presumably no data on the server anyway, so operators should easily be able to fix their cluster. - If there are any unhealthy instances when upgrading the path sets (i.e. adding disk states, paths, timestamp), a complete mapping of UUIDs to paths will not be available, and CheckIntegrity() will fail. - The main path set's disk states are updated to reflect the failure of the instances. Since data on a failed disk is not used, disks that are successfully read from but are already marked FAILED by the main path set are not marked HEALTHY. - In the case of a server restart where all healthy disks fail to start up and some known failed disks start working again, the server will successfully start up with the bad disks This patch is a part of a series of patches to handle disk failures. To see how this fits, see 2.6 in: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f --- M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager_util-test.cc M src/kudu/fs/block_manager_util.cc M src/kudu/fs/block_manager_util.h M src/kudu/fs/data_dirs-test.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/fs.proto M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc 14 files changed, 1,213 insertions(+), 291 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/7270/26 -- To view, visit http://gerrit.cloudera.org:8080/7270 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f Gerrit-PatchSet: 26 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] disk failure: add persistent disk states
Andrew Wong has posted comments on this change. Change subject: disk failure: add persistent disk states .. Patch Set 25: (3 comments) http://gerrit.cloudera.org:8080/#/c/7270/25/src/kudu/fs/block_manager_util.cc File src/kudu/fs/block_manager_util.cc: Line 40: DECLARE_bool(enable_data_block_fsync); > warning: redundant 'FLAGS_enable_data_block_fsync' declaration [readability Fixed the forward decl warning, will see if it still complains. When I try removing the flag, it complains and doesn't compile. http://gerrit.cloudera.org:8080/#/c/7270/25/src/kudu/fs/data_dirs.cc File src/kudu/fs/data_dirs.cc: Line 122: using std::unordered_map; > warning: using decl 'unordered_map' is unused [misc-unused-using-decls] Done http://gerrit.cloudera.org:8080/#/c/7270/24/src/kudu/fs/fs_manager.cc File src/kudu/fs/fs_manager.cc: Line 242: } > Failed to what? Canonicalize, right? Done -- To view, visit http://gerrit.cloudera.org:8080/7270 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f Gerrit-PatchSet: 25 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR](gh-pages) Kudu Consistency Blog Post Pt1
Will Berkeley has posted comments on this change. Change subject: Kudu Consistency Blog Post Pt1 .. Patch Set 10: (15 comments) http://gerrit.cloudera.org:8080/#/c/7019/10/_posts/2017-08-21-kudu-consistency-pt1.md File _posts/2017-08-21-kudu-consistency-pt1.md: PS10, Line 7: what is currently implemented Consider "current features" instead. PS10, Line 7: what are the next step Consider just "next steps". PS10, Line 8: into on PS10, Line 12: we review/introduce some concepts This sentence has the same purpose as the last sentence of the previous paragraph, and is closely related to the first sentence of the post. Could all three be combined into one sentence, or two adjacent sentences? I suggest either introducing the series and first post in the first two sentences, or replacing the first sentence with a "hook", something that concisely and informally describes what consistency is and why it's important, and then make the last two sentences of the first paragraph about the content of the series and the content of the first post. PS10, Line 23: , Remove PS10, Line 25: a PS10, Line 26: would happen happen PS10, Line 37: having to deal that have to deal PS10, Line 39: ' Is this a well-known quote? PS10, Line 55: , ; PS10, Line 66: that execute Remove PS10, Line 106: future feature Should we link to a JIRA here? PS10, Line 121: kudu Kudu PS10, Line 122: with the particularity that causally related mutations : may be required to have increasing timestamps. What are "causally-related mutations"? PS10, Line 133: more Remove -- To view, visit http://gerrit.cloudera.org:8080/7019 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Icaec0c8ace20651a65901a8e1786e785265540d1 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-Owner: David Ribeiro AlvesGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 22: (1 comment) http://gerrit.cloudera.org:8080/#/c/7207/20/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: PS20, Line 883: > Should be with a pointer (*), actually. Done -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hao Hao has posted comments on this change. Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. Patch Set 22: (7 comments) http://gerrit.cloudera.org:8080/#/c/7207/21/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: PS21, Line 522: Random rand(SeedRandom()); : vectordirty_bl > This is no longer true and can be removed, right? Right. Removed. http://gerrit.cloudera.org:8080/#/c/7207/21/src/kudu/fs/file_block_manager.cc File src/kudu/fs/file_block_manager.cc: Line 63: DECLARE_bool(block_manager_lock_dirs); > Not needed anymore? Done http://gerrit.cloudera.org:8080/#/c/7207/21/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 79: DECLARE_bool(enable_data_block_fsync); > Can be removed. Done PS21, Line 528: ck is ful > , marking Done PS21, Line 905: > Use DCHECK_EQ Done PS21, Line 1054: e conta > depending Done PS21, Line 1054: n isn' > place Done -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao Hao Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-1943: Add BlockTransaction to Block Manager
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7207 to look at the new patch set (#22). Change subject: KUDU-1943: Add BlockTransaction to Block Manager .. KUDU-1943: Add BlockTransaction to Block Manager This adds a new layer of abstraction 'BlockTransaction' to Block Manager, to accumulate new blocks in a logical operation, e.g. flush/compaction. Avoid using fsync() per block can potentionaly reduce a lot I/O overhead. This patch also add a new API, Finalize(), to Block Manager, so that for LBM container can be reused without flushing in-flight writable blocks to disk. This can bring down the log block container number and disk space consumption for first few flush/compaction operations when using log block manager. I performed manual test which creates a table with two tablets, 'kudu perf loadgen localhost --num-rows-per-thread=2000 --num-threads=10 --keep-auto-table --table_num_buckets=2'. With this patch, 'log_block_manager_containers' dropped from 56 to 7. A design doc can be found here: https://docs.google.com/document/d/1cB8pDs-Ho7p2HlzkWo1d-wmXr90YntIjdvdOv7u7gn0/edit?usp=sharing Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd --- M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.cc M src/kudu/fs/block_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/multi_column_writer.cc M src/kudu/tablet/multi_column_writer.h 20 files changed, 575 insertions(+), 449 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/7207/22 -- To view, visit http://gerrit.cloudera.org:8080/7207 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I01f3ed830cc1f86d7bb8a66b2f780914a4bd Gerrit-PatchSet: 22 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] consensus: use periodic timers for failure detection
Dan Burkert has posted comments on this change. Change subject: consensus: use periodic timers for failure detection .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7735/2/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 319: rng_.Normal(GetFailureMonitorCheckMeanMs(), > I tried to preserve these semantics to minimize the scope of the patch. And Doesn't this patch already significantly alter the semantics of these flags? They are no longer having any effect on the steady-state failure detector. -- To view, visit http://gerrit.cloudera.org:8080/7735 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8acdb44e12b975fda4a226aa784db95bc7b4e330 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] disk failure: add persistent disk states
Andrew Wong has posted comments on this change. Change subject: disk failure: add persistent disk states .. Patch Set 25: (12 comments) http://gerrit.cloudera.org:8080/#/c/7270/24/src/kudu/fs/block_manager_util.cc File src/kudu/fs/block_manager_util.cc: PS24, Line 54: i > How about _s and _s2? Or _s and _s_prepended? Flipping the underscore posit Good point, done. PS24, Line 106: new_path.set_uuid(all_uuids[i]); : new_path.set_path(all_paths[i]); : new_path.set_health_state(PathHealthStatePB::HEALTHY); > Wouldn't it be fewer LOC to add one UUID at a time to deprecated_all_uuids Done http://gerrit.cloudera.org:8080/#/c/7270/24/src/kudu/fs/data_dirs-test.cc File src/kudu/fs/data_dirs-test.cc: Line 356: > Add a comment explaining this is here to bypass the DataDirsTest::SetUp im Done http://gerrit.cloudera.org:8080/#/c/7270/24/src/kudu/fs/data_dirs.cc File src/kudu/fs/data_dirs.cc: PS24, Line 523: TabletsByUuidIndex > using Done Line 653: } > Didn't we already do this above? Done http://gerrit.cloudera.org:8080/#/c/7270/24/src/kudu/fs/data_dirs.h File src/kudu/fs/data_dirs.h: PS24, Line 236: std::unique_ptr* dd_manager); : static Status OpenExistingForTests(Env* env, std::vector data_fs_roots, : DataDirManagerOptions opts, : std::unique_ptr* dd_manager); : : // Constructs a directory manager and creates its necessary files on-dis > Can we suffix these with "ForTests"? It's temporary until canonicalization Done http://gerrit.cloudera.org:8080/#/c/7270/24/src/kudu/fs/fs_manager.cc File src/kudu/fs/fs_manager.cc: PS24, Line 207: if (PREDICT_FALSE(!s.ok())) { : if (s.IsDiskFailure()) { > Update. Done PS24, Line 355: s nothing in it. > How about "at least one directory" Done http://gerrit.cloudera.org:8080/#/c/7270/24/src/kudu/fs/log_block_manager-test.cc File src/kudu/fs/log_block_manager-test.cc: Line 35: #include "kudu/util/metrics.h" > Not used? Done PS24, Line 1301: athSegments(tes > You mean 'right directory'? Done http://gerrit.cloudera.org:8080/#/c/7270/24/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 1571: } > No longer used? Done Line 1932: vectorneed_repunching; > Why these changes and the other macro changes below? At this point, the error notification callback (TSTabletManager::FailTabletsInDataDir()) has not been registered. It gets registered when the tablet server is initialized, but block manager gets initialized as part of the kserver. I added a comment above the macros noting this. There are also some changes in this iteration of log_block_manager-test to test this codepath. -- To view, visit http://gerrit.cloudera.org:8080/7270 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f Gerrit-PatchSet: 25 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [iwyu] a minor clean-up after recent merges
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7773 to look at the new patch set (#2). Change subject: [iwyu] a minor clean-up after recent merges .. [iwyu] a minor clean-up after recent merges After recent updates, some file became not compliant with IWYU. This patch fixes that to have the IWYU in the clean state since the IWYU configuration is already enabled for gerrit pre-commit verification. I also sneaked in a few other updates on other files to remove them from the 'muted' list in build-support/iwyu/iwyu-filter.awk Change-Id: I052056efc32b5173c51cd3f0f939e19e604a7091 --- M build-support/iwyu/iwyu-filter.awk M src/kudu/client/master_rpc.cc M src/kudu/client/master_rpc.h M src/kudu/consensus/log.cc M src/kudu/consensus/mt-log-test.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/create-table-stress-test.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.h M src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc M src/kudu/integration-tests/internal_mini_cluster-itest-base.h M src/kudu/integration-tests/internal_mini_cluster.cc M src/kudu/integration-tests/internal_mini_cluster.h M src/kudu/integration-tests/log_verifier.cc M src/kudu/integration-tests/master_cert_authority-itest.cc M src/kudu/integration-tests/master_replication-itest.cc M src/kudu/integration-tests/table_locations-itest.cc M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/master/sys_catalog-test.cc M src/kudu/rpc/connection.h M src/kudu/tablet/memrowset.h M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tools/ksck.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/tablet_copy_source_session.cc M src/kudu/tserver/tablet_server_test_util.cc M src/kudu/tserver/tablet_service.cc M src/kudu/util/net/sockaddr.cc 29 files changed, 38 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/7773/2 -- To view, visit http://gerrit.cloudera.org:8080/7773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I052056efc32b5173c51cd3f0f939e19e604a7091 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] consensus peers: replace bespoke Raft heartbeat logic with periodic timers
Dan Burkert has posted comments on this change. Change subject: consensus_peers: replace bespoke Raft heartbeat logic with periodic timers .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7734 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5f7e1761d9f36dc6a25bd8e3e7d7a3b5c402afbf Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] rpc: periodic timers
Dan Burkert has posted comments on this change. Change subject: rpc: periodic timers .. Patch Set 4: Code-Review+1 (1 comment) LGTM modulo the typo http://gerrit.cloudera.org:8080/#/c/7733/4/src/kudu/rpc/periodic.h File src/kudu/rpc/periodic.h: PS4, Line 68: - + -- To view, visit http://gerrit.cloudera.org:8080/7733 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I23f80631a5591b1ac023974f3d7d9eb1576cdb86 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] separate DataDirManager from BlockManagers
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7602 to look at the new patch set (#13). Change subject: separate DataDirManager from BlockManagers .. separate DataDirManager from BlockManagers Currently, the DataDirManager is owned by the BlockManagers. Since only blocks are placed in 'data dirs' (i.e. subdirectories named 'data' under the roots specified by 'fs_data_dirs'), this hierarchy made sense. However, it would be nice to track other files that fall within 'fs_data_dirs' (e.g. tablet-metadata, consensus-metadata). Splitting the directory manager from the block manager makes this more feasible. This patch introduces a number of changes to the FsManager, BlockManager, and DataDirManager with respect to initialization. - BlockManagers no longer have a Create() function - BlockManager's dtor will now just wait for DataDir closures to finish instead of shutting down the DataDirManager as to not directly affect the DataDirManager - DataDirManagers now have two static constructors: one to open an existing layout, and another to create and open a new layout - FsManagers will only create the BlockManager when opening an fs layout - FsManagers will only Open() a DataDirManager if one has not already been constructed - A validator is added to check the value of FLAGS_block_manager before any of the above initialization can occur Change-Id: I387e4e88bd65298970a1e4201879de08fac07ace --- M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h M src/kudu/fs/data_dirs-test.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/util/path_util.cc M src/kudu/util/path_util.h 16 files changed, 443 insertions(+), 306 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/02/7602/13 -- To view, visit http://gerrit.cloudera.org:8080/7602 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I387e4e88bd65298970a1e4201879de08fac07ace Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] disk failure: add persistent disk states
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7270 to look at the new patch set (#25). Change subject: disk failure: add persistent disk states .. disk failure: add persistent disk states This patch adds a new version of the path set so failed disks are not used when a server is run. A previously failed disk may continue to produce IOErrors after restart, but we may still like to start the server without the disk. To accomplish this, disk states, path names, and a timestamp are added to the path set. If any disks are failed when opening the initial FS layout, an 'unhealthy' instance with no path set metadata will be created instead of failing the startup process. CheckIntegrity() is updated to accomodate this. Rather than comparing all instances against an agreed-upon set of UUIDs, the single path set with the largest timestamp is used as the main one to compare against (if only old path sets are available, the first healthy one is used). If no instances are healthy, CheckIntegrity() will fail, as all disks are failed. Testing is done in a new iteration of CheckIntegrity(). Further testing is done in data_dirs-test, fs_manager-test, and log_block_manager-test to ensure the FS layout can be loaded with a failed directory. Some notes: - Disk failures during FS layout creation are not tolerated. In these cases, there is presumably no data on the server anyway, so operators should easily be able to fix their cluster. - If there are any unhealthy instances when upgrading the path sets (i.e. adding disk states, paths, timestamp), a complete mapping of UUIDs to paths will not be available, and CheckIntegrity() will fail. - The main path set's disk states are updated to reflect the failure of the instances. Since data on a failed disk is not used, disks that are successfully read from but are already marked FAILED by the main path set are not marked HEALTHY. - In the case of a server restart where all healthy disks fail to start up and some known failed disks start working again, the server will successfully start up with the bad disks This patch is a part of a series of patches to handle disk failures. To see how this fits, see 2.6 in: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f --- M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager_util-test.cc M src/kudu/fs/block_manager_util.cc M src/kudu/fs/block_manager_util.h M src/kudu/fs/data_dirs-test.cc M src/kudu/fs/data_dirs.cc M src/kudu/fs/data_dirs.h M src/kudu/fs/fs.proto M src/kudu/fs/fs_manager-test.cc M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc 14 files changed, 1,217 insertions(+), 293 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/7270/25 -- To view, visit http://gerrit.cloudera.org:8080/7270 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f Gerrit-PatchSet: 25 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [tools] Add summary-only mode to ksck
Will Berkeley has posted comments on this change. Change subject: [tools] Add summary-only mode to ksck .. Patch Set 2: (1 comment) I realized there's quite a bit more work to do to get machine-readable output from ksck, since there's also messages printed from the calling tool action code, and it'd be nice to integrate checksum results and more info into the json. Let me work some more on this and post a more complete change later. http://gerrit.cloudera.org:8080/#/c/7759/2/src/kudu/tools/ksck.cc File src/kudu/tools/ksck.cc: Line 57: "If set, ksck prints only summary information. Use with the --format flag to " > Does this description still make sense when also doing a checksum check? In does, in the sense that the checksum output will also be suppressed (though still reflected in the return status/value of ksck). -- To view, visit http://gerrit.cloudera.org:8080/7759 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3a961867a9e6a32ca91144843f9d1b8139f756f5 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] [iwyu] a minor clean-up after recent merges
Alexey Serbin has uploaded a new change for review. http://gerrit.cloudera.org:8080/7773 Change subject: [iwyu] a minor clean-up after recent merges .. [iwyu] a minor clean-up after recent merges After recent updates, some file became not compliant with IWYU. This patch fixes that to have the IWYU in the clean state since the IWYU configuration is already enabled for gerrit pre-commit verification. I also sneaked in a few other updates on other files to remove them from the 'muted' list in build-support/iwyu/iwyu-filter.awk Change-Id: I052056efc32b5173c51cd3f0f939e19e604a7091 --- M build-support/iwyu/iwyu-filter.awk M src/kudu/client/master_rpc.cc M src/kudu/client/master_rpc.h M src/kudu/consensus/log.cc M src/kudu/consensus/mt-log-test.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/create-table-stress-test.cc M src/kudu/integration-tests/external_mini_cluster-itest-base.h M src/kudu/integration-tests/external_mini_cluster_fs_inspector.cc M src/kudu/integration-tests/internal_mini_cluster-itest-base.h M src/kudu/integration-tests/internal_mini_cluster.cc M src/kudu/integration-tests/internal_mini_cluster.h M src/kudu/integration-tests/log_verifier.cc M src/kudu/integration-tests/master_cert_authority-itest.cc M src/kudu/integration-tests/master_replication-itest.cc M src/kudu/integration-tests/table_locations-itest.cc M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/integration-tests/ts_tablet_manager-itest.cc M src/kudu/master/sys_catalog-test.cc M src/kudu/rpc/connection.h M src/kudu/tablet/memrowset.h M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica-test.cc M src/kudu/tools/ksck.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/tablet_copy_source_session.cc M src/kudu/tserver/tablet_server_test_util.cc M src/kudu/tserver/tablet_service.cc M src/kudu/util/net/sockaddr.cc 29 files changed, 37 insertions(+), 36 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/7773/1 -- To view, visit http://gerrit.cloudera.org:8080/7773 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I052056efc32b5173c51cd3f0f939e19e604a7091 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin
[kudu-CR] [security] avoid tickets with NULL 'renew till'
Sailesh Mukil has posted comments on this change. Change subject: [security] avoid tickets with NULL 'renew_till' .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/7770/2//COMMIT_MSG Commit Message: PS2, Line 7: Adjust kerberos renewal logic to avoid tickets with NULL 'renew_till' timestamp > nit: could you shorten it to fit about 50 symbols, like Done http://gerrit.cloudera.org:8080/#/c/7770/2/src/kudu/security/init.cc File src/kudu/security/init.cc: PS2, Line 295: std::min(static_cast(30), ticket_lifetime) > nit: consider using Done PS2, Line 295: ticket_lifetime > Maybe, then compute renew_deadline as If we just do renew_deadline = (renew_till - 30), tickets with renew lifetime less than 30s will cause the renew_deadline to have a negative value. The idea is to add some slop for tickets that have renew lifetimes > 30s. For tickets that have < 30s renew lifetimes, we just end up renewing at the time we hit the deadline. -- To view, visit http://gerrit.cloudera.org:8080/7770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I59194af94838f680df4ce121a8dee526a876e369 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [security] avoid tickets with NULL 'renew till'
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7770 to look at the new patch set (#3). Change subject: [security] avoid tickets with NULL 'renew_till' .. [security] avoid tickets with NULL 'renew_till' It was found that if we use a file based credential cache that is shared between the C++ side and the java side of a process, and we encounter the specific edge case where we renew a ticket that has less than 'ticket_lifetime' left before its 'renew_lifetime' expires, the ticket is set to have a NULL 'renew_till' timestamp. Eg: ticket_lifetime = 10m renew_lifetime = 100m [current ticket being renewed] at '15:30:00' endtime = '15:30:30' renew_till = '15:31:00' This ticket will be renewed and the renewed ticket will have the following values: endtime = '15:31:00' renew_till = null The Java krb5 library refuses to read these kinds of tickets which have the RENEWABLE flag set but no 'renew_till' set, causing unexpected failures. We work around this by reacquiring a new ticket instead of renewing the existing ticket if there is less that 'ticket_lifetime' left between now and the 'renew_till' deadline. Tracked on the Java side by: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8186576 Change-Id: I59194af94838f680df4ce121a8dee526a876e369 --- M src/kudu/integration-tests/external_mini_cluster-test.cc M src/kudu/security/init.cc 2 files changed, 12 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/7770/3 -- To view, visit http://gerrit.cloudera.org:8080/7770 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I59194af94838f680df4ce121a8dee526a876e369 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Sailesh MukilGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Sailesh Mukil Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2033 (part 2). Add test for Java client failover support.
Edward Fancher has posted comments on this change. Change subject: KUDU-2033 (part 2). Add test for Java client failover support. .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/7722/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestClientFailoverSupport.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestClientFailoverSupport.java: PS1, Line 58: blic void testMu > Move this closer to the point of the usage. Done PS1, Line 60: > Is it crucial to have that suffix? I would expect it's not, so consider dr Done PS1, Line 61: getBas > nit: as the builder is not needed in the code below, consider not creating Done Line 85: } > Consider dropping the table in the very end, making sure it can be successf Done -- To view, visit http://gerrit.cloudera.org:8080/7722 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I228e08429d952f0f5b2657e0b8481366c5c572a4 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Edward FancherGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Edward Fancher Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: Yes
[kudu-CR] KUDU-2033 (part 2). Add test for Java client failover support.
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7722 to look at the new patch set (#2). Change subject: KUDU-2033 (part 2). Add test for Java client failover support. .. KUDU-2033 (part 2). Add test for Java client failover support. New test which validates that that the java client correctly gets metadata updates after the Master leader is killed. Change-Id: I228e08429d952f0f5b2657e0b8481366c5c572a4 --- A java/kudu-client/src/test/java/org/apache/kudu/client/TestClientFailoverSupport.java 1 file changed, 90 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/22/7722/2 -- To view, visit http://gerrit.cloudera.org:8080/7722 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I228e08429d952f0f5b2657e0b8481366c5c572a4 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Edward FancherGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy
[kudu-CR](branch-1.3.x) KUDU-1942. Kerberos fails to log in on hostnames with capital letters
Jean-Daniel Cryans has submitted this change and it was merged. Change subject: KUDU-1942. Kerberos fails to log in on hostnames with capital letters .. KUDU-1942. Kerberos fails to log in on hostnames with capital letters This ensures that servers canonicalize their FQDNs to lower-case before generating Kerberos principal names. With this change I was able to set up a working cluster on my laptop with a capitalized hostname, where before it would fail as described in the JIRA. I also verified that I was able to connect from both C++ and Java clients. Change-Id: I5ef65dd827459476a2d225d8e3f7c80ff2fdf627 Reviewed-on: http://gerrit.cloudera.org:8080/7693 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin(cherry picked from commit ecfcb312161a4bc6f8da0ebb837169a2acf08eef) Reviewed-on: http://gerrit.cloudera.org:8080/7764 Reviewed-by: Jean-Daniel Cryans --- M src/kudu/security/init.cc 1 file changed, 3 insertions(+), 0 deletions(-) Approvals: Jean-Daniel Cryans: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7764 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I5ef65dd827459476a2d225d8e3f7c80ff2fdf627 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: branch-1.3.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins
[kudu-CR](branch-1.3.x) KUDU-1942. Kerberos fails to log in on hostnames with capital letters
Jean-Daniel Cryans has posted comments on this change. Change subject: KUDU-1942. Kerberos fails to log in on hostnames with capital letters .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7764 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5ef65dd827459476a2d225d8e3f7c80ff2fdf627 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-1.3.x Gerrit-Owner: Todd LipconGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR](branch-1.4.x) KUDU-1942. Kerberos fails to log in on hostnames with capital letters
Jean-Daniel Cryans has submitted this change and it was merged. Change subject: KUDU-1942. Kerberos fails to log in on hostnames with capital letters .. KUDU-1942. Kerberos fails to log in on hostnames with capital letters This ensures that servers canonicalize their FQDNs to lower-case before generating Kerberos principal names. With this change I was able to set up a working cluster on my laptop with a capitalized hostname, where before it would fail as described in the JIRA. I also verified that I was able to connect from both C++ and Java clients. Change-Id: I5ef65dd827459476a2d225d8e3f7c80ff2fdf627 Reviewed-on: http://gerrit.cloudera.org:8080/7693 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin(cherry picked from commit ecfcb312161a4bc6f8da0ebb837169a2acf08eef) Reviewed-on: http://gerrit.cloudera.org:8080/7763 Tested-by: Todd Lipcon Reviewed-by: Jean-Daniel Cryans --- M src/kudu/security/init.cc 1 file changed, 3 insertions(+), 0 deletions(-) Approvals: Jean-Daniel Cryans: Looks good to me, approved Todd Lipcon: Verified -- To view, visit http://gerrit.cloudera.org:8080/7763 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I5ef65dd827459476a2d225d8e3f7c80ff2fdf627 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: branch-1.4.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Todd Lipcon
[kudu-CR](branch-1.4.x) KUDU-1942. Kerberos fails to log in on hostnames with capital letters
Jean-Daniel Cryans has posted comments on this change. Change subject: KUDU-1942. Kerberos fails to log in on hostnames with capital letters .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7763 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5ef65dd827459476a2d225d8e3f7c80ff2fdf627 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-1.4.x Gerrit-Owner: Todd LipconGerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Run the gradle build as a part of the gerrit tests
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7651 to look at the new patch set (#6). Change subject: Run the gradle build as a part of the gerrit tests .. Run the gradle build as a part of the gerrit tests Change-Id: I1430cbd05ff78a69f2439e3a8f90e1ddde83a8d7 --- M build-support/jenkins/build-and-test.sh 1 file changed, 21 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/7651/6 -- To view, visit http://gerrit.cloudera.org:8080/7651 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1430cbd05ff78a69f2439e3a8f90e1ddde83a8d7 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] [build-support] IWYU build configuration for Jenkins
Alexey Serbin has submitted this change and it was merged. Change subject: [build-support] IWYU build configuration for Jenkins .. [build-support] IWYU build configuration for Jenkins Added provisions to run the include-what-you-use tool against the updated files in the changelist. Those are based on the newly introduced 'iwyu' target to be used like already existing 'lint' target. The 'iwyu' target runs IWYU against the compilation database. So, as a part of this changelist I updated the top-level CMakeLists.txt to always generate the compilation database. The compilation database is a JSON file containing information on how the targets are built. Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e Reviewed-on: http://gerrit.cloudera.org:8080/7750 Reviewed-by: Adar DemboTested-by: Kudu Jenkins --- M .ycm_extra_conf.py M CMakeLists.txt A build-support/iwyu/iwyu.sh A build-support/iwyu/iwyu_tool.py M build-support/jenkins/build-and-test.sh 5 files changed, 361 insertions(+), 18 deletions(-) Approvals: Adar Dembo: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7750 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] disk failure: add persistent disk states
Adar Dembo has posted comments on this change. Change subject: disk failure: add persistent disk states .. Patch Set 24: (13 comments) I didn't finish reviewing data_dirs.cc. Looked at the rest, though. http://gerrit.cloudera.org:8080/#/c/7270/24/src/kudu/fs/block_manager_util.cc File src/kudu/fs/block_manager_util.cc: PS24, Line 54: s_ How about _s and _s2? Or _s and _s_prepended? Flipping the underscore position is a super confusing way to create a second variable name. PS24, Line 106: const google::protobuf::RepeatedPtrField all_uuids_pb(all_uuids.begin(), all_uuids.end()); : new_path_set->mutable_deprecated_all_uuids()->Reserve(all_uuids.size()); : new_path_set->mutable_deprecated_all_uuids()->CopyFrom(all_uuids_pb); Wouldn't it be fewer LOC to add one UUID at a time to deprecated_all_uuids in the loop on L99? This isn't exactly a hot path. http://gerrit.cloudera.org:8080/#/c/7270/24/src/kudu/fs/data_dirs-test.cc File src/kudu/fs/data_dirs-test.cc: Line 356: KuduTest::SetUp(); Add a comment explaining this is here to bypass the DataDirsTest::SetUp impl. Or just deal with the results of the (unnecessary?) call. Anyone who looks at this will instinctively think this override does nothing and should be removed. http://gerrit.cloudera.org:8080/#/c/7270/24/src/kudu/fs/data_dirs.cc File src/kudu/fs/data_dirs.cc: PS24, Line 523: std::unordered_map using Line 653: dd->ExecClosure(Bind(, env_, dd->dir())); Didn't we already do this above? http://gerrit.cloudera.org:8080/#/c/7270/24/src/kudu/fs/data_dirs.h File src/kudu/fs/data_dirs.h: PS24, Line 236: static Status CreateNew(Env* env, std::vector data_fs_roots, : DataDirManagerOptions opts, : std::unique_ptr* dd_manager); : static Status OpenExisting(Env* env, std::vector data_fs_roots, : DataDirManagerOptions opts, : std::unique_ptr* dd_manager); Can we suffix these with "ForTests"? It's temporary until canonicalization moves into the DataDirManager, right? http://gerrit.cloudera.org:8080/#/c/7270/24/src/kudu/fs/fs_manager.cc File src/kudu/fs/fs_manager.cc: PS24, Line 207: // If the directory fails to canonicalize due to disk failure, prefix : // it with an unsanitized string. Update. Line 242: // The server cannot start if the WAL root or metadata root failed. Failed to what? Canonicalize, right? PS24, Line 355: some directories How about "at least one directory" http://gerrit.cloudera.org:8080/#/c/7270/24/src/kudu/fs/log_block_manager-test.cc File src/kudu/fs/log_block_manager-test.cc: Line 35: #include "kudu/util/env_util.h" Not used? PS24, Line 1301: directory right You mean 'right directory'? http://gerrit.cloudera.org:8080/#/c/7270/24/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 1571: vector uuid_indices(dd_manager_->data_dirs().size()); No longer used? Line 1932: if (!s.ok()) { Why these changes and the other macro changes below? -- To view, visit http://gerrit.cloudera.org:8080/7270 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifddf0817fe1a82044077f5544c400c88de20769f Gerrit-PatchSet: 24 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [build-support] IWYU build configuration for Jenkins
Alexey Serbin has posted comments on this change. Change subject: [build-support] IWYU build configuration for Jenkins .. Patch Set 9: > You removed it anyway? Okay. Yep, I didn't notice your earlier +2. Also, the section provides a link : http://clang.llvm.org/docs/JSONCompilationDatabase.html In the referenced doc, there is enough information about using CMake CMAKE_EXPORT_COMPILE_COMMANDS variable to enable generation of the compilation database. -- To view, visit http://gerrit.cloudera.org:8080/7750 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [build-support] IWYU build configuration for Jenkins
Adar Dembo has posted comments on this change. Change subject: [build-support] IWYU build configuration for Jenkins .. Patch Set 10: Code-Review+2 You removed it anyway? Okay. -- To view, visit http://gerrit.cloudera.org:8080/7750 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] [build-support] IWYU build configuration for Jenkins
Hello Dan Burkert, Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7750 to look at the new patch set (#10). Change subject: [build-support] IWYU build configuration for Jenkins .. [build-support] IWYU build configuration for Jenkins Added provisions to run the include-what-you-use tool against the updated files in the changelist. Those are based on the newly introduced 'iwyu' target to be used like already existing 'lint' target. The 'iwyu' target runs IWYU against the compilation database. So, as a part of this changelist I updated the top-level CMakeLists.txt to always generate the compilation database. The compilation database is a JSON file containing information on how the targets are built. Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e --- M .ycm_extra_conf.py M CMakeLists.txt A build-support/iwyu/iwyu.sh A build-support/iwyu/iwyu_tool.py M build-support/jenkins/build-and-test.sh 5 files changed, 361 insertions(+), 18 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/7750/10 -- To view, visit http://gerrit.cloudera.org:8080/7750 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I45f2e8529891dd7da18c9fa6383e0c41e6b4182e Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon