[kudu-CR] Integrate the request tracker with the client
David Ribeiro Alves has posted comments on this change. Change subject: Integrate the request tracker with the client .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/3080/8//COMMIT_MSG Commit Message: Line 15: any specific tests. > it's not possible to use RetriableRpc within rpc_stub-test to test the clie I guess I can come up with a unit test for retriable rpc, if that helps, but it wouldn't test it with the meta cache (retriable rpc received a server picker that, on the client, is backed by the meta cache). -- To view, visit http://gerrit.cloudera.org:8080/3080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I94207c294452fcbdb3a7901fdb702674d47553ee Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Integrate the ResultTracker into the rpc subsystem and add a test
David Ribeiro Alves has posted comments on this change. Change subject: Integrate the ResultTracker into the rpc subsystem and add a test .. Patch Set 12: Verified+1 (12 comments) http://gerrit.cloudera.org:8080/#/c/3192/6//COMMIT_MSG Commit Message: Line 10: to specify a new method option when defining rpc service methods. > I think you missed this. hrm, you're right. I could swear I had fixed this. must have lost a patch along the way. Done http://gerrit.cloudera.org:8080/#/c/3192/11/src/kudu/common/common.proto File src/kudu/common/common.proto: Line 316 > I dont think this should be in 'common'. 'common' is for data-model type st Done http://gerrit.cloudera.org:8080/#/c/3192/11/src/kudu/rpc/CMakeLists.txt File src/kudu/rpc/CMakeLists.txt: Line 89: rpc_header_proto > oh, now I see why it builds. This dependency seems misplaced. (eg keep in m Done http://gerrit.cloudera.org:8080/#/c/3192/11/src/kudu/rpc/protoc-gen-krpc.cc File src/kudu/rpc/protoc-gen-krpc.cc: Line 46: #include "kudu/rpc/rpc_header.pb.h" > nit: sorting Done Line 467: "mi->result_tracker.reset($result_tracker$);\n" > we have a separate ResultTracker per RPC type? that doesn't seem right. we do, why doesn't it seem right? easier to track where memory goes , maps will be smaller so faster. locks will be less contended... why wouldn't we want to have one per rpc? http://gerrit.cloudera.org:8080/#/c/3192/11/src/kudu/rpc/rpc-stress-test.cc File src/kudu/rpc/rpc-stress-test.cc: PS11, Line 73: random amount > update Done Line 134: vector> adders; > use unique_ptr? Done http://gerrit.cloudera.org:8080/#/c/3192/11/src/kudu/rpc/rpc-test-base.h File src/kudu/rpc/rpc-test-base.h: Line 288: std::atomic_int exactly_once_test_val_; > hrm, is this atomic? haven't seen this yeah it's typedefd, want me to change it? http://gerrit.cloudera.org:8080/#/c/3192/11/src/kudu/rpc/rpc_context.cc File src/kudu/rpc/rpc_context.cc: PS11, Line 66: call_->RespondSuccess(*response_pb_); : delete this; : > per the comment on the other review, it seems weird we now have this code s well yeah, for the case when the result aren't being tracked. ideally we'd like to at least coalesce the log statements at least, was kinda bugging me but didn't spend too much time on it. Any ideas? http://gerrit.cloudera.org:8080/#/c/3192/11/src/kudu/rpc/rtest.proto File src/kudu/rpc/rtest.proto: Line 129: } > add something to design-docs/rpc.md about this feature? Done http://gerrit.cloudera.org:8080/#/c/3192/6/src/kudu/rpc/service_if.cc File src/kudu/rpc/service_if.cc: PS6, Line 96: req.release(), :resp, : call->header().has_request_id() ? method_info->result_tracker : : nullptr); : > Missed this? Done Line 116: } else { > Log the state too. Done -- To view, visit http://gerrit.cloudera.org:8080/3192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1d624810350feceefe244e0319e22cac241bf0d6 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Add a ResultTracker class that will track server side results
David Ribeiro Alves has posted comments on this change. Change subject: Add a ResultTracker class that will track server side results .. Patch Set 13: Verified+1 unrelated flake ReplicatedAlterTableTest.TestReplicatedAlter -- To view, visit http://gerrit.cloudera.org:8080/3190 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6718951a9998a6c9b0db35e8f09ff8304591e8b1 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Add a RpcContext::RespondFailure() method
Kudu Jenkins has posted comments on this change. Change subject: Add a RpcContext::RespondFailure() method .. Patch Set 12: Build Started http://104.196.14.100/job/kudu-gerrit/1962/ -- To view, visit http://gerrit.cloudera.org:8080/3191 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9f1387ba0f837046a8056e77f73a3982f06c73a2 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Integrate the ResultTracker into the rpc subsystem and add a test
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3192 to look at the new patch set (#12). Change subject: Integrate the ResultTracker into the rpc subsystem and add a test .. Integrate the ResultTracker into the rpc subsystem and add a test This integrates the ResultTracker into the rpc subsystem by allowing to specify a new method option when defining rpc service methods. When this method option is specificed _and_ the call's rpc header has a RequestIdPB the results for the call will be tracked and the call may have exactly once semantics. This allows to have the simplest form of exactly once semantics for single server calls. Of course limitations apply, like response persistence across restarts is not supported, neither is propagating/rebuilding responses to/on replicas for replicated calls. If support for this is needed it will have to be done ad-hoc, case by case. Change-Id: I1d624810350feceefe244e0319e22cac241bf0d6 --- M docs/design-docs/rpc.md M src/kudu/rpc/CMakeLists.txt M src/kudu/rpc/inbound_call.cc M src/kudu/rpc/protoc-gen-krpc.cc A src/kudu/rpc/rpc-stress-test.cc M src/kudu/rpc/rpc-test-base.h M src/kudu/rpc/rpc_context.cc M src/kudu/rpc/rpc_context.h M src/kudu/rpc/rpc_header.proto M src/kudu/rpc/rtest.proto M src/kudu/rpc/service_if.cc M src/kudu/rpc/service_if.h 12 files changed, 356 insertions(+), 44 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/92/3192/12 -- To view, visit http://gerrit.cloudera.org:8080/3192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1d624810350feceefe244e0319e22cac241bf0d6 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Integrate the ResultTracker into the rpc subsystem and add a test
Kudu Jenkins has posted comments on this change. Change subject: Integrate the ResultTracker into the rpc subsystem and add a test .. Patch Set 12: Build Started http://104.196.14.100/job/kudu-gerrit/1961/ -- To view, visit http://gerrit.cloudera.org:8080/3192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1d624810350feceefe244e0319e22cac241bf0d6 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Allow for reserving disk space for non-Kudu processes
Kudu Jenkins has posted comments on this change. Change subject: Allow for reserving disk space for non-Kudu processes .. Patch Set 7: Build Started http://104.196.14.100/job/kudu-gerrit/1960/ -- To view, visit http://gerrit.cloudera.org:8080/3135 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifd0451d4dbddc1783019a53302de0263080939c7 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Allow for reserving disk space for non-Kudu processes
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3135 to look at the new patch set (#7). Change subject: Allow for reserving disk space for non-Kudu processes .. Allow for reserving disk space for non-Kudu processes Adds gflags to reserve disk space such that Kudu will not use more than specified. Hadoop calls this functionality "du.reserved". If a WAL preallocation is attempted while the log disk is past its reservation limit, the log write will fail. As a result, RaftConsensus will cause the process to crash. The log block manager will use non-full disks if possible until all of the disks are full. If a flush or compaction is attempted when all disks are beyond their configured capacity then the LogBlockManager will return an error. As a result, the maintenance manager task will cause the process to crash. This initial implementation provides a "best effort" approach. Disk space checks are only done at preallocation time, and if writes continue beyond the preallocated point (for both a WAL segment and a data block) those writes will not be prevented. This makes it easier to provide a "friendly" option where the block manager will divert new writes to non-full disks, avoiding a hard crash when only one disk is past its reservation limit. In the future, we may want to add "hard" and "soft" limits, such that going beyond the soft limit will do what we do today, and going beyond the hard limit (say, by writing a very large data block past its preallocation point) will result in a crash. This patch includes: * Unit tests. * End-to-end test for flushing / compaction falling back to non-full disks due to disk space backpressure and finally crashing when there is no space left in any data dir. * End-to-end test for writes failing due to WAL disk space backpressure, causing a crash. Change-Id: Ifd0451d4dbddc1783019a53302de0263080939c7 --- M src/kudu/consensus/log-test.cc M src/kudu/consensus/log.cc M src/kudu/consensus/log_util.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/disk_reservation-itest.cc M src/kudu/tablet/tablet_peer_mm_ops.cc M src/kudu/util/CMakeLists.txt A src/kudu/util/env_util-test.cc M src/kudu/util/env_util.cc M src/kudu/util/env_util.h M src/kudu/util/scoped_cleanup.h 14 files changed, 610 insertions(+), 76 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/35/3135/7 -- To view, visit http://gerrit.cloudera.org:8080/3135 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifd0451d4dbddc1783019a53302de0263080939c7 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR](branch-0.9.x) KUDU-1473: fix some tablet lock usage in CatalogManager
Mike Percy has posted comments on this change. Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/3460 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: No
[kudu-CR] Add a ResultTracker class that will track server side results
David Ribeiro Alves has posted comments on this change. Change subject: Add a ResultTracker class that will track server side results .. Patch Set 12: (1 comment) http://gerrit.cloudera.org:8080/#/c/3190/12/src/kudu/rpc/result_tracker.cc File src/kudu/rpc/result_tracker.cc: Line 221: } > lots of dup code in the above methods. any refactor doable? (even one that Done -- To view, visit http://gerrit.cloudera.org:8080/3190 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6718951a9998a6c9b0db35e8f09ff8304591e8b1 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Add a ResultTracker class that will track server side results
David Ribeiro Alves has posted comments on this change. Change subject: Add a ResultTracker class that will track server side results .. Patch Set 13: (24 comments) http://gerrit.cloudera.org:8080/#/c/3190/12/src/kudu/rpc/result_tracker.cc File src/kudu/rpc/result_tracker.cc: Line 53: CHECK(clients_.emplace(request_id.client_id(), move(new_client_state)).second); > PREDICT_TRUE would be nice documentation value here that this is the hot pa Done Line 71: completion_record->ongoing_rpcs.push_back({response, DCHECK_NOTNULL(context)}); > this block might be somewhat expensive to do inside the lock... I guess it' true on both counts. I'd like to, in time and when we have more tests for this, have per state locking. Line 72: } > reaching into the guts of RpcContext seems a little odd here. yeah, the reason for that is that context will wrap the result tracker, so if we call the external api methods of the context here we'd have a cycle. PS12, Line 146: > nit: missing space Done Line 176: << " was not marked as the driver for the RPC."; > dumping the response stringified here would be useful for debugging such a Done http://gerrit.cloudera.org:8080/#/c/3190/12/src/kudu/rpc/result_tracker.h File src/kudu/rpc/result_tracker.h: Line 41: // If the client wishes to track the result of a given rpc it must send on the rpc header > is this class entirely internal to the RPC system or would it be used by se this is internal to the RPC system for non-replicated rpc's, but needs to be used ad-hoc for replicated ones (writes, etc...) PS12, Line 90: > I think this one-line summary is a bit inaccurate, since it actually will _ I can change this to something like TrackRpc(), would that make sense? I don't want to call it "Start" since it might not be the start of that rpc. I'll add some info to the comment below to make it clear that for anything but NEW, the passed context and response will be taken over Line 98: // collection watermark and we do not recall the original response. > What if you broke out the guts of this method, so this just returned the Rp then the door is open to TOCTOU situations. See a follow up patch for the interaction of this with the transaction driver. PS12, Line 107: > nit: of Done Line 108: // The caller still owns the passed 'response' and 'context'. > clarify: on the same server, right? Done PS12, Line 112: ontext'. > why does RPC2 call FailAndRespond here? not following RPC2 (which here means an RPC from another replica) calls FailAndRespond() to "cancel" the client's RPC and establish itself as the current driver of this request. See comment on transaction_driver.h on the follow up patch that integrates this with writes (this is only used for replicated rpcs) Line 114: google::protobuf::Message* response, > still confused: I thought we just called FailAndRespond, why do we now call RPC2 calls FailAndRespond() to cancel RPC1 and establish itself as the driver. Line 124: // from another replica and should take precedence: > I remember discussing a scenario like this before I went on vacation, but I I've added some more info that hopefully makes this clearer. This is this is super specific to replicated ops and has no relevance for single server ops. I've added a very detailed comment to transaction_driver.h on a follow up patch that hopefully clears up why and when this is needed. PS12, Line 128: d(), cance > typo Done PS12, Line 128: ef > nit: no comma Done PS12, Line 129: is still alive. > retries of the same RPC? Done Line 133: // 6 - RPC2 - Also completes. The RPC has now been executed twice. > I wonder whether this could be "hooked" in the RpcContext in such a way tha that's exactly what https://gerrit.cloudera.org/#/c/3192/11 does. for non-replicated rpcs server code never touches the result tracker. PS12, Line 167: > this 'driver' term shows up in this class a bit but seems new. is it in the this is kind of new and was needed to address that racy situation that we talked about a while ago. Added some more info in the header to make this clear. PS12, Line 167: > where does this identifier come from? not sure what it corresponds to. Done Line 176: struct OnGoingRpcInfo { > can you make it a map of unique_ptr<> now that we have c++11? then you don' Done PS12, Line 181: > Remove is probably better than Delete since it actually gets returned Done PS12, Line 185: received RpcState:: > naming is inconsistent with the pattern above Done PS12, Line 193: from; > unique_ptr? Done Line 194: std::map> completion_records; > DISALLOW_COPY_AND_ASSIGN Done -- To view, visit http://gerrit.cloudera.org:8080/3190 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6718951a9998a6c9b0db35e8f09ff8304591e8b1 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-
[kudu-CR] Add a ResultTracker class that will track server side results
Kudu Jenkins has posted comments on this change. Change subject: Add a ResultTracker class that will track server side results .. Patch Set 13: Build Started http://104.196.14.100/job/kudu-gerrit/1959/ -- To view, visit http://gerrit.cloudera.org:8080/3190 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6718951a9998a6c9b0db35e8f09ff8304591e8b1 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Add a ResultTracker class that will track server side results
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3190 to look at the new patch set (#13). Change subject: Add a ResultTracker class that will track server side results .. Add a ResultTracker class that will track server side results This adds the initial version of the ResultTracker class that will be responsible for tracking responses when we want exactly once call semantics. While this is minimally working, i.e it's able to track responses and can be used for exactly once semantics, it's not complete. Future patches will address the missing functionality. Still missing are: - Time based client state cleaning. - Watermark based per client state cleaning. I initially had a unit test for this class, but it relied on templating out the RpcContext (since it's not easy to build one for a unit test and it's not clear what it would do) which became problematic as the class grew. So I decided to add a new rpc-stress-tess that will test this class within the rpc framework. I feel it's easier to do it that way and that, since the client is not being used, the test can still be very straight forward. This test is in a followup patch as it can only run after the integration with the rest of the rpc subsystem. Change-Id: I6718951a9998a6c9b0db35e8f09ff8304591e8b1 --- M src/kudu/rpc/CMakeLists.txt A src/kudu/rpc/result_tracker.cc A src/kudu/rpc/result_tracker.h M src/kudu/rpc/rpc_context.h 4 files changed, 475 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/3190/13 -- To view, visit http://gerrit.cloudera.org:8080/3190 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6718951a9998a6c9b0db35e8f09ff8304591e8b1 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR](gh-pages) Control how content is cached on the Kudu web site
Todd Lipcon has submitted this change and it was merged. Change subject: Control how content is cached on the Kudu web site .. Control how content is cached on the Kudu web site Change-Id: I08619464d8d458582282044a7db685f8fbbca483 Reviewed-on: http://gerrit.cloudera.org:8080/3462 Reviewed-by: Todd Lipcon Tested-by: Todd Lipcon --- A .htaccess 1 file changed, 18 insertions(+), 0 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/3462 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I08619464d8d458582282044a7db685f8fbbca483 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-Owner: Mike Percy Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR](gh-pages) Control how content is cached on the Kudu web site
Todd Lipcon has posted comments on this change. Change subject: Control how content is cached on the Kudu web site .. Patch Set 1: Code-Review+2 Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/3462 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08619464d8d458582282044a7db685f8fbbca483 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-Owner: Mike Percy Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Allow for reserving disk space for non-Kudu processes
Kudu Jenkins has posted comments on this change. Change subject: Allow for reserving disk space for non-Kudu processes .. Patch Set 6: Build Started http://104.196.14.100/job/kudu-gerrit/1958/ -- To view, visit http://gerrit.cloudera.org:8080/3135 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ifd0451d4dbddc1783019a53302de0263080939c7 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Allow for reserving disk space for non-Kudu processes
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3135 to look at the new patch set (#6). Change subject: Allow for reserving disk space for non-Kudu processes .. Allow for reserving disk space for non-Kudu processes Adds gflags to reserve disk space such that Kudu will not use more than specified. Hadoop calls this functionality "du.reserved". If a WAL preallocation is attempted while the log disk is past its reservation limit, the log write will fail. As a result, RaftConsensus will cause the process to crash. The log block manager will use non-full disks if possible until all of the disks are full. If a flush or compaction is attempted when all disks are beyond their configured capacity then the LogBlockManager will return an error. As a result, the maintenance manager task will cause the process to crash. This initial implementation provides a "best effort" approach. Disk space checks are only done at preallocation time, and if writes continue beyond the preallocated point (for both a WAL segment and a data block) those writes will not be prevented. This makes it easier to provide a "friendly" option where the block manager will divert new writes to non-full disks, avoiding a hard crash when only one disk is past its reservation limit. In the future, we may want to add "hard" and "soft" limits, such that going beyond the soft limit will do what we do today, and going beyond the hard limit (say, by writing a very large data block past its preallocation point) will result in a crash. This patch includes: * Unit tests. * End-to-end test for flushing / compaction falling back to non-full disks due to disk space backpressure and finally crashing when there is no space left in any data dir. * End-to-end test for writes failing due to WAL disk space backpressure, causing a crash. Change-Id: Ifd0451d4dbddc1783019a53302de0263080939c7 --- M src/kudu/consensus/log-test.cc M src/kudu/consensus/log.cc M src/kudu/consensus/log_util.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/disk_reservation-itest.cc M src/kudu/tablet/tablet_peer_mm_ops.cc M src/kudu/util/CMakeLists.txt A src/kudu/util/env_util-test.cc M src/kudu/util/env_util.cc M src/kudu/util/env_util.h M src/kudu/util/scoped_cleanup.h 14 files changed, 608 insertions(+), 73 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/35/3135/6 -- To view, visit http://gerrit.cloudera.org:8080/3135 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifd0451d4dbddc1783019a53302de0263080939c7 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Allow for reserving disk space for non-Kudu processes
Mike Percy has posted comments on this change. Change subject: Allow for reserving disk space for non-Kudu processes .. Patch Set 5: (24 comments) http://gerrit.cloudera.org:8080/#/c/3135/5/src/kudu/consensus/log.cc File src/kudu/consensus/log.cc: Line 111: DEFINE_int64(log_dir_reserved_bytes, 0, > Nit: I think this would be more clear as fs_wal_dir_reserved_bytes, as it w Oh, that's the one I had intended to mirror. Done. Line 113: TAG_FLAG(log_dir_reserved_bytes, runtime); > Hmm, I'm not sure if we should allow changing these values at runtime. If s That's ok. They are already "soft" in terms of enforcement. In any case, this one will cause a crash. I am using runtime modification in the integration tests. http://gerrit.cloudera.org:8080/#/c/3135/5/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: PS5, Line 951: We use a COARSE clock so we need to let a : // little bit of time pass so we get at least one unit of time greater than : // before when we call MonoTime::Now() again. > Let's switch to FINE clock so we won't have this limitation. I was hoping using COARSE would help keep the code paths fast but I agree it's probably overkill. http://gerrit.cloudera.org:8080/#/c/3135/5/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 64: DEFINE_int32(full_disk_cache_seconds, 30, > Nit: please prefix with log_block_manager. It's super long, yes, but it's t Done Line 70: DEFINE_int64(data_dirs_reserved_bytes, 0, > Please rename to fs_data_dirs_reserved_bytes (see my comment on log_dir_res Done Line 96: "Number of unavailable log block containers", > Nit: Number of Unavailable Block Containers (to be consistent with the exis Done Line 98: "Number of non-full Block Containers that are under root paths that " > Nit: "Number of non-full log block containers that are under root paths who Done Line 265: RWFile* data_file() const { return data_file_.get(); } > We're only using this for the filename. Could you make narrow accessor that Done Line 267: int64_t preallocated_offset() const { return preallocated_offset_; } > Where is this used? Looks unused. Removed Line 273: Env* env() const { return block_manager_->env(); } > Is this ever used? Nope, also removed Line 312: int64_t preallocated_offset_ = 0; > Huh, didn't know C++11 allowed this. Could you make the same change for tot Done Line 1230: // Indexes of root paths that are below their reserved space threshold. > Can we defer this work to the case where GetAvailableContainer() has return I had to change the implementation because of a race I discovered in the cache expiration logic, so if we prepopulate this cache then it has to happen here. The race caused an infinite loop when the cache expiration was set to 0. Line 1250: return Status::IOError("Unable to allocate block: All data directories are full. " > We should add ENOSPC to this status, right? Done Line 1292: FLAGS_log_container_preallocate_bytes, > Preallocate() may not actually preallocate FLAGS_log_container_preallocate_ Yes, preallocation sounds a little bit broken. However, what you're describing further on is a "hard" limit, as opposed to a "soft" limit, which is what I've implemented. A hard limit is much harder (ha ha) to implement because the failure cases are broad. We have already allocated a block. Now we're applying backpressure on writes. Doing a reservation at block allocation time, which is what this patch does, is much easier because block manager API clients don't even know that it's happening. In short, if we want to do a hard limit, it's a whole 'nother thing, and could reasonably be implemented with a whole 'nother set of gflags. Also, if you hit a hard limit, the most reasonable thing to do is crash. That said, unless I missed your point (it's certainly possible), I don't think it's particularly bad to overshoot on verifying sufficient space here with FLAGS_log_container_preallocate_bytes since even if we intended to preallocate less space than that, we're close to the limit and it's reasonable to stop writing to this disk. Line 1413: MonoTime now = MonoTime::Now(MonoTime::COARSE); > FINE here too. And this can be done outside the lock. Done Line 1432: } // Release lock_. > Nit: FWIW, I think this is implied by the scope (i.e. don't really see the agree Line 1758: if (expires->ComesBefore(MonoTime::Now(MonoTime::COARSE))) return false; // Expired. > Use FINE here and below; that's what we default to when introducing timeout Done http://gerrit.cloudera.org:8080/#/c/3135/5/src/kudu/fs/log_block_manager.h File src/kudu/fs/log_block_manager.h: Line 156: // Returns true if the given 'root_path' has been marked full and 'now' is > Nit: 'now' (with single qu
[kudu-CR] catalog manager: prevent spurious dirty callbacks from crashing the process
Kudu Jenkins has posted comments on this change. Change subject: catalog_manager: prevent spurious dirty callbacks from crashing the process .. Patch Set 2: Build Started http://104.196.14.100/job/kudu-gerrit/1957/ -- To view, visit http://gerrit.cloudera.org:8080/3465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5bc602a51a4914ec13f926bfca555c816a2ee509 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] catalog manager: prevent spurious dirty callbacks from crashing the process
Adar Dembo has uploaded a new patch set (#2). Change subject: catalog_manager: prevent spurious dirty callbacks from crashing the process .. catalog_manager: prevent spurious dirty callbacks from crashing the process The recent change to switch from LocalConsensus to RaftConsensus has led to an increased number of dirty callback calls when using just one master. Some of these calls occur with no term change; the catalog manager treats them as any other calls and reloads the on-disk metadata. In theory that's just unnecessary work, but CheckIsLeaderAndReady() doesn't provide adequate protection for the rest of the master when in this state, so nearly every RPC is allowed in during this time. That's an absolute disaster for correctness: imagine a GetTableLocations() returning only a subset of a table's tablets because the rest were still being loaded from disk. Luckily for us it can also manifest as a crash [1] so we noticed it quickly. I chose to fix it by ignoring these calls within CatalogManager::VisitTablesAndTabletsTask and not SysCatalogTable::SysCatalogStateChanged because the synchronization in the former is more straight forward thanks to the size of worker_pool_. The new test led to a crash 100% of the time without this fix. There's an argument to be made for changing TableInfo::TabletMap's raw pointers to shared pointers thus avoiding this crash altogether, but it's still an incorrect state to be in, so I don't see the value in doing that. While I was here, I snuck a few other changes in: - Remove a lock acquisition from ElectedAsLeaderCb; it did nothing. - Remove old_role_ from SysCatalogTable; it also did nothing. - Narrow the lock acquisition in VisitTablesAndTabletsTask; it only needs to protect the visiting logic. 1. Sample crash output: F0618 05:47:41.795367 9330 ref_counted.cc:74] Check failed: !in_dtor_ *** Check failure stack trace: *** @ 0x7f99fab05f7d google::LogMessage::Fail() at ??:0 @ 0x7f99fab07e7d google::LogMessage::SendToLog() at ??:0 @ 0x7f99fab05ab9 google::LogMessage::Flush() at ??:0 @ 0x7f99fab0891f google::LogMessageFatal::~LogMessageFatal() at ??:0 @ 0x7f99fae8a637 kudu::subtle::RefCountedThreadSafeBase::AddRef() at ??:0 @ 0x7f9a07c486d8 make_scoped_refptr<>() at ??:0 @ 0x7f9a07c2f7f7 kudu::master::TableInfo::GetTabletsInRange() at ??:0 @ 0x7f9a07c2ec35 kudu::master::CatalogManager::GetTableLocations() at ??:0 @ 0x51962e kudu::master::WaitForRunningTabletCount() at ... @ 0x51ce59 kudu::CreateTableStressTest_RestartMasterDuringCreation_Test::TestBody() at ... @ 0x7f99fc7f5b48 testing::internal::HandleExceptionsInMethodIfSupported<>() at ??:0 @ 0x7f99fc7ea012 testing::Test::Run() at ??:0 @ 0x7f99fc7ea158 testing::TestInfo::Run() at ??:0 @ 0x7f99fc7ea235 testing::TestCase::Run() at ??:0 @ 0x7f99fc7ea518 testing::internal::UnitTestImpl::RunAllTests() at ??:0 @ 0x7f99fc7f6058 testing::internal::HandleExceptionsInMethodIfSupported<>() at ??:0 @ 0x7f99fc7ea7fd testing::UnitTest::Run() at ??:0 @ 0x7f9a08520cf6 main at ??:0 @ 0x7f99f97a9d5d __libc_start_main at ??:0 @ 0x43e06d (unknown) at ??:0 Change-Id: I5bc602a51a4914ec13f926bfca555c816a2ee509 --- M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h M src/kudu/master/master-test.cc M src/kudu/master/sys_catalog.cc M src/kudu/master/sys_catalog.h 5 files changed, 71 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/3465/2 -- To view, visit http://gerrit.cloudera.org:8080/3465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5bc602a51a4914ec13f926bfca555c816a2ee509 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] catalog manager: prevent spurious dirty callbacks from crashing the process
Kudu Jenkins has posted comments on this change. Change subject: catalog_manager: prevent spurious dirty callbacks from crashing the process .. Patch Set 1: Build Started http://104.196.14.100/job/kudu-gerrit/1956/ -- To view, visit http://gerrit.cloudera.org:8080/3465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I5bc602a51a4914ec13f926bfca555c816a2ee509 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] catalog manager: prevent spurious dirty callbacks from crashing the process
Hello Mike Percy, Todd Lipcon, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/3465 to review the following change. Change subject: catalog_manager: prevent spurious dirty callbacks from crashing the process .. catalog_manager: prevent spurious dirty callbacks from crashing the process The recent change to switch from LocalConsensus to RaftConsensus has led to an increased number of dirty callback calls when using just one master. Some of these calls occur with no term change; the catalog manager treats them as any other calls and reloads the on-disk metadata. In theory that's just unnecessary work, but CheckIsLeaderAndReady() doesn't provide adequate protection for the rest of the master when in this state, so nearly every RPC is allowed in during this time. That's an absolute disaster for correctness: imagine a GetTableLocations() returning only a subset of a table's tablets because the rest were still being loaded from disk. Luckily for us it can also manifest as a crash [1] so we noticed it quickly. I chose to fix it by ignoring these calls within CatalogManager::VisitTablesAndTabletsTask and not SysCatalogTable::SysCatalogStateChanged because the synchronization in the former is more straight forward thanks to the size of worker_pool_. The new test led to a crash 100% of the time without this fix. There's an argument to be made for changing TableInfo::TabletMap's raw pointers to shared pointers thus avoiding this crash altogether, but it's still an incorrect state to be in, so I don't see the value in doing that. While I was here, I snuck a few other changes in: - Remove a lock acquisition from ElectedAsLeaderCb; it did nothing. - Remove old_role_ from SysCatalogTable; it also did nothing. - Narrow the lock acquisition in VisitTablesAndTabletsTask; it only needs to protect the visiting logic. 1. Sample crash output: F0618 05:47:41.795367 9330 ref_counted.cc:74] Check failed: !in_dtor_ *** Check failure stack trace: *** @ 0x7f99fab05f7d google::LogMessage::Fail() at ??:0 @ 0x7f99fab07e7d google::LogMessage::SendToLog() at ??:0 @ 0x7f99fab05ab9 google::LogMessage::Flush() at ??:0 @ 0x7f99fab0891f google::LogMessageFatal::~LogMessageFatal() at ??:0 @ 0x7f99fae8a637 kudu::subtle::RefCountedThreadSafeBase::AddRef() at ??:0 @ 0x7f9a07c486d8 make_scoped_refptr<>() at ??:0 @ 0x7f9a07c2f7f7 kudu::master::TableInfo::GetTabletsInRange() at ??:0 @ 0x7f9a07c2ec35 kudu::master::CatalogManager::GetTableLocations() at ??:0 @ 0x51962e kudu::master::WaitForRunningTabletCount() at /data1/jenkins-workspace/kudu@ 0x51ce59 kudu::CreateTableStressTest_RestartMasterDuringCreation_Test::TestBody() a@ 0x7f99fc7f5b48 testing::internal::HandleExceptionsInMethodIfSupported<>() at ??:0 @ 0x7f99fc7ea012 testing::Test::Run() at ??:0 @ 0x7f99fc7ea158 testing::TestInfo::Run() at ??:0 @ 0x7f99fc7ea235 testing::TestCase::Run() at ??:0 @ 0x7f99fc7ea518 testing::internal::UnitTestImpl::RunAllTests() at ??:0 @ 0x7f99fc7f6058 testing::internal::HandleExceptionsInMethodIfSupported<>() at ??:0 @ 0x7f99fc7ea7fd testing::UnitTest::Run() at ??:0 @ 0x7f9a08520cf6 main at ??:0 @ 0x7f99f97a9d5d __libc_start_main at ??:0 @ 0x43e06d (unknown) at ??:0 Change-Id: I5bc602a51a4914ec13f926bfca555c816a2ee509 --- M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h M src/kudu/master/master-test.cc M src/kudu/master/sys_catalog.cc M src/kudu/master/sys_catalog.h 5 files changed, 71 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/3465/1 -- To view, visit http://gerrit.cloudera.org:8080/3465 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I5bc602a51a4914ec13f926bfca555c816a2ee509 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR](branch-0.9.x) Release notes for 0.9.1
Jean-Daniel Cryans has posted comments on this change. Change subject: Release notes for 0.9.1 .. Patch Set 1: > Does this belong on the branch or on master? wasn't sure Doh saw this after writing my review. It should be both I think. -- To view, visit http://gerrit.cloudera.org:8080/3463 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I797980d105e0e1000420aedf58e094f9505053c6 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) Release notes for 0.9.1
Todd Lipcon has posted comments on this change. Change subject: Release notes for 0.9.1 .. Patch Set 1: Does this go on Master or the branch? wasn't sure. (both?) -- To view, visit http://gerrit.cloudera.org:8080/3463 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I797980d105e0e1000420aedf58e094f9505053c6 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
Todd Lipcon has submitted this change and it was merged. Change subject: KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup .. KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup This fixes a bug seen in a recent YCSB stress test that I ran in which I was accidentally writing tens of thousands of duplicate keys per second. After a tablet server restarted, it failed to come up due to a pending commit which referred to no mutated stores (e.g. because all of the operations were duplicate key inserts). This patch tweaks the logic for this safety check: a commit with no mutated stores trivially has "no active stores". However, that's not the same as having "only inactive stores" -- the subtlety is in the difference in behavior when a commit has no stores at all. The patch adds a new targeted test in tablet_bootstrap-test as well as a more end-to-end test in ts_recovery-itest. Both reproduced the bug reliably before this patch. Change-Id: I8ecf8d780de1aa89fae4e0510d8291eb1f1cee11 Reviewed-on: http://gerrit.cloudera.org:8080/3321 Tested-by: Kudu Jenkins Reviewed-by: David Ribeiro Alves --- M src/kudu/consensus/log-test-base.h M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/integration-tests/test_workload.cc M src/kudu/integration-tests/test_workload.h M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_bootstrap.cc 7 files changed, 148 insertions(+), 23 deletions(-) Approvals: David Ribeiro Alves: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/3321 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I8ecf8d780de1aa89fae4e0510d8291eb1f1cee11 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR](branch-0.9.x) KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup .. Patch Set 1: Build Started http://104.196.14.100/job/kudu-gerrit/1955/ -- To view, visit http://gerrit.cloudera.org:8080/3464 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8ecf8d780de1aa89fae4e0510d8291eb1f1cee11 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) Release notes for 0.9.1
Jean-Daniel Cryans has posted comments on this change. Change subject: Release notes for 0.9.1 .. Patch Set 1: (1 comment) Are you going to post a patch for master too? http://gerrit.cloudera.org:8080/#/c/3463/1/docs/release_notes.adoc File docs/release_notes.adoc: PS1, Line 73: Consensus nit: I thought we were writing either "our Raft consensus implementation" or just "RaftConsensus". -- To view, visit http://gerrit.cloudera.org:8080/3463 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I797980d105e0e1000420aedf58e094f9505053c6 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR](branch-0.9.x) KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
Todd Lipcon has uploaded a new change for review. http://gerrit.cloudera.org:8080/3464 Change subject: KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup .. KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup This fixes a bug seen in a recent YCSB stress test that I ran in which I was accidentally writing tens of thousands of duplicate keys per second. After a tablet server restarted, it failed to come up due to a pending commit which referred to no mutated stores (e.g. because all of the operations were duplicate key inserts). This patch tweaks the logic for this safety check: a commit with no mutated stores trivially has "no active stores". However, that's not the same as having "only inactive stores" -- the subtlety is in the difference in behavior when a commit has no stores at all. The patch adds a new targeted test in tablet_bootstrap-test as well as a more end-to-end test in ts_recovery-itest. Both reproduced the bug reliably before this patch. Change-Id: I8ecf8d780de1aa89fae4e0510d8291eb1f1cee11 Reviewed-on: http://gerrit.cloudera.org:8080/3321 Tested-by: Kudu Jenkins Reviewed-by: David Ribeiro Alves (cherry picked from commit 6894438a406a635dc8a8f3bd77862294163cc7fb) --- M src/kudu/consensus/log-test-base.h M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/integration-tests/test_workload.cc M src/kudu/integration-tests/test_workload.h M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_bootstrap.cc 7 files changed, 148 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/3464/1 -- To view, visit http://gerrit.cloudera.org:8080/3464 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I8ecf8d780de1aa89fae4e0510d8291eb1f1cee11 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon
[kudu-CR](branch-0.9.x) Release notes for 0.9.1
Todd Lipcon has posted comments on this change. Change subject: Release notes for 0.9.1 .. Patch Set 1: Does this belong on the branch or on master? wasn't sure -- To view, visit http://gerrit.cloudera.org:8080/3463 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I797980d105e0e1000420aedf58e094f9505053c6 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) Release notes for 0.9.1
Kudu Jenkins has posted comments on this change. Change subject: Release notes for 0.9.1 .. Patch Set 1: Build Started http://104.196.14.100/job/kudu-gerrit/1954/ -- To view, visit http://gerrit.cloudera.org:8080/3463 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I797980d105e0e1000420aedf58e094f9505053c6 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) Release notes for 0.9.1
Hello Jean-Daniel Cryans, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/3463 to review the following change. Change subject: Release notes for 0.9.1 .. Release notes for 0.9.1 Change-Id: I797980d105e0e1000420aedf58e094f9505053c6 --- M docs/installation.adoc M docs/release_notes.adoc 2 files changed, 34 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/3463/1 -- To view, visit http://gerrit.cloudera.org:8080/3463 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I797980d105e0e1000420aedf58e094f9505053c6 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Jean-Daniel Cryans
[kudu-CR](branch-0.9.x) catalog manager: fix a locking error
Adar Dembo has submitted this change and it was merged. Change subject: catalog_manager: fix a locking error .. catalog_manager: fix a locking error This lock acquisition took the wrong lock, which meant it didn't add any protection against table/tablet visition races at all. See commit 1681d9a for context. Change-Id: I8db7a78309bb9182ae1fdb92485e8467b90a84b2 Reviewed-on: http://gerrit.cloudera.org:8080/3425 Tested-by: Kudu Jenkins Reviewed-by: David Ribeiro Alves (cherry picked from commit 5f7d79910d8da492195b5c22a6d9a7911f950107) Reviewed-on: http://gerrit.cloudera.org:8080/3461 Tested-by: Adar Dembo --- M src/kudu/master/catalog_manager.cc 1 file changed, 1 insertion(+), 1 deletion(-) Approvals: David Ribeiro Alves: Looks good to me, approved Adar Dembo: Verified -- To view, visit http://gerrit.cloudera.org:8080/3461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I8db7a78309bb9182ae1fdb92485e8467b90a84b2 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves
[kudu-CR] KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup .. Patch Set 3: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/3321/1/src/kudu/tablet/tablet_bootstrap.cc File src/kudu/tablet/tablet_bootstrap.cc: Line 271: // At least one operation resulted in a mutation to a store, and at least > Changed the API as you suggested, though not sure what you mean about the " though there would be some cases where we'd do both. even if not like it better like this. thanks for fixing. -- To view, visit http://gerrit.cloudera.org:8080/3321 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8ecf8d780de1aa89fae4e0510d8291eb1f1cee11 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Integrate the ResultTracker into the rpc subsystem and add a test
Todd Lipcon has posted comments on this change. Change subject: Integrate the ResultTracker into the rpc subsystem and add a test .. Patch Set 11: (9 comments) http://gerrit.cloudera.org:8080/#/c/3192/11/src/kudu/common/common.proto File src/kudu/common/common.proto: Line 316: optional bool track_rpc_result = 50006 [default=false]; I dont think this should be in 'common'. 'common' is for data-model type stuff. Perhaps 'rpc_header.proto' or a new proto file in the rpc/ subdir. (rpc doesn't depend on common, so actually slightly surprised this builds properly) http://gerrit.cloudera.org:8080/#/c/3192/11/src/kudu/rpc/CMakeLists.txt File src/kudu/rpc/CMakeLists.txt: Line 89: kudu_common_proto oh, now I see why it builds. This dependency seems misplaced. (eg keep in mind that Impala's thinking about using krpc, so we shouldn't have upwards dependencies from RPC into common/) http://gerrit.cloudera.org:8080/#/c/3192/11/src/kudu/rpc/protoc-gen-krpc.cc File src/kudu/rpc/protoc-gen-krpc.cc: Line 46: #include "kudu/common/common.pb.h" nit: sorting Line 467: "mi->result_tracker.reset($result_tracker$);\n" we have a separate ResultTracker per RPC type? that doesn't seem right. http://gerrit.cloudera.org:8080/#/c/3192/11/src/kudu/rpc/rpc-stress-test.cc File src/kudu/rpc/rpc-stress-test.cc: PS11, Line 73: random amount update Line 134: vector adders; use unique_ptr? http://gerrit.cloudera.org:8080/#/c/3192/11/src/kudu/rpc/rpc-test-base.h File src/kudu/rpc/rpc-test-base.h: Line 288: std::atomic_int exactly_once_test_val_; hrm, is this atomic? haven't seen this http://gerrit.cloudera.org:8080/#/c/3192/11/src/kudu/rpc/rpc_context.cc File src/kudu/rpc/rpc_context.cc: PS11, Line 66: call_->RespondSuccess(*response_pb_); : delete this; : per the comment on the other review, it seems weird we now have this code split between ResultTracker and here. http://gerrit.cloudera.org:8080/#/c/3192/11/src/kudu/rpc/rtest.proto File src/kudu/rpc/rtest.proto: Line 129: option (kudu.track_rpc_result) = true; add something to design-docs/rpc.md about this feature? -- To view, visit http://gerrit.cloudera.org:8080/3192 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1d624810350feceefe244e0319e22cac241bf0d6 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR](branch-0.9.x) catalog manager: fix a locking error
David Ribeiro Alves has posted comments on this change. Change subject: catalog_manager: fix a locking error .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/3461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8db7a78309bb9182ae1fdb92485e8467b90a84b2 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) KUDU-1469. Fix handling of fully-deduped requests after a leader change
Todd Lipcon has posted comments on this change. Change subject: KUDU-1469. Fix handling of fully-deduped requests after a leader change .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/3455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iced21ae1b69c1079efc9aa9cf23e2fa592b8bebd Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) Fix stray memory writes due to tcmalloc profiling
Todd Lipcon has posted comments on this change. Change subject: Fix stray memory writes due to tcmalloc profiling .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/3456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9afca83d9cc24585960f6bf68d8996c4736ce6cb Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) KUDU-1469. Fix handling of fully-deduped requests after a leader change
Todd Lipcon has submitted this change and it was merged. Change subject: KUDU-1469. Fix handling of fully-deduped requests after a leader change .. KUDU-1469. Fix handling of fully-deduped requests after a leader change This fixes KUDU-1469, a bug in which the leader and follower could get into a tight loop of RPCs making no progress replicating operations. The newly included test fails without the bug fix, and passes with it. See its comments for details on the bug itself. Change-Id: Iced21ae1b69c1079efc9aa9cf23e2fa592b8bebd Reviewed-on: http://gerrit.cloudera.org:8080/3228 Tested-by: Kudu Jenkins Reviewed-by: David Ribeiro Alves Reviewed-by: Mike Percy (cherry picked from commit d685747426472d428b1d071df00d112d9f775117) Reviewed-on: http://gerrit.cloudera.org:8080/3455 Reviewed-by: Todd Lipcon --- M src/kudu/consensus/consensus.proto M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/log_cache.cc M src/kudu/consensus/raft_consensus-test.cc M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus_state.cc M src/kudu/consensus/raft_consensus_state.h M src/kudu/integration-tests/raft_consensus-itest.cc 8 files changed, 158 insertions(+), 24 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/3455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Iced21ae1b69c1079efc9aa9cf23e2fa592b8bebd Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR](branch-0.9.x) [java-client] RPC timeout may sometimes be reported as max attempts violation
Todd Lipcon has submitted this change and it was merged. Change subject: [java-client] RPC timeout may sometimes be reported as max attempts violation .. [java-client] RPC timeout may sometimes be reported as max attempts violation Fixes an issue where an RPC timeout could be reported in the error message as too many attempts. Change-Id: I9c5676ab5bd05170505ef0313b919c5475ae6b37 Reviewed-on: http://gerrit.cloudera.org:8080/3330 Tested-by: Kudu Jenkins Reviewed-by: Dan Burkert (cherry picked from commit d002e3257d49fe8e420c3d50eac54bb2d1952722) Reviewed-on: http://gerrit.cloudera.org:8080/3457 Reviewed-by: Jean-Daniel Cryans --- M java/kudu-client/src/main/java/org/kududb/client/AsyncKuduClient.java 1 file changed, 5 insertions(+), 4 deletions(-) Approvals: Jean-Daniel Cryans: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/3457 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I9c5676ab5bd05170505ef0313b919c5475ae6b37 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR](branch-0.9.x) Fix stray memory writes due to tcmalloc profiling
Todd Lipcon has submitted this change and it was merged. Change subject: Fix stray memory writes due to tcmalloc profiling .. Fix stray memory writes due to tcmalloc profiling This fixes an issue that has been causing frequent crashes in JD.com's production cluster as well as various Cloudera test clusters. The crashes would be in various different places, but the key signature was that offset 120 in some data structure or array (eg the 16th element of a vector) would be corrupted. After doing a git bisect using an integration testing cluster running an ITBLL workload, I found that this was a regression caused by the introduction of tcmalloc contention profiling[1]. The short explanation is that, if we experienced contention while freeing a Trace object, we could in some cases increment offset 120 of some other allocation which occurred soon after the deallocation of the Trace. The issue is described in more detail in a new comment in trace.h. With this patch, I was unable to reproduce the issue on the test cluster. No new test is added since this is quite timing-dependent and not amenable to unit testing or even stress testing. [1] commit f6691e744b9cb796e1bbc6e07953f21f387c9a88 Change-Id: I9afca83d9cc24585960f6bf68d8996c4736ce6cb Reviewed-on: http://gerrit.cloudera.org:8080/3445 Reviewed-by: David Ribeiro Alves Reviewed-by: Mike Percy Tested-by: Kudu Jenkins (cherry picked from commit 0bb59836fb3e4b92e2d88674e8b88546faac5c8b) Reviewed-on: http://gerrit.cloudera.org:8080/3456 Reviewed-by: Todd Lipcon --- M src/kudu/util/trace.h 1 file changed, 23 insertions(+), 3 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/3456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I9afca83d9cc24585960f6bf68d8996c4736ce6cb Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR](branch-0.9.x) catalog manager: fix a locking error
Adar Dembo has posted comments on this change. Change subject: catalog_manager: fix a locking error .. Patch Set 2: Verified+1 Overriding (what I believe to be a) spurious Java TSAN test error. -- To view, visit http://gerrit.cloudera.org:8080/3461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8db7a78309bb9182ae1fdb92485e8467b90a84b2 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) [java-client] RPC timeout may sometimes be reported as max attempts violation
Jean-Daniel Cryans has posted comments on this change. Change subject: [java-client] RPC timeout may sometimes be reported as max attempts violation .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/3457 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9c5676ab5bd05170505ef0313b919c5475ae6b37 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) KUDU-1473: fix some tablet lock usage in CatalogManager
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager .. Patch Set 2: Build Started http://104.196.14.100/job/kudu-gerrit/1953/ -- To view, visit http://gerrit.cloudera.org:8080/3460 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) catalog manager: fix a locking error
Kudu Jenkins has posted comments on this change. Change subject: catalog_manager: fix a locking error .. Patch Set 2: Build Started http://104.196.14.100/job/kudu-gerrit/1952/ -- To view, visit http://gerrit.cloudera.org:8080/3461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8db7a78309bb9182ae1fdb92485e8467b90a84b2 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] Allow to set RequestId in the RPC RequestHeader
David Ribeiro Alves has submitted this change and it was merged. Change subject: Allow to set RequestId in the RPC RequestHeader .. Allow to set RequestId in the RPC RequestHeader This allows to set a RequestId in the RpcController that will in turn be set in the RequestHeader. This doesn't have specific tests, as other patches test this functionality, but having this be stand-alone allows to work on the client and the server simultaneously. Change-Id: I92566250683ee69fad57a9ae694842d4a0b17ab4 Reviewed-on: http://gerrit.cloudera.org:8080/3179 Tested-by: Kudu Jenkins Reviewed-by: Todd Lipcon --- M src/kudu/rpc/outbound_call.cc M src/kudu/rpc/rpc_controller.cc M src/kudu/rpc/rpc_controller.h 3 files changed, 34 insertions(+), 1 deletion(-) Approvals: Todd Lipcon: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/3179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I92566250683ee69fad57a9ae694842d4a0b17ab4 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR](branch-0.9.x) KUDU-1469. Fix handling of fully-deduped requests after a leader change
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1469. Fix handling of fully-deduped requests after a leader change .. Patch Set 2: Build Started http://104.196.14.100/job/kudu-gerrit/1950/ -- To view, visit http://gerrit.cloudera.org:8080/3455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iced21ae1b69c1079efc9aa9cf23e2fa592b8bebd Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) Fix stray memory writes due to tcmalloc profiling
Kudu Jenkins has posted comments on this change. Change subject: Fix stray memory writes due to tcmalloc profiling .. Patch Set 2: Build Started http://104.196.14.100/job/kudu-gerrit/1949/ -- To view, visit http://gerrit.cloudera.org:8080/3456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9afca83d9cc24585960f6bf68d8996c4736ce6cb Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) [java-client] RPC timeout may sometimes be reported as max attempts violation
Kudu Jenkins has posted comments on this change. Change subject: [java-client] RPC timeout may sometimes be reported as max attempts violation .. Patch Set 2: Build Started http://104.196.14.100/job/kudu-gerrit/1951/ -- To view, visit http://gerrit.cloudera.org:8080/3457 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9c5676ab5bd05170505ef0313b919c5475ae6b37 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) Bump version to 0.9.1-SNAPSHOT
Todd Lipcon has submitted this change and it was merged. Change subject: Bump version to 0.9.1-SNAPSHOT .. Bump version to 0.9.1-SNAPSHOT Change-Id: I472c2999c0b13a2c046366a9e1ad7474eedf68d0 Reviewed-on: http://gerrit.cloudera.org:8080/3458 Reviewed-by: Jean-Daniel Cryans Tested-by: Todd Lipcon --- M java/interface-annotations/pom.xml M java/kudu-client-tools/pom.xml M java/kudu-client/pom.xml M java/kudu-csd/pom.xml M java/kudu-flume-sink/pom.xml M java/kudu-mapreduce/pom.xml M java/kudu-spark/pom.xml M java/pom.xml M version.txt 9 files changed, 10 insertions(+), 10 deletions(-) Approvals: Jean-Daniel Cryans: Looks good to me, approved Todd Lipcon: Verified -- To view, visit http://gerrit.cloudera.org:8080/3458 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I472c2999c0b13a2c046366a9e1ad7474eedf68d0 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Todd Lipcon
[kudu-CR](branch-0.9.x) Bump version to 0.9.1-SNAPSHOT
Todd Lipcon has posted comments on this change. Change subject: Bump version to 0.9.1-SNAPSHOT .. Patch Set 1: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/3458 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I472c2999c0b13a2c046366a9e1ad7474eedf68d0 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) update dist-test configuration for new client.py location
Todd Lipcon has submitted this change and it was merged. Change subject: update dist-test configuration for new client.py location .. update dist-test configuration for new client.py location client.py has been moved to bin/client Change-Id: I48e4617222b8b503b21eeb560800dc5619d7ac26 Reviewed-on: http://gerrit.cloudera.org:8080/3332 Reviewed-by: Adar Dembo Tested-by: Kudu Jenkins (cherry picked from commit e4833d2df35721c03955ef0902a70c2de2536f99) Reviewed-on: http://gerrit.cloudera.org:8080/3459 Reviewed-by: Todd Lipcon --- M build-support/dist_test.py M build-support/jenkins/build-and-test.sh 2 files changed, 3 insertions(+), 3 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/3459 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I48e4617222b8b503b21eeb560800dc5619d7ac26 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR](branch-0.9.x) update dist-test configuration for new client.py location
Todd Lipcon has posted comments on this change. Change subject: update dist-test configuration for new client.py location .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/3459 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I48e4617222b8b503b21eeb560800dc5619d7ac26 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR](gh-pages) Control how content is cached on the Kudu web site
Mike Percy has posted comments on this change. Change subject: Control how content is cached on the Kudu web site .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/3462/1/.htaccess File .htaccess: Line 1: # Don't cache content or styles / scripts served by us. > I forget, do we serve our own bootstrap/etc? or use a cdn? Mostly CDN, but it's a mix. We should probably try to get more from a CDN. e.g. right now we serve bootstrap, but jquery is on a CDN. If we had our own CDN, we could also put our own (versioned) files up there and have it cache the files indefinitely. Line 5: Header unset ETag > why not use etag based caching? ETags are not distributed; different servers will give different ETags for the same content. Since the upstream site appears to have mirrors (based on nslookup) then I don't think ETags will work. See also https://developer.yahoo.com/performance/rules.html#etags= However I didn't do a thorough investigation of how ASF load balancing is set up. -- To view, visit http://gerrit.cloudera.org:8080/3462 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08619464d8d458582282044a7db685f8fbbca483 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-Owner: Mike Percy Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Will Berkeley has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 9: > Looks like a small compile error here :( whoops...was too hasty... -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 10: Build Started http://104.196.14.100/job/kudu-gerrit/1948/ -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3304 to look at the new patch set (#10). Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. KUDU-1398 CFile index blocks can store shortest separating prefix (No changes: resubmitting to trigger Jenkins build) This changes the values stored as index keys to be a shortest key between the first key of the data block and the last key of the previous data block. However, this change does not apply to deltafiles, because deltafiles expect to be able to decode an index key into a DeltaKey. The way the change works is illustrated with the example from the JIRA, extended: Block 1: apple, banana, cardamom Block 2: carrot, epazote, fennel Block 3: fig, guava, kiwi Before: ['apple' -> block 1, 'carrot' -> block 2, 'fig' -> block 3] After: ['a' -> block 1, 'carr' -> block 2, 'fi' -> block 3] Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 --- M src/kudu/cfile/binary_dict_block.cc M src/kudu/cfile/binary_dict_block.h M src/kudu/cfile/binary_plain_block.cc M src/kudu/cfile/binary_plain_block.h M src/kudu/cfile/binary_prefix_block.cc M src/kudu/cfile/binary_prefix_block.h M src/kudu/cfile/block_encodings.h M src/kudu/cfile/bloomfile.cc M src/kudu/cfile/bloomfile.h M src/kudu/cfile/bshuf_block.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_util.cc M src/kudu/cfile/cfile_util.h M src/kudu/cfile/cfile_writer.cc M src/kudu/cfile/cfile_writer.h M src/kudu/cfile/gvint_block.cc M src/kudu/cfile/gvint_block.h M src/kudu/cfile/index-test.cc M src/kudu/cfile/index_block.cc M src/kudu/cfile/plain_bitmap_block.h M src/kudu/cfile/plain_block.h M src/kudu/cfile/rle_block.h M src/kudu/tablet/deltafile.cc 23 files changed, 234 insertions(+), 71 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/04/3304/10 -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1491. Address TSAN warning for compaction
Todd Lipcon has submitted this change and it was merged. Change subject: KUDU-1491. Address TSAN warning for compaction .. KUDU-1491. Address TSAN warning for compaction This adds a missing memory barrier that was exposed when I removed TSAN suppressions for mutation lists. Without the patch, TSAN failed on mt-tablet-test a few percent of the time. With the patch, I looped it 50 times with no failures: http://dist-test.cloudera.org/job?job_id=todd.1466638821.4940 Change-Id: I42d2566eca096da172440ce3a88d0393dda9d324 Reviewed-on: http://gerrit.cloudera.org:8080/3454 Reviewed-by: Adar Dembo Tested-by: Kudu Jenkins --- M src/kudu/tablet/compaction.cc 1 file changed, 4 insertions(+), 3 deletions(-) Approvals: Adar Dembo: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/3454 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I42d2566eca096da172440ce3a88d0393dda9d324 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR](gh-pages) Control how content is cached on the Kudu web site
Todd Lipcon has posted comments on this change. Change subject: Control how content is cached on the Kudu web site .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/3462/1/.htaccess File .htaccess: Line 1: # Don't cache content or styles / scripts served by us. I forget, do we serve our own bootstrap/etc? or use a cdn? Line 5: Header unset ETag why not use etag based caching? -- To view, visit http://gerrit.cloudera.org:8080/3462 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08619464d8d458582282044a7db685f8fbbca483 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-Owner: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR](gh-pages) Control how content is cached on the Kudu web site
Hello Todd Lipcon, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/3462 to review the following change. Change subject: Control how content is cached on the Kudu web site .. Control how content is cached on the Kudu web site Change-Id: I08619464d8d458582282044a7db685f8fbbca483 --- A .htaccess 1 file changed, 18 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/62/3462/1 -- To view, visit http://gerrit.cloudera.org:8080/3462 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I08619464d8d458582282044a7db685f8fbbca483 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-Owner: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Avoid missing 'override' keyword warnings in raft consensus-test.cc
Todd Lipcon has posted comments on this change. Change subject: Avoid missing 'override' keyword warnings in raft_consensus-test.cc .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/3289/2/src/kudu/consensus/CMakeLists.txt File src/kudu/consensus/CMakeLists.txt: Line 140: set_source_files_properties(raft_consensus-test.cc OK. fair enough on not upgrading, but maybe drop a comment here explaining why we are suppressing? -- To view, visit http://gerrit.cloudera.org:8080/3289 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I687a05ed560003721d4a05bd7a01261e03bd77d9 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1398 CFile index blocks can store shortest separating prefix
Todd Lipcon has posted comments on this change. Change subject: KUDU-1398 CFile index blocks can store shortest separating prefix .. Patch Set 9: Looks like a small compile error here -- To view, visit http://gerrit.cloudera.org:8080/3304 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I68ae9146fabd4a19b17d103d118d2d60e28bb315 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] Allow to set RequestId in the RPC RequestHeader
Todd Lipcon has posted comments on this change. Change subject: Allow to set RequestId in the RPC RequestHeader .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/3179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I92566250683ee69fad57a9ae694842d4a0b17ab4 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Clarify the difference between 'call id' and 'request id'
Todd Lipcon has posted comments on this change. Change subject: Clarify the difference between 'call_id' and 'request_id' .. Patch Set 6: Code-Review+2 Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/3409 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If040e79baea55f4d16b43ff64b7ec27a403fc18f Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Clarify the difference between 'call id' and 'request id'
Todd Lipcon has submitted this change and it was merged. Change subject: Clarify the difference between 'call_id' and 'request_id' .. Clarify the difference between 'call_id' and 'request_id' This adds a bit more information on the difference between 'call_id' and 'request_id' to reduce confusion. Change-Id: If040e79baea55f4d16b43ff64b7ec27a403fc18f Reviewed-on: http://gerrit.cloudera.org:8080/3409 Reviewed-by: Todd Lipcon Tested-by: Todd Lipcon --- M src/kudu/rpc/rpc_header.proto 1 file changed, 9 insertions(+), 3 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/3409 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: If040e79baea55f4d16b43ff64b7ec27a403fc18f Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Todd Lipcon
[kudu-CR](branch-0.9.x) catalog manager: fix a locking error
David Ribeiro Alves has posted comments on this change. Change subject: catalog_manager: fix a locking error .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/3461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8db7a78309bb9182ae1fdb92485e8467b90a84b2 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) KUDU-1473: fix some tablet lock usage in CatalogManager
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager .. Patch Set 1: Build Started http://104.196.14.100/job/kudu-gerrit/1946/ -- To view, visit http://gerrit.cloudera.org:8080/3460 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) catalog manager: fix a locking error
Hello David Ribeiro Alves, Kudu Jenkins, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/3461 to review the following change. Change subject: catalog_manager: fix a locking error .. catalog_manager: fix a locking error This lock acquisition took the wrong lock, which meant it didn't add any protection against table/tablet visition races at all. See commit 1681d9a for context. Change-Id: I8db7a78309bb9182ae1fdb92485e8467b90a84b2 Reviewed-on: http://gerrit.cloudera.org:8080/3425 Tested-by: Kudu Jenkins Reviewed-by: David Ribeiro Alves (cherry picked from commit 5f7d79910d8da492195b5c22a6d9a7911f950107) --- M src/kudu/master/catalog_manager.cc 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/3461/1 -- To view, visit http://gerrit.cloudera.org:8080/3461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I8db7a78309bb9182ae1fdb92485e8467b90a84b2 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins
[kudu-CR](branch-0.9.x) catalog manager: fix a locking error
Kudu Jenkins has posted comments on this change. Change subject: catalog_manager: fix a locking error .. Patch Set 1: Build Started http://104.196.14.100/job/kudu-gerrit/1947/ -- To view, visit http://gerrit.cloudera.org:8080/3461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8db7a78309bb9182ae1fdb92485e8467b90a84b2 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) KUDU-1473: fix some tablet lock usage in CatalogManager
Hello Mike Percy, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/3460 to review the following change. Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager .. KUDU-1473: fix some tablet lock usage in CatalogManager This was probably due to the refactoring done in commit 59ff89d. Now that we're releasing tablet write locks as early as possible, we need to reacquire them (in read mode) for some operations. The lock misuse reared its head most often in the Java test TestKuduTable.testGetLocations, but could trigger in just about any test that didn't wait for table creation to finish before accessing the table. For example, CreateTableITest.TestCreateWhenMajorityOfReplicasFailCreation was a little flaky too. The backtrace was always the same: cow_object.h:82] Check failed: lock_.HasReaders() || lock_.HasWriteLock() @ 0x7fecb643e37b kudu::CowObject<>::state() at ??:0 @ 0x7fecb6422fe9 kudu::master::CatalogManager::ProcessPendingAssignments() at ??:0 @ 0x7fecb6421d3a kudu::master::CatalogManagerBgTasks::Run() at ??:0 @ 0x7fecb645c1f7 boost::_mfi::mf0<>::operator()() at ??:0 @ 0x7fecb645c15b boost::_bi::list1<>::operator()<>() at ??:0 @ 0x7fecb645c104 boost::_bi::bind_t<>::operator()() at ??:0 @ 0x7fecb645bf2a boost::detail::function::void_function_obj_invoker0<>::invoke() at ??:0 @ 0x7fecb0b66d82 boost::function0<>::operator()() at ??:0 @ 0x7fecafba96c0 kudu::Thread::SuperviseThread() at ??:0 @ 0x423faa __tsan_thread_start_func at ??:0 @ 0x7fecb28459d1 start_thread at ??:0 @ 0x7fecacd108fd clone at ??:0 Note: on branch-0.9.x, SendDeleteTabletRequest() does not need to acquire a tablet lock because it still uses the per-tablet replica cache which has since been removed on master. Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6 Reviewed-on: http://gerrit.cloudera.org:8080/3309 Reviewed-by: Mike Percy Tested-by: Adar Dembo (cherry picked from commit 679c6b2c93313bd4371916678b4ea8f52b621a50) --- M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h 2 files changed, 33 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/3460/1 -- To view, visit http://gerrit.cloudera.org:8080/3460 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Mike Percy
[kudu-CR](branch-0.9.x) update dist-test configuration for new client.py location
Kudu Jenkins has posted comments on this change. Change subject: update dist-test configuration for new client.py location .. Patch Set 1: Build Started http://104.196.14.100/job/kudu-gerrit/1945/ -- To view, visit http://gerrit.cloudera.org:8080/3459 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I48e4617222b8b503b21eeb560800dc5619d7ac26 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) update dist-test configuration for new client.py location
Todd Lipcon has uploaded a new change for review. http://gerrit.cloudera.org:8080/3459 Change subject: update dist-test configuration for new client.py location .. update dist-test configuration for new client.py location client.py has been moved to bin/client Change-Id: I48e4617222b8b503b21eeb560800dc5619d7ac26 Reviewed-on: http://gerrit.cloudera.org:8080/3332 Reviewed-by: Adar Dembo Tested-by: Kudu Jenkins (cherry picked from commit e4833d2df35721c03955ef0902a70c2de2536f99) --- M build-support/dist_test.py M build-support/jenkins/build-and-test.sh 2 files changed, 3 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/3459/1 -- To view, visit http://gerrit.cloudera.org:8080/3459 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I48e4617222b8b503b21eeb560800dc5619d7ac26 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Dan Burkert
[kudu-CR] KUDU-1491. Address TSAN warning for compaction
Adar Dembo has posted comments on this change. Change subject: KUDU-1491. Address TSAN warning for compaction .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/3454 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I42d2566eca096da172440ce3a88d0393dda9d324 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Integrate the request tracker with the client
Todd Lipcon has posted comments on this change. Change subject: Integrate the request tracker with the client .. Patch Set 8: (1 comment) http://gerrit.cloudera.org:8080/#/c/3080/8/src/kudu/rpc/retriable_rpc.h File src/kudu/rpc/retriable_rpc.h: Line 210: request_tracker_->RpcCompleted(sequence_number_); > nit: mid-comment as a safety check, I think we should probably set the seq number back to NO_SEQ_NO, and in the dtor, DCHECK that it's NO_SEQ_NO. Otherwise we might have some very tricky-to-catch "leaks" of RPCs in the request tracker. -- To view, visit http://gerrit.cloudera.org:8080/3080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I94207c294452fcbdb3a7901fdb702674d47553ee Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Integrate the request tracker with the client
Todd Lipcon has posted comments on this change. Change subject: Integrate the request tracker with the client .. Patch Set 8: (10 comments) http://gerrit.cloudera.org:8080/#/c/3080/8//COMMIT_MSG Commit Message: Line 15: any specific tests. it's not possible to use RetriableRpc within rpc_stub-test to test the client part? http://gerrit.cloudera.org:8080/#/c/3080/8/src/kudu/client/batcher.cc File src/kudu/client/batcher.cc: Line 194:const scoped_refptr& request_tracker, what about storing this in the Batcher? or grabbing it via batcher->client->data->request_tracker as necessary? seems like an unnecessary extra parameter (and refcount incr/decr) http://gerrit.cloudera.org:8080/#/c/3080/8/src/kudu/client/client-internal.cc File src/kudu/client/client-internal.cc: Line 72: using rpc::RequestTracker; spurious changes in this file? http://gerrit.cloudera.org:8080/#/c/3080/8/src/kudu/client/client-internal.h File src/kudu/client/client-internal.h: Line 27: #include "kudu/rpc/request_tracker.h" can be a forward decl, no? http://gerrit.cloudera.org:8080/#/c/3080/8/src/kudu/client/meta_cache.h File src/kudu/client/meta_cache.h: Line 34: #include "kudu/rpc/request_tracker.h" spurious? http://gerrit.cloudera.org:8080/#/c/3080/8/src/kudu/rpc/retriable_rpc.h File src/kudu/rpc/retriable_rpc.h: Line 125: // in flight, so we retry with a delay. I think this merits a warning. Line 193: request_id->set_first_incomplete_seq_no(internal::NO_SEQ_NO); why not just leave it unset in the PB? Line 210: request_tracker_->RpcCompleted(sequence_number_); nit: mid-comment http://gerrit.cloudera.org:8080/#/c/3080/8/src/kudu/rpc/rpc_controller.cc File src/kudu/rpc/rpc_controller.cc: Line 103: return *request_id_.get(); nit: I think you dont need .get() http://gerrit.cloudera.org:8080/#/c/3080/8/src/kudu/rpc/rpc_controller.h File src/kudu/rpc/rpc_controller.h: Line 114: const RequestIdPB& request_id() const; I think this is somewhat error prone - IIRC the RPC layer "steals" the request ID out of the RpcController at send time, no? Seems like it's worth a doc comment that these are not accessible after an RPC has been sent. And maybe a DCHECK like some of the other accessors have? -- To view, visit http://gerrit.cloudera.org:8080/3080 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I94207c294452fcbdb3a7901fdb702674d47553ee Gerrit-PatchSet: 8 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: David Ribeiro Alves Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1491. Address TSAN warning for compaction
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3454 to look at the new patch set (#2). Change subject: KUDU-1491. Address TSAN warning for compaction .. KUDU-1491. Address TSAN warning for compaction This adds a missing memory barrier that was exposed when I removed TSAN suppressions for mutation lists. Without the patch, TSAN failed on mt-tablet-test a few percent of the time. With the patch, I looped it 50 times with no failures: http://dist-test.cloudera.org/job?job_id=todd.1466638821.4940 Change-Id: I42d2566eca096da172440ce3a88d0393dda9d324 --- M src/kudu/tablet/compaction.cc 1 file changed, 4 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/3454/2 -- To view, visit http://gerrit.cloudera.org:8080/3454 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I42d2566eca096da172440ce3a88d0393dda9d324 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1491. Address TSAN warning for compaction
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1491. Address TSAN warning for compaction .. Patch Set 2: Build Started http://104.196.14.100/job/kudu-gerrit/1944/ -- To view, visit http://gerrit.cloudera.org:8080/3454 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I42d2566eca096da172440ce3a88d0393dda9d324 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1491. Address TSAN warning for compaction
Todd Lipcon has posted comments on this change. Change subject: KUDU-1491. Address TSAN warning for compaction .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/3454/1/src/kudu/tablet/compaction.cc File src/kudu/tablet/compaction.cc: Line 508: static void AdvanceToLastInList(const Mutation** m) { > Do the next() calls here need to be converted into acquire_next()? hm good catch... I think so. -- To view, visit http://gerrit.cloudera.org:8080/3454 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I42d2566eca096da172440ce3a88d0393dda9d324 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR](branch-0.9.x) Bump version to 0.9.1-SNAPSHOT
Kudu Jenkins has posted comments on this change. Change subject: Bump version to 0.9.1-SNAPSHOT .. Patch Set 1: Build Started http://104.196.14.100/job/kudu-gerrit/1943/ -- To view, visit http://gerrit.cloudera.org:8080/3458 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I472c2999c0b13a2c046366a9e1ad7474eedf68d0 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) Bump version to 0.9.1-SNAPSHOT
Jean-Daniel Cryans has posted comments on this change. Change subject: Bump version to 0.9.1-SNAPSHOT .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/3458 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I472c2999c0b13a2c046366a9e1ad7474eedf68d0 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) Bump version to 0.9.1-SNAPSHOT
Hello Jean-Daniel Cryans, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/3458 to review the following change. Change subject: Bump version to 0.9.1-SNAPSHOT .. Bump version to 0.9.1-SNAPSHOT Change-Id: I472c2999c0b13a2c046366a9e1ad7474eedf68d0 --- M java/interface-annotations/pom.xml M java/kudu-client-tools/pom.xml M java/kudu-client/pom.xml M java/kudu-csd/pom.xml M java/kudu-flume-sink/pom.xml M java/kudu-mapreduce/pom.xml M java/kudu-spark/pom.xml M java/pom.xml M version.txt 9 files changed, 10 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/58/3458/1 -- To view, visit http://gerrit.cloudera.org:8080/3458 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I472c2999c0b13a2c046366a9e1ad7474eedf68d0 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Jean-Daniel Cryans
[kudu-CR] KUDU-1491. Address TSAN warning for compaction
Adar Dembo has posted comments on this change. Change subject: KUDU-1491. Address TSAN warning for compaction .. Patch Set 1: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/3454/1/src/kudu/tablet/compaction.cc File src/kudu/tablet/compaction.cc: Line 508: static void AdvanceToLastInList(const Mutation** m) { Do the next() calls here need to be converted into acquire_next()? -- To view, visit http://gerrit.cloudera.org:8080/3454 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I42d2566eca096da172440ce3a88d0393dda9d324 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: Yes
[kudu-CR](branch-0.9.x) [java-client] RPC timeout may sometimes be reported as max attempts violation
Todd Lipcon has uploaded a new change for review. http://gerrit.cloudera.org:8080/3457 Change subject: [java-client] RPC timeout may sometimes be reported as max attempts violation .. [java-client] RPC timeout may sometimes be reported as max attempts violation Fixes an issue where an RPC timeout could be reported in the error message as too many attempts. Change-Id: I9c5676ab5bd05170505ef0313b919c5475ae6b37 Reviewed-on: http://gerrit.cloudera.org:8080/3330 Tested-by: Kudu Jenkins Reviewed-by: Dan Burkert (cherry picked from commit d002e3257d49fe8e420c3d50eac54bb2d1952722) --- M java/kudu-client/src/main/java/org/kududb/client/AsyncKuduClient.java 1 file changed, 5 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/57/3457/1 -- To view, visit http://gerrit.cloudera.org:8080/3457 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I9c5676ab5bd05170505ef0313b919c5475ae6b37 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Dan Burkert
[kudu-CR](branch-0.9.x) [java-client] RPC timeout may sometimes be reported as max attempts violation
Kudu Jenkins has posted comments on this change. Change subject: [java-client] RPC timeout may sometimes be reported as max attempts violation .. Patch Set 1: Build Started http://104.196.14.100/job/kudu-gerrit/1942/ -- To view, visit http://gerrit.cloudera.org:8080/3457 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9c5676ab5bd05170505ef0313b919c5475ae6b37 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) Fix stray memory writes due to tcmalloc profiling
Kudu Jenkins has posted comments on this change. Change subject: Fix stray memory writes due to tcmalloc profiling .. Patch Set 1: Build Started http://104.196.14.100/job/kudu-gerrit/1941/ -- To view, visit http://gerrit.cloudera.org:8080/3456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9afca83d9cc24585960f6bf68d8996c4736ce6cb Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR](branch-0.9.x) Fix stray memory writes due to tcmalloc profiling
Todd Lipcon has uploaded a new change for review. http://gerrit.cloudera.org:8080/3456 Change subject: Fix stray memory writes due to tcmalloc profiling .. Fix stray memory writes due to tcmalloc profiling This fixes an issue that has been causing frequent crashes in JD.com's production cluster as well as various Cloudera test clusters. The crashes would be in various different places, but the key signature was that offset 120 in some data structure or array (eg the 16th element of a vector) would be corrupted. After doing a git bisect using an integration testing cluster running an ITBLL workload, I found that this was a regression caused by the introduction of tcmalloc contention profiling[1]. The short explanation is that, if we experienced contention while freeing a Trace object, we could in some cases increment offset 120 of some other allocation which occurred soon after the deallocation of the Trace. The issue is described in more detail in a new comment in trace.h. With this patch, I was unable to reproduce the issue on the test cluster. No new test is added since this is quite timing-dependent and not amenable to unit testing or even stress testing. [1] commit f6691e744b9cb796e1bbc6e07953f21f387c9a88 Change-Id: I9afca83d9cc24585960f6bf68d8996c4736ce6cb Reviewed-on: http://gerrit.cloudera.org:8080/3445 Reviewed-by: David Ribeiro Alves Reviewed-by: Mike Percy Tested-by: Kudu Jenkins (cherry picked from commit 0bb59836fb3e4b92e2d88674e8b88546faac5c8b) --- M src/kudu/util/trace.h 1 file changed, 23 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/56/3456/1 -- To view, visit http://gerrit.cloudera.org:8080/3456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I9afca83d9cc24585960f6bf68d8996c4736ce6cb Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon
[kudu-CR](branch-0.9.x) KUDU-1469. Fix handling of fully-deduped requests after a leader change
Todd Lipcon has uploaded a new change for review. http://gerrit.cloudera.org:8080/3455 Change subject: KUDU-1469. Fix handling of fully-deduped requests after a leader change .. KUDU-1469. Fix handling of fully-deduped requests after a leader change This fixes KUDU-1469, a bug in which the leader and follower could get into a tight loop of RPCs making no progress replicating operations. The newly included test fails without the bug fix, and passes with it. See its comments for details on the bug itself. Change-Id: Iced21ae1b69c1079efc9aa9cf23e2fa592b8bebd Reviewed-on: http://gerrit.cloudera.org:8080/3228 Tested-by: Kudu Jenkins Reviewed-by: David Ribeiro Alves Reviewed-by: Mike Percy (cherry picked from commit d685747426472d428b1d071df00d112d9f775117) --- M src/kudu/consensus/consensus.proto M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/log_cache.cc M src/kudu/consensus/raft_consensus-test.cc M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus_state.cc M src/kudu/consensus/raft_consensus_state.h M src/kudu/integration-tests/raft_consensus-itest.cc 8 files changed, 158 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/3455/1 -- To view, visit http://gerrit.cloudera.org:8080/3455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Iced21ae1b69c1079efc9aa9cf23e2fa592b8bebd Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon
[kudu-CR](branch-0.9.x) KUDU-1469. Fix handling of fully-deduped requests after a leader change
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1469. Fix handling of fully-deduped requests after a leader change .. Patch Set 1: Build Started http://104.196.14.100/job/kudu-gerrit/1940/ -- To view, visit http://gerrit.cloudera.org:8080/3455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iced21ae1b69c1079efc9aa9cf23e2fa592b8bebd Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-0.9.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] KUDU-1491. Address TSAN warning for compaction
Hello Mike Percy, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/3454 to review the following change. Change subject: KUDU-1491. Address TSAN warning for compaction .. KUDU-1491. Address TSAN warning for compaction This adds a missing memory barrier that was exposed when I removed TSAN suppressions for mutation lists. Without the patch, TSAN failed on mt-tablet-test a few percent of the time. With the patch, I looped it 50 times with no failures: http://dist-test.cloudera.org/job?job_id=todd.1466638821.4940 Change-Id: I42d2566eca096da172440ce3a88d0393dda9d324 --- M src/kudu/tablet/compaction.cc 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/3454/1 -- To view, visit http://gerrit.cloudera.org:8080/3454 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I42d2566eca096da172440ce3a88d0393dda9d324 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Mike Percy
[kudu-CR] KUDU-1491. Address TSAN warning for compaction
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1491. Address TSAN warning for compaction .. Patch Set 1: Build Started http://104.196.14.100/job/kudu-gerrit/1939/ -- To view, visit http://gerrit.cloudera.org:8080/3454 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I42d2566eca096da172440ce3a88d0393dda9d324 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: No
[kudu-CR] KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup .. Patch Set 3: Build Started http://104.196.14.100/job/kudu-gerrit/1938/ -- To view, visit http://gerrit.cloudera.org:8080/3321 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8ecf8d780de1aa89fae4e0510d8291eb1f1cee11 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1469. Fix handling of fully-deduped requests after a leader change
Mike Percy has submitted this change and it was merged. Change subject: KUDU-1469. Fix handling of fully-deduped requests after a leader change .. KUDU-1469. Fix handling of fully-deduped requests after a leader change This fixes KUDU-1469, a bug in which the leader and follower could get into a tight loop of RPCs making no progress replicating operations. The newly included test fails without the bug fix, and passes with it. See its comments for details on the bug itself. Change-Id: Iced21ae1b69c1079efc9aa9cf23e2fa592b8bebd Reviewed-on: http://gerrit.cloudera.org:8080/3228 Tested-by: Kudu Jenkins Reviewed-by: David Ribeiro Alves Reviewed-by: Mike Percy --- M src/kudu/consensus/consensus.proto M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/log_cache.cc M src/kudu/consensus/raft_consensus-test.cc M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus_state.cc M src/kudu/consensus/raft_consensus_state.h M src/kudu/integration-tests/raft_consensus-itest.cc 8 files changed, 158 insertions(+), 24 deletions(-) Approvals: David Ribeiro Alves: Looks good to me, approved Mike Percy: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/3228 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Iced21ae1b69c1079efc9aa9cf23e2fa592b8bebd Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1469. Fix handling of fully-deduped requests after a leader change
Mike Percy has posted comments on this change. Change subject: KUDU-1469. Fix handling of fully-deduped requests after a leader change .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/3228/4/src/kudu/consensus/raft_consensus_state.cc File src/kudu/consensus/raft_consensus_state.cc: Line 653: last_received_op_id_current_leader_ = last_received_op_id_; > Looked at this for a while, and I think it wasn't causing an issue because ok, thanks for looking. -- To view, visit http://gerrit.cloudera.org:8080/3228 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iced21ae1b69c1079efc9aa9cf23e2fa592b8bebd Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3321 to look at the new patch set (#2). Change subject: KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup .. KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup This fixes a bug seen in a recent YCSB stress test that I ran in which I was accidentally writing tens of thousands of duplicate keys per second. After a tablet server restarted, it failed to come up due to a pending commit which referred to no mutated stores (e.g. because all of the operations were duplicate key inserts). This patch tweaks the logic for this safety check: a commit with no mutated stores trivially has "no active stores". However, that's not the same as having "only inactive stores" -- the subtlety is in the difference in behavior when a commit has no stores at all. The patch adds a new targeted test in tablet_bootstrap-test as well as a more end-to-end test in ts_recovery-itest. Both reproduced the bug reliably before this patch. Change-Id: I8ecf8d780de1aa89fae4e0510d8291eb1f1cee11 --- M src/kudu/consensus/log-test-base.h M src/kudu/integration-tests/raft_consensus-itest.cc M src/kudu/integration-tests/test_workload.cc M src/kudu/integration-tests/test_workload.h M src/kudu/integration-tests/ts_recovery-itest.cc M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_bootstrap.cc 7 files changed, 148 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/3321/2 -- To view, visit http://gerrit.cloudera.org:8080/3321 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8ecf8d780de1aa89fae4e0510d8291eb1f1cee11 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup .. Patch Set 2: Build Started http://104.196.14.100/job/kudu-gerrit/1937/ -- To view, visit http://gerrit.cloudera.org:8080/3321 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8ecf8d780de1aa89fae4e0510d8291eb1f1cee11 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup
Todd Lipcon has posted comments on this change. Change subject: KUDU-1477. Pending COMMIT message for failed write operation can prevent tablet startup .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/3321/1//COMMIT_MSG Commit Message: Line 12: up due to a pending commit which referred to no mutated stores. > ... because all the rows had errors? please clarify Done http://gerrit.cloudera.org:8080/#/c/3321/1/src/kudu/consensus/log-test-base.h File src/kudu/consensus/log-test-base.h: Line 273: void AppendFailedCommit(const OpId& original_opid) { > what's a "FailedCommit" this seems to imply that a we're appending a commit Done http://gerrit.cloudera.org:8080/#/c/3321/1/src/kudu/integration-tests/test_workload.h File src/kudu/integration-tests/test_workload.h: Line 116: void set_write_pattern(WritePattern pattern) { > nit: blank line above Done http://gerrit.cloudera.org:8080/#/c/3321/1/src/kudu/integration-tests/ts_recovery-itest.cc File src/kudu/integration-tests/ts_recovery-itest.cc: Line 148:1, // TODO > TODO what? Done http://gerrit.cloudera.org:8080/#/c/3321/1/src/kudu/tablet/tablet_bootstrap.cc File src/kudu/tablet/tablet_bootstrap.cc: Line 271: bool AreAllStoresInactive(const CommitMsg& commit); > by "iterating through the stores" I meant through the ops on the commit mes Changed the API as you suggested, though not sure what you mean about the "multiple iterations". We only use this method for some sanity checks on orphan/pending commits. -- To view, visit http://gerrit.cloudera.org:8080/3321 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I8ecf8d780de1aa89fae4e0510d8291eb1f1cee11 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Allow for reserving disk space for non-Kudu processes
Adar Dembo has posted comments on this change. Change subject: Allow for reserving disk space for non-Kudu processes .. Patch Set 5: (24 comments) http://gerrit.cloudera.org:8080/#/c/3135/5/src/kudu/consensus/log.cc File src/kudu/consensus/log.cc: Line 111: DEFINE_int64(log_dir_reserved_bytes, 0, Nit: I think this would be more clear as fs_wal_dir_reserved_bytes, as it would closely parallel fs_wal_dir. Not sure if that implies it ought to be defined in the fs module and declared here, I don't have a strong opinion on that. Line 113: TAG_FLAG(log_dir_reserved_bytes, runtime); Hmm, I'm not sure if we should allow changing these values at runtime. If someone increases the reservation, we'll stop allowing new containers (good), but we have no way of reducing Kudu's disk space usage to meet the reservation (bad). Meaning, we can't fully satisfy the operator's desired semantics for a runtime change of these flags. http://gerrit.cloudera.org:8080/#/c/3135/5/src/kudu/fs/block_manager-test.cc File src/kudu/fs/block_manager-test.cc: PS5, Line 951: We use a COARSE clock so we need to let a : // little bit of time pass so we get at least one unit of time greater than : // before when we call MonoTime::Now() again. Let's switch to FINE clock so we won't have this limitation. http://gerrit.cloudera.org:8080/#/c/3135/5/src/kudu/fs/log_block_manager.cc File src/kudu/fs/log_block_manager.cc: Line 64: DEFINE_int32(full_disk_cache_seconds, 30, Nit: please prefix with log_block_manager. It's super long, yes, but it's the only way we have to "namespace" the flags. Line 70: DEFINE_int64(data_dirs_reserved_bytes, 0, Please rename to fs_data_dirs_reserved_bytes (see my comment on log_dir_reserved_bytes for the rationale). Also indicate in the help that it only works with the log block manager. Line 96: "Number of unavailable log block containers", Nit: Number of Unavailable Block Containers (to be consistent with the existing container metrics). Line 98: "Number of non-full Block Containers that are under root paths that " Nit: "Number of non-full log block containers that are under root paths whose disks are full" (TBCWTECM). Line 265: RWFile* data_file() const { return data_file_.get(); } We're only using this for the filename. Could you make narrow accessor that just returns the data file's name? You could call it DataFilePath() to parallel MetadataFilePath(). As for the implementation, since it'll parallel MetadataFilePath(), could you use the same implementation for both? That is, either file.filename(), or append the right suffix to path_. Line 267: int64_t preallocated_offset() const { return preallocated_offset_; } Where is this used? Line 273: Env* env() const { return block_manager_->env(); } Is this ever used? Line 312: int64_t preallocated_offset_ = 0; Huh, didn't know C++11 allowed this. Could you make the same change for total_bytes_written_? Would be nice for both primitives to be initialized consistently. Line 1230: // Indexes of root paths that are below their reserved space threshold. Can we defer this work to the case where GetAvailableContainer() has returned null? That would speed up CreateBlock() for the common case. Line 1250: return Status::IOError("Unable to allocate block: All data directories are full. " We should add ENOSPC to this status, right? Line 1292: FLAGS_log_container_preallocate_bytes, Preallocate() may not actually preallocate FLAGS_log_container_preallocate_bytes bytes. When the container is first created it will, but after that, the preallocate base and bounds are total_bytes_written_ and FLAGS_log_container_preallocate_bytes respectively. Which means, each successive Preallocate() will only preallocate by the size of the last block to be written to the container. To be honest, I'm not even sure if the above produces desirable results. More specifically, suppose we have a container backed by a single ext4 extent that was preallocated when the container was created. Will subsequent preallocations "extend" that extent? Or will they add new extents, each the size of the last block written? If it's the latter, then the preallocation logic is hot garbage and should be replaced. It's not fair of me to ask you to fix preallocation (though you're certainly welcome to if you prefer to do that), but I do think we need to pass the right value to VerifySufficientDiskSpace(). In the context of preallocation here, I'm not really sure what it should be. As an alternative, what do you think of moving disk space enforcement into Container::WriteData()? It's not ideal because that's not where disk space is actually consumed, and we'd need to verify that a failure there doesn't orphan blocks (I think the failure should
[kudu-CR] KUDU-1469. Fix handling of fully-deduped requests after a leader change
Todd Lipcon has posted comments on this change. Change subject: KUDU-1469. Fix handling of fully-deduped requests after a leader change .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/3228/4/src/kudu/consensus/raft_consensus_state.cc File src/kudu/consensus/raft_consensus_state.cc: Line 653: last_received_op_id_current_leader_ = last_received_op_id_; > Err. Is that something that would not be hard to fix? It seems like missing Looked at this for a while, and I think it wasn't causing an issue because on any term change, if the new leader's first operation causes us to truncate our log, we've already reset the last_received_op_id_ down to that truncation point. And if it didn't cause us to truncate our log (i.e it's ahead), then sending back our own last_received would cause the leader to re-send the next batch at that same index, which would cause the truncation/reset on the next round trip. So, either one is actually correct. I can't think of a scenario that would break in either case, at least (this one might be more conservative, though) -- To view, visit http://gerrit.cloudera.org:8080/3228 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iced21ae1b69c1079efc9aa9cf23e2fa592b8bebd Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Add Env::GetBytesFree() method
Mike Percy has submitted this change and it was merged. Change subject: Add Env::GetBytesFree() method .. Add Env::GetBytesFree() method This method returns the number of bytes free on the filesystem represented by its path argument. Change-Id: Ica484be987312b275bafdd6e6fa79239ed3ac1ff Reviewed-on: http://gerrit.cloudera.org:8080/3451 Reviewed-by: Adar Dembo Tested-by: Mike Percy --- M src/kudu/util/env-test.cc M src/kudu/util/env.h M src/kudu/util/env_posix.cc 3 files changed, 57 insertions(+), 1 deletion(-) Approvals: Mike Percy: Verified Adar Dembo: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/3451 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ica484be987312b275bafdd6e6fa79239ed3ac1ff Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Mike Percy
[kudu-CR] Add Env::GetBytesFree() method
Mike Percy has posted comments on this change. Change subject: Add Env::GetBytesFree() method .. Patch Set 3: Verified+1 Overriding unrelated failure -- To view, visit http://gerrit.cloudera.org:8080/3451 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ica484be987312b275bafdd6e6fa79239ed3ac1ff Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Mike Percy Gerrit-HasComments: No
[kudu-CR] Add Env::GetBytesFree() method
Mike Percy has posted comments on this change. Change subject: Add Env::GetBytesFree() method .. Patch Set 3: Filed a JIRA for the data race discovered by Jenkins: https://issues.apache.org/jira/browse/KUDU-1491 -- To view, visit http://gerrit.cloudera.org:8080/3451 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ica484be987312b275bafdd6e6fa79239ed3ac1ff Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: No