[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Alexey Serbin has posted comments on this change. Change subject: tablet copy: Allow voting from crashed initial tablet copies .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/7961/3/src/kudu/integration-tests/tablet_copy-itest.cc File src/kudu/integration-tests/tablet_copy-itest.cc: PS3, Line 1155: ASSERT_EVENTUALLY By default, this times out in 30 seconds (which is exactly kTimeout for this test). Would it make sense to set some custom timeout while awaiting for eventual success (i.e. multiple of kTimeout)? PS3, Line 1161: ASSERT_OK What it itest::AddServer() returns other than IllegalState, not-the-leader status? Is there a risk to overlook some other bug in that case if using such this sort of 'blanket' retry? PS3, Line 1187: LOG(INFO) << "Restarting tablet servers. New replica TS UUID: " << new_replica_uuid : << ", tablet data state: " : << tablet::TabletDataState_Name(superblock.tablet_data_state()) : << ", last-logged opid: " << superblock.tombstone_last_logged_opid(); nit: would it make sense to move this after ASSERT_OPID_EQ() and before ASSERT_OK(cluster_->master()->Restart()) ? http://gerrit.cloudera.org:8080/#/c/7961/3/src/kudu/tserver/tablet_copy_client.cc File src/kudu/tserver/tablet_copy_client.cc: PS3, Line 401: RETURN_NOT_OK(meta_->ReplaceSuperBlock(*superblock_)); If not clearing the tombstone_last_logged_opid field in the superblock, would tombstone_last_logged_opid_ be still consistent after call of the DeleteTabletData() below? Or it's not relevant here? -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [tests] fixed flake in delete table-itest
Alexey Serbin has uploaded a new patch set (#2). Change subject: [tests] fixed flake in delete_table-itest .. [tests] fixed flake in delete_table-itest Fixed flake in DeleteTableITest.TestNoDeleteTombstonedTablets: if leader election happened in the middle of the AddServer/RemoveServer sequence, the test failed with error messages like the following: src/kudu/integration-tests/delete_table-itest.cc:1217: Failure Failed Bad status: Illegal state: Replica 37783b00d5d34ffe87953cb90fa60e7c \ is not leader of this config. Role: FOLLOWER. If running tests against a DEBUG build, it were about 3 failures per 1K run: http://dist-test.cloudera.org/job?job_id=aserbin.1504655654.10466 After the fix, there were no failures in multiple 1K runs: http://dist-test.cloudera.org/job?job_id=aserbin.1504670739.3127 Change-Id: I7dda40ae1054becaf25963a6d301ecaed5a926f9 --- M src/kudu/integration-tests/delete_table-itest.cc 1 file changed, 106 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/72/7972/2 -- To view, visit http://gerrit.cloudera.org:8080/7972 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7dda40ae1054becaf25963a6d301ecaed5a926f9 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot
[kudu-CR] [tests] fixed flake in delete table-itest
Alexey Serbin has uploaded a new change for review. http://gerrit.cloudera.org:8080/7972 Change subject: [tests] fixed flake in delete_table-itest .. [tests] fixed flake in delete_table-itest Fixed flake in DeleteTableITest.TestNoDeleteTombstonedTablets: if leader election happened in the middle of the AddServer/RemoveServer sequence, the test failed with error messages like the following: src/kudu/integration-tests/delete_table-itest.cc:1217: Failure Failed Bad status: Illegal state: Replica 37783b00d5d34ffe87953cb90fa60e7c \ is not leader of this config. Role: FOLLOWER. If running tests against a DEBUG build, it were about 3 failures per 1K run: http://dist-test.cloudera.org/job?job_id=aserbin.1504655654.10466 After the fix, there were no failures in multiple 1K runs: http://dist-test.cloudera.org/job?job_id=aserbin.1504670739.3127 Change-Id: I7dda40ae1054becaf25963a6d301ecaed5a926f9 --- M src/kudu/integration-tests/delete_table-itest.cc 1 file changed, 104 insertions(+), 30 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/72/7972/1 -- To view, visit http://gerrit.cloudera.org:8080/7972 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I7dda40ae1054becaf25963a6d301ecaed5a926f9 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey Serbin
[kudu-CR] KUDU-1807: improve IsCreateInProgress performance
Adar Dembo has posted comments on this change. Change subject: KUDU-1807: improve IsCreateInProgress performance .. Patch Set 2: (11 comments) > Is there any way we can add a debug-mode consistency check of some > sort that verifies that the map of counts maps the calculated > iteration? even if it's correct right now it seems like it may > regress and be really hard to debug at some point later. (iirc the > Java client just loops forever trying to fetch locations if it > thinks the table is in-progress) I added a function to do this verification, and added a call to it in IsCreateInProgress. Let me know if you think I should call it elsewhere, or add something else. http://gerrit.cloudera.org:8080/#/c/7957/2/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: Line 637: // Then, commit the mutations. > is there some race window here where a caller ends up asking for GetTableLo You're right that there's a race window here, and that it can cause GetTableLocations to think that a tablet isn't yet running. However, all of our client implementations will treat this as a transient state and retry with some backoff. Given the above, I wasn't sure whether this is worth fixing. What ultimately convinced me was that it's needed for the correctness of a "compare the tablet state count map contents against the actual state counts" function. PS2, Line 663: set_state > perhaps this should be named swap_state to indicate its return value better Done Line 665: state_changes_[tablet->table()][old_state]--; > can we DCHECK here that this never goes negative? On the contrary: it probably will be negative, because state_changes_ represents the deltas that will be applied rather than the absolute counts. In any case, this is moot now because we're no longer aggregating state changes for tables. Line 672: const PersistentTabletInfo& data( > nit: is this wrapping necessary? I like to wrap at 80, and these two lines were 84 and 86 respectively. But I'll unwrap since you found it distracting. Line 681: PersistentTabletInfo* mutable_data( > nit: same (wrapping)? Done PS2, Line 3697: unlocker_in > maybe 'committer' is a better name Done, though I preserved the 'in' and 'out' suffixes to help navigate between these functions and ProcessPendingAssignments. PS2, Line 4468: SysTabletsEntryPB::RUNNING > don't we have to worry about REPLACED and DELETED? or do those actually get I tried to maintain the same semantics as the old implementation, which returned true when it found any non-RUNNING tablet. A tablet switches to REPLACED in ProcessPendingAssignments but is implicitly removed from the map when AddRemoveTablets is used to add the actual replacement tablet. It's interesting that here the mutation is committed before AddRemoveTablets is called, so there's a very brief window where a REPLACED tablet could be visible. I'm guessing this is harmless because it also implies a CREATING tablet, which means the table is definitely still being created. A tablet switches to DELETED either in DeleteTable or in AlterTable (when a range partition is dropped). In the former, the entire table switches to DELETED, which causes IsCreateInProgress callers to short-circuit. In the latter, a DELETED tablet is explicitly dropped via AddRemoveTablets. And unlike ProcessPendingAssignments, the mutation is committed after the tablet is dropped, so the DELETED tablet is never visible. PS2, Line 4591: DCHECK_GT(it->second, 0); > can you DCHECK_GE() vs 'delta'? Moot, Increment/Decrement now always add/remove just one. http://gerrit.cloudera.org:8080/#/c/7957/2/src/kudu/master/catalog_manager.h File src/kudu/master/catalog_manager.h: PS2, Line 123: get_state > unusual naming for a getter It's because the three callers are all tablet->metadata().state().get_state(). So if this were just state(), the calls would be tablet->metadata().state().state(), which I found confusing. I can still change this if you prefer. PS2, Line 244: TabletStateMap > how about TabletStateCountMap? otherwise I would expect this type to be key Done PS2, Line 318: void IncrementTabletStateCountUnlocked(SysTabletsEntryPB::State state, : int64_t delta = 1); : void DecrementTabletStateCountUnlocked(SysTabletsEntryPB::State state, : int64_t delta = 1); > if you are taking counts here, it seems a little odd to have the two separa My original implementation used a single method which allowed negative deltas, but then I saw that the implementation was pretty different for negative deltas (due to additional invariant enforcement), and the symmetry with the SchemaVersionCount methods looked nice. Anyway, this is moot now because per-table state change batching is no longer a thing, so we always
[kudu-CR] KUDU-1807: improve IsCreateInProgress performance
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7957 to look at the new patch set (#3). Change subject: KUDU-1807: improve IsCreateInProgress performance .. KUDU-1807: improve IsCreateInProgress performance For some inexplicable reason, IsCreateInProgress is invoked when servicing a GetTableSchema RPC. While it's attractive to change that, it's also untenable without affecting backwards compatibility. So instead, here's a patch that adds in-memory caching to the master so that IsCreateInProgress needn't acquire N tablet locks. Just like schema_version_counts_, the cached state is a map of all tablet states to the number of tablets currently in that state. For this particular problem a single count of "number of creating tablets" might suffice, but I liked the consistency with schema_version_counts_ and I found it easier to handle the various corner cases when accounting for all states. Because the map contents reflect persistent state, the locking semantics differ somewhat from schema_version_counts_: 1. Tablet state changes are usually wrapped in a tablet metadata lock, so care must be taken to only update the state if the lock commits. 2. Further, the count must be updated with the tablet metadata lock held. Why? So that count updates are serialized in the same order as state changes themselves. Consider an example where two threads are trying to update a tablet's state (currently X) to Y and Z respectively. The count of state X begins at 1 (C_X=1) and the counts of states Y and Z are 0 (C_Y=C_Z=0). There's no defined order so whichever state change happens second "wins". However, if the count updates were allowed outside the metadata locks, it would be possible for the order of operations to be X->Y, Y->Z, (C_Y-- C_Z++), (C_X-- C_Y++), causing C_Y to momentarily go negative. If count updates are restricted to inside the metadata locks, a state change order of X->Y, Y->Z forces the count update order to be (C_X-- C_Y++), (C_Y--, C_Z++). 3. Lastly, the count must also be updated with the tablet metadata lock in commit mode. Otherwise it's possible for the state change to "leak" to another thread via the cached map (and thus affect IsCreateInProgress) but not via the actual tablet state. To accommodate these semantics, the ScopedTabletInfoCommitter was modified to support tracking state changes. There are two notable (and confusing) exceptions which are documented in the code. Change-Id: Idca379bb1249e54c0b7fbb0df408ed829ab9fbd2 --- M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h M src/kudu/master/sys_catalog-test.cc M src/kudu/util/cow_object.h 4 files changed, 261 insertions(+), 80 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/57/7957/3 -- To view, visit http://gerrit.cloudera.org:8080/7957 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idca379bb1249e54c0b7fbb0df408ed829ab9fbd2 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2119. Fix failure in encoding-test
Andrew Wong has posted comments on this change. Change subject: KUDU-2119. Fix failure in encoding-test .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7967/1/src/kudu/cfile/encoding-test.cc File src/kudu/cfile/encoding-test.cc: PS1, Line 255: or '\0', but Will brought this up when we were discussing this earlier and thought I'd bring it up: do we miss out test coverage for binary blobs that do start with '\0' or are there other tests that cover? -- To view, visit http://gerrit.cloudera.org:8080/7967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] catalog manager: improve IsAlterInProgress performance
Adar Dembo has submitted this change and it was merged. Change subject: catalog_manager: improve IsAlterInProgress performance .. catalog_manager: improve IsAlterInProgress performance To mentally prepare for KUDU-1807, I decided to also tackle the O(n) behavior of IsAlterInProgress. It's not nearly as bad as IsCreateInProgress (since we aren't taking cow locks on each tablet) but is still not great since IsAlterInProgress is called repeatedly by clients. The solution involves a new map in TableInfo to track the schema versions of all tablets. Keeping the map up-to-date requires additional work when adding/removing tablets and when tablet reports include a schema change, but these are infrequent operations relative to IsAlterInProgress so I figured the trade off is worth it. I'm not particularly happy about the lock acquisition in set_reported_schema_version() but couldn't see a good way to avoid it. I looped 1000 runs of alter_table-randomized-test. The ~5 failures I saw were due to KUDU-1539, which I don't think this patch makes us any more or less vulnerable to. Change-Id: Id8c1f48c0febad038833edd555ee88f1db83249d Reviewed-on: http://gerrit.cloudera.org:8080/7950 Reviewed-by: Dan BurkertTested-by: Kudu Jenkins Reviewed-by: Todd Lipcon --- M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h 2 files changed, 105 insertions(+), 22 deletions(-) Approvals: Dan Burkert: Looks good to me, approved Todd Lipcon: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Id8c1f48c0febad038833edd555ee88f1db83249d Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] cow object: lazily copy dirty state
Adar Dembo has submitted this change and it was merged. Change subject: cow_object: lazily copy dirty state .. cow_object: lazily copy dirty state This addresses a long-standing TODO that ought to improve performance of tablet reports in the steady-state: within the COWed objects, defer the creation of the "dirty" copy until it's actually needed (i.e. the first call to mutable_dirty()). The deferral introduces new branches in dirty() and mutable_dirty(), but given the paucity of those calls for any given lock instance, I think this is a reasonable trade-off. Change-Id: Ic00c573717e7cb8df1942ba0760726fd0961e1d6 Reviewed-on: http://gerrit.cloudera.org:8080/7951 Reviewed-by: Dan BurkertTested-by: Kudu Jenkins Reviewed-by: Todd Lipcon --- M src/kudu/master/catalog_manager.cc M src/kudu/util/cow_object.h 2 files changed, 21 insertions(+), 15 deletions(-) Approvals: Dan Burkert: Looks good to me, approved Todd Lipcon: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ic00c573717e7cb8df1942ba0760726fd0961e1d6 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2119. Fix failure in encoding-test
Adar Dembo has posted comments on this change. Change subject: KUDU-2119. Fix failure in encoding-test .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7967/1//COMMIT_MSG Commit Message: Line 16: The fix just replaces any '\0' characters in the random format string. Does this also fix http://dist-test.cloudera.org:8080/diagnose?key=a32eeefe-906c-11e7-8752-0242ac11000f? That's a slightly different failure: [ RUN ] TestEncoding.TestBinaryPrefixBlockBuilderRoundTrip I0902 22:56:26.483465 8078 test_util.cc:195] Using random seed: -1483122939 I0902 22:56:26.483530 8078 encoding-test.cc:288] Block: 00: 1000 3930 1000 ..90.. /data/jenkins-workspace/kudu-workspace/src/kudu/cfile/encoding-test.cc:295: Failure Failed Bad status: Not found: no keys in data block -- To view, visit http://gerrit.cloudera.org:8080/7967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-2134: Defer block transaction commit to the end of a tablet copy
Adar Dembo has posted comments on this change. Change subject: KUDU-2134: Defer block transaction commit to the end of a tablet copy .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/7966/1//COMMIT_MSG Commit Message: PS1, Line 10: expires to expire http://gerrit.cloudera.org:8080/#/c/7966/1/src/kudu/tserver/tablet_copy_client.h File src/kudu/tserver/tablet_copy_client.h: Line 125: Status Commit(fs::BlockTransaction* transaction); Why not do this as part of Finish? You could make the BlockTransaction part of TabletCopyClient to avoid passing it back and forth too. -- To view, visit http://gerrit.cloudera.org:8080/7966 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4feabc08c2199aad8d08be56f09ac06924345f2b Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao HaoGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-HasComments: Yes
[kudu-CR] KUDU-2119. Fix failure in encoding-test
Alexey Serbin has posted comments on this change. Change subject: KUDU-2119. Fix failure in encoding-test .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7967/1/src/kudu/cfile/encoding-test.cc File src/kudu/cfile/encoding-test.cc: PS1, Line 277: LOG(INFO) << "format string: " << format_string; nit: is it important to have this information to be output during the test even in case of success? If not, consider replaces this LOG(INFO) and LOG(INFO) at like 289 below with signle SCOPED_TRACE(strings::Substitute("format string: $0; block: $1", format_string, HexDump(s))); -- To view, visit http://gerrit.cloudera.org:8080/7967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7961 to look at the new patch set (#3). Change subject: tablet copy: Allow voting from crashed initial tablet copies .. tablet copy: Allow voting from crashed initial tablet copies The change in c35e4f0093dcdec625d1f647924d854c7bc9f3de missed an important case, which is when the target tablet server in a tablet copy operation crashes while in the middle of a tablet copy. In that case, the 'tombstone_last_logged_opid' of 1.0 was not persisted in the superblock. This manifested as a very flaky TabletCopyITest.TestTabletCopyNewReplicaFailureCanVote test, which would fail about 15% of the time. With the change in this patch, that test now passes reliably: http://dist-test.cloudera.org/job?job_id=mpercy.1504654266.31446 Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/tablet-harness.h M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/ts_tablet_manager.cc 10 files changed, 90 insertions(+), 33 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/7961/3 -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR](branch-1.2.x) KUDU-2083. Decrement running maintenance ops on failed prepare
Todd Lipcon has uploaded a new change for review. http://gerrit.cloudera.org:8080/7970 Change subject: KUDU-2083. Decrement running maintenance ops on failed prepare .. KUDU-2083. Decrement running maintenance ops on failed prepare There is currently a bug where we don't decrement the number of running ops when an op->Prepare() fails. Although rare, when this bug is hit, it will decrease the number of simultaneous mm ops that can run until none can, causing the tserver to run OOM. Change-Id: I8022bcd4c6470dfef2dece0cbefede916a752291 Reviewed-on: http://gerrit.cloudera.org:8080/7610 Reviewed-by: Todd LipconTested-by: Kudu Jenkins (cherry picked from commit b365ad0bd6372d459f547da0f1cb82e36148c541) --- M src/kudu/util/maintenance_manager.cc 1 file changed, 1 insertion(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/70/7970/1 -- To view, visit http://gerrit.cloudera.org:8080/7970 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I8022bcd4c6470dfef2dece0cbefede916a752291 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-1.2.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: David Ribeiro Alves
[kudu-CR](branch-1.3.x) KUDU-2083. Decrement running maintenance ops on failed prepare
Todd Lipcon has uploaded a new change for review. http://gerrit.cloudera.org:8080/7969 Change subject: KUDU-2083. Decrement running maintenance ops on failed prepare .. KUDU-2083. Decrement running maintenance ops on failed prepare There is currently a bug where we don't decrement the number of running ops when an op->Prepare() fails. Although rare, when this bug is hit, it will decrease the number of simultaneous mm ops that can run until none can, causing the tserver to run OOM. Change-Id: I8022bcd4c6470dfef2dece0cbefede916a752291 Reviewed-on: http://gerrit.cloudera.org:8080/7610 Reviewed-by: Todd LipconTested-by: Kudu Jenkins (cherry picked from commit b365ad0bd6372d459f547da0f1cb82e36148c541) --- M src/kudu/util/maintenance_manager.cc 1 file changed, 1 insertion(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/69/7969/1 -- To view, visit http://gerrit.cloudera.org:8080/7969 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I8022bcd4c6470dfef2dece0cbefede916a752291 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-1.3.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: David Ribeiro Alves
[kudu-CR](branch-1.4.x) KUDU-2083. Decrement running maintenance ops on failed prepare
Todd Lipcon has uploaded a new change for review. http://gerrit.cloudera.org:8080/7968 Change subject: KUDU-2083. Decrement running maintenance ops on failed prepare .. KUDU-2083. Decrement running maintenance ops on failed prepare There is currently a bug where we don't decrement the number of running ops when an op->Prepare() fails. Although rare, when this bug is hit, it will decrease the number of simultaneous mm ops that can run until none can, causing the tserver to run OOM. Change-Id: I8022bcd4c6470dfef2dece0cbefede916a752291 Reviewed-on: http://gerrit.cloudera.org:8080/7610 Reviewed-by: Todd LipconTested-by: Kudu Jenkins (cherry picked from commit b365ad0bd6372d459f547da0f1cb82e36148c541) --- M src/kudu/util/maintenance_manager.cc 1 file changed, 1 insertion(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/68/7968/1 -- To view, visit http://gerrit.cloudera.org:8080/7968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I8022bcd4c6470dfef2dece0cbefede916a752291 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-1.4.x Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: David Ribeiro Alves
[kudu-CR] KUDU-2119. Fix failure in encoding-test
Hello Will Berkeley, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7967 to review the following change. Change subject: KUDU-2119. Fix failure in encoding-test .. KUDU-2119. Fix failure in encoding-test In commit d1f53cc32 I introduced randomization for the format string used for the generated string data in this test. The random format string could sometimes incorporate '\0' bytes, which, in the worst case, could result in a string of length 0 or 1. This would then cause a later assertion to fail that was checking that the encoded data be at least two bytes per string. The fix just replaces any '\0' characters in the random format string. Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 --- M src/kudu/cfile/encoding-test.cc 1 file changed, 3 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/7967/1 -- To view, visit http://gerrit.cloudera.org:8080/7967 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ic46d2a993235e560475d931c7b023eb5b4faf7c2 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2130 (part 2): more fixes for ITClientStress
Dan Burkert has submitted this change and it was merged. Change subject: KUDU-2130 (part 2): more fixes for ITClientStress .. KUDU-2130 (part 2): more fixes for ITClientStress This fixes some more race conditions in connection termination in the same vein as part 1. It also filters benign SSLException from being returned back to callers. Change-Id: Ic3f518513931c660fd93b4272d1b1fceb268f191 Reviewed-on: http://gerrit.cloudera.org:8080/7964 Reviewed-by: Alexey SerbinTested-by: Kudu Jenkins --- M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java M java/kudu-client/src/test/java/org/apache/kudu/client/ITClientStress.java 2 files changed, 22 insertions(+), 8 deletions(-) Approvals: Alexey Serbin: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7964 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ic3f518513931c660fd93b4272d1b1fceb268f191 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2134: Defer block transaction commit to the end of a tablet copy
Hao Hao has uploaded a new change for review. http://gerrit.cloudera.org:8080/7966 Change subject: KUDU-2134: Defer block transaction commit to the end of a tablet copy .. KUDU-2134: Defer block transaction commit to the end of a tablet copy KUDU-2131 uncovered a regression that causes a tablet copy session expires reliably if the copied tablet is large and the tserver already has a high number of containers. Even though the issue is mitigated by LIFO container selection, we can completely avoid it by deferring the block transaction commit to the end of tablet copy seesion. There are mainly two reasons we may still want to do so: 1) If DownloadWALs fails there's no reason to pay the fsync() price and commit all those blocks. 2) While DownloadWALs is running the kernel has more time to eagerly flush the blocks, so the fsync()s at the end could be cheaper. This patch moves the block transaction commit out from FetchAll() and commits all downloaded blocks before replacing the superblock. Change-Id: I4feabc08c2199aad8d08be56f09ac06924345f2b --- M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tools/tool_action_local_replica.cc M src/kudu/tserver/tablet_copy_client-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/tablet_copy_client.h M src/kudu/tserver/ts_tablet_manager.cc 6 files changed, 58 insertions(+), 23 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/66/7966/1 -- To view, visit http://gerrit.cloudera.org:8080/7966 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I4feabc08c2199aad8d08be56f09ac06924345f2b Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Hao Hao
[kudu-CR] Implement a lock table
Todd Lipcon has posted comments on this change. Change subject: Implement a lock table .. Patch Set 2: (6 comments) http://gerrit.cloudera.org:8080/#/c/7918/2/src/kudu/util/lock_table.h File src/kudu/util/lock_table.h: PS2, Line 40: T I think calling this KEY or K would be clearer (at first wasn't sure what it was). Also the docs should clarify what requirements there are of the 'T' type (eg naturally comparable with operator ==? must be hashable with std::hash? etc) PS2, Line 46: std::shared_ptrhrm, why is this shared instead of just a move-only type combined with boost::optional<> to indicate failure Line 48: auto result = slots_.insert(std::move(key)); how about using InsertIfNotPresent? and not doing the std::move here but rather down below when instantiating the token? PS2, Line 52: this->s 'this->' is redundant right? PS2, Line 69: slots_ to me the word 'slots' indicates that multiple entries may fall into the same 'slot' or something (ie that there are some fixed number of slots). Maybe just 'locks' or 'keys_' or something? Line 98: const T& key() { can be a const method -- To view, visit http://gerrit.cloudera.org:8080/7918 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0404478675e079ba147fbdb4b0bac9db0cdc87e5 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2130 (part 2): more fixes for ITClientStress
Alexey Serbin has posted comments on this change. Change subject: KUDU-2130 (part 2): more fixes for ITClientStress .. Patch Set 2: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/7964/1/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java: Line 354: lock.lock(); > Yep, found the following in an ITClientStress log: Aha, I see. Thanks! PS1, Line 427: error = new RecoverableExceptio > Done Thank you for adding the explanatory comment. -- To view, visit http://gerrit.cloudera.org:8080/7964 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic3f518513931c660fd93b4272d1b1fceb268f191 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1807: improve IsCreateInProgress performance
Todd Lipcon has posted comments on this change. Change subject: KUDU-1807: improve IsCreateInProgress performance .. Patch Set 2: (11 comments) I only partially followed this patch... it's one of those ones that's hard to verify by looking at it visually. Is there any way we can add a debug-mode consistency check of some sort that verifies that the map of counts maps the calculated iteration? even if it's correct right now it seems like it may regress and be really hard to debug at some point later. (iirc the Java client just loops forever trying to fetch locations if it thinks the table is in-progress) http://gerrit.cloudera.org:8080/#/c/7957/2/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: Line 637: // Then, commit the mutations. is there some race window here where a caller ends up asking for GetTableLocations and it says that create is not in-progress, but in fact some of the tablets still haven't gotten to the Commit() step below, and hence are not returned? PS2, Line 663: set_state perhaps this should be named swap_state to indicate its return value better Line 665: state_changes_[tablet->table()][old_state]--; can we DCHECK here that this never goes negative? Line 672: const PersistentTabletInfo& data( nit: is this wrapping necessary? Line 681: PersistentTabletInfo* mutable_data( nit: same (wrapping)? PS2, Line 3697: unlocker_in maybe 'committer' is a better name (same below) PS2, Line 4468: SysTabletsEntryPB::RUNNING don't we have to worry about REPLACED and DELETED? or do those actually get entirely removed? PS2, Line 4591: DCHECK_GT(it->second, 0); can you DCHECK_GE() vs 'delta'? http://gerrit.cloudera.org:8080/#/c/7957/2/src/kudu/master/catalog_manager.h File src/kudu/master/catalog_manager.h: PS2, Line 123: get_state unusual naming for a getter PS2, Line 244: TabletStateMap how about TabletStateCountMap? otherwise I would expect this type to be keyed by tablet id like the TabletInfo one. PS2, Line 318: void IncrementTabletStateCountUnlocked(SysTabletsEntryPB::State state, : int64_t delta = 1); : void DecrementTabletStateCountUnlocked(SysTabletsEntryPB::State state, : int64_t delta = 1); if you are taking counts here, it seems a little odd to have the two separate methods instead of just having one, and allowing a negative 'delta' -- To view, visit http://gerrit.cloudera.org:8080/7957 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idca379bb1249e54c0b7fbb0df408ed829ab9fbd2 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] cow object: lazily copy dirty state
Todd Lipcon has posted comments on this change. Change subject: cow_object: lazily copy dirty state .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic00c573717e7cb8df1942ba0760726fd0961e1d6 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] catalog manager: improve IsAlterInProgress performance
Todd Lipcon has posted comments on this change. Change subject: catalog_manager: improve IsAlterInProgress performance .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id8c1f48c0febad038833edd555ee88f1db83249d Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-2130 (part 2): more fixes for ITClientStress
Dan Burkert has posted comments on this change. Change subject: KUDU-2130 (part 2): more fixes for ITClientStress .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/7964/1/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java: Line 216: try { > Would it make sense to add the short-circuiting I'm not sure, but it couldn't hurt, so I'll add it. Line 354: return; > I'm curious -- did it happen in the wild that the message arrived in here b Yep, found the following in an ITClientStress log: 21:47:36.158 [ERROR - New I/O worker #1119] (Connection.java:423) [peer master-127.23.143.1:64034] unexpected exception from downstream on [id: 0xb7be1b31, /127.0.0.1:55649 :> /127.23.143.1:64034] java.lang.IllegalStateException at com.google.common.base.Preconditions.checkState(Preconditions.java:429) at org.apache.kudu.client.Connection.messageReceived(Connection.java:350) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.apache.kudu.client.Connection.handleUpstream(Connection.java:236) PS1, Line 427: assert !explicitlyDisconnected; > I'm not sure it's good to have this assertion here -- this case covers not Done -- To view, visit http://gerrit.cloudera.org:8080/7964 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic3f518513931c660fd93b4272d1b1fceb268f191 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2130 (part 2): more fixes for ITClientStress
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7964 to look at the new patch set (#2). Change subject: KUDU-2130 (part 2): more fixes for ITClientStress .. KUDU-2130 (part 2): more fixes for ITClientStress This fixes some more race conditions in connection termination in the same vein as part 1. It also filters benign SSLException from being returned back to callers. Change-Id: Ic3f518513931c660fd93b4272d1b1fceb268f191 --- M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java M java/kudu-client/src/test/java/org/apache/kudu/client/ITClientStress.java 2 files changed, 22 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/7964/2 -- To view, visit http://gerrit.cloudera.org:8080/7964 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic3f518513931c660fd93b4272d1b1fceb268f191 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7961 to look at the new patch set (#2). Change subject: tablet copy: Allow voting from crashed initial tablet copies .. tablet copy: Allow voting from crashed initial tablet copies The change in c35e4f0093dcdec625d1f647924d854c7bc9f3de missed an important case, which is when the target tablet server in a tablet copy operation crashes while in the middle of a tablet copy. In that case, the 'tombstone_last_logged_opid' of 1.0 was not persisted in the superblock. This manifested as a very flaky TabletCopyITest.TestTabletCopyNewReplicaFailureCanVote test, which would fail about 15% of the time. With the change in this patch, that test now passes reliably: http://dist-test.cloudera.org/job?job_id=mpercy.1504654266.31446 Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/tablet-harness.h M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/ts_tablet_manager.cc 10 files changed, 87 insertions(+), 31 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/7961/2 -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2130 (part 2): more fixes for ITClientStress
Alexey Serbin has posted comments on this change. Change subject: KUDU-2130 (part 2): more fixes for ITClientStress .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/7964/1/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java: Line 216: try { Would it make sense to add the short-circuiting if (state == State.TERMINATED) { return; } check here as well? Or we have some guarantees from Netty that it would not be the case? Line 354: return; I'm curious -- did it happen in the wild that the message arrived in here but the connection has been terminated already? PS1, Line 427: assert !explicitlyDisconnected; I'm not sure it's good to have this assertion here -- this case covers not only SSLException case, as I can see. Could you add a comment with reasoning behind adding this assertion if you think it's worth adding this assertion? -- To view, visit http://gerrit.cloudera.org:8080/7964 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic3f518513931c660fd93b4272d1b1fceb268f191 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2130 (part 2): more fixes for ITClientStress
Hello Adar Dembo, Todd Lipcon, Alexey Serbin, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7964 to review the following change. Change subject: KUDU-2130 (part 2): more fixes for ITClientStress .. KUDU-2130 (part 2): more fixes for ITClientStress This fixes some more race conditions in connection termination in the same vein as part 1. It also filters benign SSLException from being returned back to callers. Change-Id: Ic3f518513931c660fd93b4272d1b1fceb268f191 --- M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java M java/kudu-client/src/test/java/org/apache/kudu/client/ITClientStress.java 2 files changed, 17 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/7964/1 -- To view, visit http://gerrit.cloudera.org:8080/7964 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ic3f518513931c660fd93b4272d1b1fceb268f191 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2130: java client: handle termination during negotiation edge case
Alexey Serbin has submitted this change and it was merged. Change subject: KUDU-2130: java client: handle termination during negotiation edge case .. KUDU-2130: java client: handle termination during negotiation edge case There was an edge case where a Connection can be terminated while negotiation is completing. This would result in a scary looking log message: 18:24:07.776 [DEBUG - New I/O worker #8112] (Connection.java:649) [peer master-127.32.133.1:64032] cleaning up while in state NEGOTIATING due to: connection disconnected 18:24:07.781 [ERROR - New I/O worker #8112] (Connection.java:418) [peer master-127.32.133.1:64032] unexpected exception from downstream on [id: 0xdd52bacc, /127.0.0.1:55318 :> /127.32.133.1:64032] java.lang.NullPointerException at org.apache.kudu.client.Connection.messageReceived(Connection.java:271) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.apache.kudu.client.Connection.handleUpstream(Connection.java:236) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) but in reality the error message is harmless; it just indicates that the connection has been terminated while the connection's messageReceived handler is clearing the message queue. This interruption is possible because of 82a8e9f99, which fixed a deadlock in Connection. This commit recognizes when this race has occured, and early exits from messageReceived. Change-Id: I3e9d4a6535ae82973743e4ac1071de0aac92b08b Reviewed-on: http://gerrit.cloudera.org:8080/7960 Reviewed-by: Alexey SerbinTested-by: Kudu Jenkins --- M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java 1 file changed, 9 insertions(+), 4 deletions(-) Approvals: Alexey Serbin: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I3e9d4a6535ae82973743e4ac1071de0aac92b08b Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Mike Percy has posted comments on this change. Change subject: tablet copy: Allow voting from crashed initial tablet copies .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7961/1//COMMIT_MSG Commit Message: PS1, Line 9: The change in c35e4f0093dcdec625d1f647924d854c7bc9f3de missed an : important case, which is when the target tablet server in a tablet copy : operation crashes while in the middle of a tablet copy. In that case, : the 'tombstone_last_logged_opid' of 1.0 was not persisted in the : superblock. This manifested as a very flaky : TabletCopyITest.TestTabletCopyNewReplicaFailureCanVote test, which would : fail about 15% of the time. > can you clarify what effect this has outside of tests? ie is the change you Right - the initial implementation was incomplete, so this is not a regression. It's an incremental improvement, so it is really a "nice to have" and not what we would typically consider a release blocker. -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2130: java client: handle termination during negotiation edge case
Alexey Serbin has posted comments on this change. Change subject: KUDU-2130: java client: handle termination during negotiation edge case .. Patch Set 3: Code-Review+2 Results prior to applying this patch: http://dist-test.cloudera.org//job?job_id=aserbin.1504646631.14226 Results after applying PS3 of this patch: http://dist-test.cloudera.org//job?job_id=aserbin.1504647755.7179 The results show the issue has been fixed. The failures in the after-patch runs are attributed to SSLException in the log, not NullPointerException. We can fix the SSLException issue separately. -- To view, visit http://gerrit.cloudera.org:8080/7960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3e9d4a6535ae82973743e4ac1071de0aac92b08b Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Todd Lipcon has posted comments on this change. Change subject: tablet copy: Allow voting from crashed initial tablet copies .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7961/1//COMMIT_MSG Commit Message: PS1, Line 9: The change in c35e4f0093dcdec625d1f647924d854c7bc9f3de missed an : important case, which is when the target tablet server in a tablet copy : operation crashes while in the middle of a tablet copy. In that case, : the 'tombstone_last_logged_opid' of 1.0 was not persisted in the : superblock. This manifested as a very flaky : TabletCopyITest.TestTabletCopyNewReplicaFailureCanVote test, which would : fail about 15% of the time. can you clarify what effect this has outside of tests? ie is the change you referenced just incomplete? (ie it was still an improvement but didn't improve all the cases it could have) or was it actually buggy such that it was a regression? put another way, should this be a blocker for the 1.5 release, or is this an incremental improvement on top of what's there? -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] tablet copy: Allow voting from crashed initial tablet copies
Hello Todd Lipcon, Alexey Serbin, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7961 to review the following change. Change subject: tablet copy: Allow voting from crashed initial tablet copies .. tablet copy: Allow voting from crashed initial tablet copies The change in c35e4f0093dcdec625d1f647924d854c7bc9f3de missed an important case, which is when the target tablet server in a tablet copy operation crashes while in the middle of a tablet copy. In that case, the 'tombstone_last_logged_opid' of 1.0 was not persisted in the superblock. This manifested as a very flaky TabletCopyITest.TestTabletCopyNewReplicaFailureCanVote test, which would fail about 15% of the time. With the change in this patch, that test now passes reliably: http://dist-test.cloudera.org/job?job_id=mpercy.1504646763.18140 Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 --- M src/kudu/consensus/consensus-test-util.h M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/master/sys_catalog.cc M src/kudu/tablet/tablet-harness.h M src/kudu/tablet/tablet_bootstrap-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tserver/tablet_copy_client.cc M src/kudu/tserver/ts_tablet_manager.cc 10 files changed, 58 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/7961/1 -- To view, visit http://gerrit.cloudera.org:8080/7961 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Idbc9cecac701c24ebe05759adbbb1fbe2b140506 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike PercyGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2130: java client: handle termination during negotiation edge case
Hello Alexey Serbin, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7960 to look at the new patch set (#3). Change subject: KUDU-2130: java client: handle termination during negotiation edge case .. KUDU-2130: java client: handle termination during negotiation edge case There was an edge case where a Connection can be terminated while negotiation is completing. This would result in a scary looking log message: 18:24:07.776 [DEBUG - New I/O worker #8112] (Connection.java:649) [peer master-127.32.133.1:64032] cleaning up while in state NEGOTIATING due to: connection disconnected 18:24:07.781 [ERROR - New I/O worker #8112] (Connection.java:418) [peer master-127.32.133.1:64032] unexpected exception from downstream on [id: 0xdd52bacc, /127.0.0.1:55318 :> /127.32.133.1:64032] java.lang.NullPointerException at org.apache.kudu.client.Connection.messageReceived(Connection.java:271) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.apache.kudu.client.Connection.handleUpstream(Connection.java:236) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) but in reality the error message is harmless; it just indicates that the connection has been terminated while the connection's messageReceived handler is clearing the message queue. This interruption is possible because of 82a8e9f99, which fixed a deadlock in Connection. This commit recognizes when this race has occured, and early exits from messageReceived. Change-Id: I3e9d4a6535ae82973743e4ac1071de0aac92b08b --- M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java 1 file changed, 9 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/7960/3 -- To view, visit http://gerrit.cloudera.org:8080/7960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3e9d4a6535ae82973743e4ac1071de0aac92b08b Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2130: java client: handle termination during negotiation edge case
Alexey Serbin has posted comments on this change. Change subject: KUDU-2130: java client: handle termination during negotiation edge case .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3e9d4a6535ae82973743e4ac1071de0aac92b08b Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-2130: java client: handle termination during negotiation edge case
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7960 to look at the new patch set (#2). Change subject: KUDU-2130: java client: handle termination during negotiation edge case .. KUDU-2130: java client: handle termination during negotiation edge case There was an edge case where a Connection can be terminated while negotiation is completing. This would result in a scary looking log message: 18:24:07.776 [DEBUG - New I/O worker #8112] (Connection.java:649) [peer master-127.32.133.1:64032] cleaning up while in state NEGOTIATING due to: connection disconnected 18:24:07.781 [ERROR - New I/O worker #8112] (Connection.java:418) [peer master-127.32.133.1:64032] unexpected exception from downstream on [id: 0xdd52bacc, /127.0.0.1:55318 :> /127.32.133.1:64032] java.lang.NullPointerException at org.apache.kudu.client.Connection.messageReceived(Connection.java:271) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.apache.kudu.client.Connection.handleUpstream(Connection.java:236) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) but in reality the error message is harmless; it just indicates that the connection has been terminated while the connection's messageReceived handler is clearing the message queue. This interruption is possible because of 82a8e9f99, which fixed a deadlock in Connection. This commit recognizes when this race has occured, and early exits from messageReceived. Change-Id: I3e9d4a6535ae82973743e4ac1071de0aac92b08b --- M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java 1 file changed, 9 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/7960/2 -- To view, visit http://gerrit.cloudera.org:8080/7960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3e9d4a6535ae82973743e4ac1071de0aac92b08b Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2130: java client: handle termination during negotiation edge case
Dan Burkert has posted comments on this change. Change subject: KUDU-2130: java client: handle termination during negotiation edge case .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/7960/1/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java: Line 264 > If you believe this is not the case, then the next line of Preconditions.ch I've moved these to after the while loop. Line 296 > I would update it to be: I removed this because it was statically true; it comes after a while loop with the negation of this condition. PS1, Line 270: tate != State.TERMINATED > And adding an extra check Done -- To view, visit http://gerrit.cloudera.org:8080/7960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3e9d4a6535ae82973743e4ac1071de0aac92b08b Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] catalog manager: improve IsAlterInProgress performance
Dan Burkert has posted comments on this change. Change subject: catalog_manager: improve IsAlterInProgress performance .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id8c1f48c0febad038833edd555ee88f1db83249d Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] cow object: lazily copy dirty state
Dan Burkert has posted comments on this change. Change subject: cow_object: lazily copy dirty state .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic00c573717e7cb8df1942ba0760726fd0961e1d6 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1755 Part 2: Improve tablet on-disk size metric
Mike Percy has posted comments on this change. Change subject: KUDU-1755 Part 2: Improve tablet on-disk size metric .. Patch Set 13: (2 comments) http://gerrit.cloudera.org:8080/#/c/6968/10/src/kudu/tablet/diskrowset.cc File src/kudu/tablet/diskrowset.cc: Line 690: Status DiskRowSet::CountRows(rowid_t *count) const { > 1. I think you are right, but the pattern in the class is to use component_ On #1, I'm pretty certain that we need to acquire the lock to get a ref to base_data_, but what I was wondering was whether we need to hold component_lock_ while calling base_data_->OnDiskSize(). #2 is not done in rev 13, as far as I can tell. http://gerrit.cloudera.org:8080/#/c/6968/10/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: PS10, Line 645: : Status TabletRepli > Adar had misgivings about this form. The comment was in PS10 but I'll repro > TabletReplica::Shutdown() resets consensus_ with lock_ held. How old was that comment? That is no longer the case. -- To view, visit http://gerrit.cloudera.org:8080/6968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] cow object: lazily copy dirty state
Adar Dembo has posted comments on this change. Change subject: cow_object: lazily copy dirty state .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/7951/1/src/kudu/util/cow_object.h File src/kudu/util/cow_object.h: Line 73: CHECK(dirty_state_); > maybe not worth removing, but this check is now redundant. Done PS1, Line 97: DCHECK_NOTNULL > likewise Done PS1, Line 104: DCHECK_NOTNULL > likewise Done -- To view, visit http://gerrit.cloudera.org:8080/7951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic00c573717e7cb8df1942ba0760726fd0961e1d6 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1807: improve IsCreateInProgress performance
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7957 to look at the new patch set (#2). Change subject: KUDU-1807: improve IsCreateInProgress performance .. KUDU-1807: improve IsCreateInProgress performance For some inexplicable reason, IsCreateInProgress is invoked when servicing a GetTableSchema RPC. While it's attractive to change that, it's also untenable without affecting backwards compatibility. So instead, here's a patch that adds in-memory caching to the master so that IsCreateInProgress needn't acquire N tablet locks. Just like schema_version_counts_, the cached state is a map of all tablet states to the number of tablets currently in that state. For this particular problem a single count of "number of creating tablets" might suffice, but I liked the consistency with schema_version_counts_ and I found it easier to handle the various corner cases when accounting for all states. Because the map contents reflect persistent state, the locking semantics differ somewhat from schema_version_counts_: 1. Tablet state changes are usually wrapped in a tablet metadata lock, so care must be taken to only update the state if the lock commits. 2. Further, the count must be updated with the tablet metadata lock held. Why? So that count updates are serialized in the same order as state changes themselves. Consider an example where two threads are trying to update a tablet's state (currently X) to Y and Z respectively. The count of state X begins at 1 (C_X=1) and the counts of states Y and Z are 0 (C_Y=C_Z=0). There's no defined order so whichever state change happens second "wins". However, if the count updates were allowed outside the metadata locks, it would be possible for the order of operations to be X->Y, Y->Z, (C_Y-- C_Z++), (C_X-- C_Y++), causing C_Y to momentarily go negative. If count updates are restricted to inside the metadata locks, a state change order of X->Y, Y->Z forces the count update order to be (C_X-- C_Y++), (C_Y--, C_Z++). To accommodate these semantics, the ScopedTabletInfoCommitter was modified to support tracking state changes. There are two notable (and confusing) exceptions which are documented in the code. Change-Id: Idca379bb1249e54c0b7fbb0df408ed829ab9fbd2 --- M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h 2 files changed, 225 insertions(+), 68 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/57/7957/2 -- To view, visit http://gerrit.cloudera.org:8080/7957 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idca379bb1249e54c0b7fbb0df408ed829ab9fbd2 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] catalog manager: improve IsAlterInProgress performance
Adar Dembo has posted comments on this change. Change subject: catalog_manager: improve IsAlterInProgress performance .. Patch Set 2: (3 comments) > What's your thoughts on test coverage, are you assuming we have > enough coverage of this already? My take was that since this patch doesn't change any externally-visible semantics, existing test coverage should suffice. That's also why I looped alter_table-randomized-test. That said, I do think our stress testing of AlterTable is somewhat lacking. There's some coverage in master-stress-test, but that's about it. If you can suggest an interesting concurrent test or two, I would consider adding it. > Only thing I might like to see is > more debug-enabled 'sanity checks' that the versions map is > consistent in certain scenarios. E.g. maybe add a check that if a > tablet is dropped and the # of tablets is 0, the versions map is > empty. I added the one you suggested. If you have others, let me know. http://gerrit.cloudera.org:8080/#/c/7950/2/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: Line 4381: if (it->first == TabletInfo::NOT_YET_REPORTED) { > Can't this check be skipped? Done Line 4389: return it->first < version; > I recommend casting version to an int64_t explicitly. It's ambiguous which Done http://gerrit.cloudera.org:8080/#/c/7950/2/src/kudu/master/catalog_manager.h File src/kudu/master/catalog_manager.h: Line 161: // Simple accessor for reported_schema_version_. > Should this doc line be moved to be aove reported_schema_version()? I was doing the stylistic thing where if an enum/typdef only applies to one method, I like to lump it in with the comment block for that method. But since it's proven to be confusing here, I'll separate them. -- To view, visit http://gerrit.cloudera.org:8080/7950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id8c1f48c0febad038833edd555ee88f1db83249d Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] cow object: lazily copy dirty state
Hello Dan Burkert, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7951 to look at the new patch set (#2). Change subject: cow_object: lazily copy dirty state .. cow_object: lazily copy dirty state This addresses a long-standing TODO that ought to improve performance of tablet reports in the steady-state: within the COWed objects, defer the creation of the "dirty" copy until it's actually needed (i.e. the first call to mutable_dirty()). The deferral introduces new branches in dirty() and mutable_dirty(), but given the paucity of those calls for any given lock instance, I think this is a reasonable trade-off. Change-Id: Ic00c573717e7cb8df1942ba0760726fd0961e1d6 --- M src/kudu/master/catalog_manager.cc M src/kudu/util/cow_object.h 2 files changed, 21 insertions(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/7951/2 -- To view, visit http://gerrit.cloudera.org:8080/7951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic00c573717e7cb8df1942ba0760726fd0961e1d6 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] catalog manager: improve IsAlterInProgress performance
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7950 to look at the new patch set (#3). Change subject: catalog_manager: improve IsAlterInProgress performance .. catalog_manager: improve IsAlterInProgress performance To mentally prepare for KUDU-1807, I decided to also tackle the O(n) behavior of IsAlterInProgress. It's not nearly as bad as IsCreateInProgress (since we aren't taking cow locks on each tablet) but is still not great since IsAlterInProgress is called repeatedly by clients. The solution involves a new map in TableInfo to track the schema versions of all tablets. Keeping the map up-to-date requires additional work when adding/removing tablets and when tablet reports include a schema change, but these are infrequent operations relative to IsAlterInProgress so I figured the trade off is worth it. I'm not particularly happy about the lock acquisition in set_reported_schema_version() but couldn't see a good way to avoid it. I looped 1000 runs of alter_table-randomized-test. The ~5 failures I saw were due to KUDU-1539, which I don't think this patch makes us any more or less vulnerable to. Change-Id: Id8c1f48c0febad038833edd555ee88f1db83249d --- M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h 2 files changed, 105 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/50/7950/3 -- To view, visit http://gerrit.cloudera.org:8080/7950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id8c1f48c0febad038833edd555ee88f1db83249d Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2130: java client: handle termination during negotiation edge case
Alexey Serbin has posted comments on this change. Change subject: KUDU-2130: java client: handle termination during negotiation edge case .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7960/1/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java: PS1, Line 270: queuedMessages.isEmpty() > i.e., consider changing to And adding an extra check if (state == State.TERMINATED) { return; } just after the while() loop -- To view, visit http://gerrit.cloudera.org:8080/7960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3e9d4a6535ae82973743e4ac1071de0aac92b08b Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2130: java client: handle termination during negotiation edge case
Alexey Serbin has posted comments on this change. Change subject: KUDU-2130: java client: handle termination during negotiation edge case .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/7960/1/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java: Line 296 I would update it to be: assert queuedMessages == null || queuedMessages.isEmpty(); PS1, Line 270: queuedMessages.isEmpty() > I think NPE could still happen here. It would be necessary to check for st i.e., consider changing to while (state != State.TERMINATED && !queuedMessages.isEmpty()) { ... } -- To view, visit http://gerrit.cloudera.org:8080/7960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3e9d4a6535ae82973743e4ac1071de0aac92b08b Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2130: java client: handle termination during negotiation edge case
Alexey Serbin has posted comments on this change. Change subject: KUDU-2130: java client: handle termination during negotiation edge case .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/7960/1/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java: Line 264 If you believe this is not the case, then the next line of Preconditions.checkState(inflightMessages.isEmpty()); should be updated to be Preconditions.checkState(inflightMessages == null || inflightMessages.isEmpty()) PS1, Line 270: queuedMessages.isEmpty() I think NPE could still happen here. It would be necessary to check for state = State.TERMINATED first. -- To view, visit http://gerrit.cloudera.org:8080/7960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3e9d4a6535ae82973743e4ac1071de0aac92b08b Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2130: java client: handle termination during negotiation edge case
Hello Adar Dembo, Todd Lipcon, Alexey Serbin, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7960 to review the following change. Change subject: KUDU-2130: java client: handle termination during negotiation edge case .. KUDU-2130: java client: handle termination during negotiation edge case There was an edge case where a Connection can be terminated while negotiation is completing. This would result in a scary looking log message: 18:24:07.781 [ERROR - New I/O worker #8112] (Connection.java:418) [peer master-127.32.133.1:64032] unexpected exception from downstream on [id: 0xdd52bacc, /127.0.0.1:55318 :> /127.32.133.1:64032] java.lang.NullPointerException at org.apache.kudu.client.Connection.messageReceived(Connection.java:271) at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70) at org.apache.kudu.client.Connection.handleUpstream(Connection.java:236) at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564) at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791) but in reality the error message is harmless; it just indicates that the connection has been terminated while the connection's messageReceived handler is clearing the message queue. This interruption is possible because of 82a8e9f99, which fixed a deadlock in Connection. This commit recognizes when this race has occured, and early exits from messageReceived. Change-Id: I3e9d4a6535ae82973743e4ac1071de0aac92b08b --- M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java 1 file changed, 6 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/7960/1 -- To view, visit http://gerrit.cloudera.org:8080/7960 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I3e9d4a6535ae82973743e4ac1071de0aac92b08b Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [tidy] updated the 'tidy' target
Alexey Serbin has posted comments on this change. Change subject: [tidy] updated the 'tidy' target .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7947/2/build-support/clang_tidy_gerrit.py File build-support/clang_tidy_gerrit.py: Line 169: print >>sys.stderr, "--rev-range works only with --no-gerrit" > Because I wanted that to work also for non-committed changes. Specifying ' oh, and it seems the description for the '--rev-range' is outdated -- I didn't reflect the fact it picks up local changes. -- To view, visit http://gerrit.cloudera.org:8080/7947 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f8db9076d5113ac44fb8200d00965f120126ac4 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [tidy] updated the 'tidy' target
Todd Lipcon has posted comments on this change. Change subject: [tidy] updated the 'tidy' target .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7947/2/build-support/clang_tidy_gerrit.py File build-support/clang_tidy_gerrit.py: Line 169: print >>sys.stderr, "--rev-range works only with --no-gerrit" I'm a little late to the party, but why not just look for ".." in the revision string, and if so, use 'git diff', otherwise use 'git show'? -- To view, visit http://gerrit.cloudera.org:8080/7947 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2f8db9076d5113ac44fb8200d00965f120126ac4 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] kudu-tool-test: find path to false at runtime
Dan Burkert has submitted this change and it was merged. Change subject: kudu-tool-test: find path to false at runtime .. kudu-tool-test: find path to false at runtime On macos the false binary is in /usr/bin/false. Change-Id: I2e2cd346b48c61cfe27fc77ee3cabb681be94ab2 Reviewed-on: http://gerrit.cloudera.org:8080/7959 Reviewed-by: Todd LipconTested-by: Kudu Jenkins --- M src/kudu/tools/kudu-tool-test.cc 1 file changed, 3 insertions(+), 1 deletion(-) Approvals: Todd Lipcon: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7959 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I2e2cd346b48c61cfe27fc77ee3cabb681be94ab2 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] kudu-tool-test: find path to false at runtime
Todd Lipcon has posted comments on this change. Change subject: kudu-tool-test: find path to false at runtime .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7959 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2e2cd346b48c61cfe27fc77ee3cabb681be94ab2 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] kudu-tool-test: find path to false at runtime
Hello Todd Lipcon, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7959 to review the following change. Change subject: kudu-tool-test: find path to false at runtime .. kudu-tool-test: find path to false at runtime On macos the false binary is in /usr/bin/false. Change-Id: I2e2cd346b48c61cfe27fc77ee3cabb681be94ab2 --- M src/kudu/tools/kudu-tool-test.cc 1 file changed, 3 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/7959/1 -- To view, visit http://gerrit.cloudera.org:8080/7959 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I2e2cd346b48c61cfe27fc77ee3cabb681be94ab2 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Todd Lipcon
[kudu-CR] rpc: allow setting --rpc tls min protocol on older RHEL versions
Todd Lipcon has posted comments on this change. Change subject: rpc: allow setting --rpc_tls_min_protocol on older RHEL versions .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/7821/2/src/kudu/security/tls_context.cc File src/kudu/security/tls_context.cc: PS2, Line 71: 0x1000U > Here is a fun thing I discovered: That's an interesting discovery, Henry. Let's try to avoid diverting the code between Impala and Kudu here, so if you go with runtime detection we should do the same. -- To view, visit http://gerrit.cloudera.org:8080/7821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic61f31788d63072fae609c6a2186e52d5e2467b7 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan BurkertGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Henry Robinson Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1755 Part 2: Improve tablet on-disk size metric
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6968 to look at the new patch set (#13). Change subject: KUDU-1755 Part 2: Improve tablet on-disk size metric .. KUDU-1755 Part 2: Improve tablet on-disk size metric This adds WAL segment and cmeta sizes to the tablet on-disk metric. It also fixes the tablet superblock on disk size, which was being underestimated because the size was measured as the PB size instead of counting the container file the PB is stored in. It also exposes a second metric, Tablet Data Size On Disk, which measures just the size of data blocks and does not include metadata or WAL segments. Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 --- M src/kudu/consensus/consensus_meta-test.cc M src/kudu/consensus/consensus_meta.cc M src/kudu/consensus/consensus_meta.h M src/kudu/consensus/log-test.cc M src/kudu/consensus/log.cc M src/kudu/consensus/log.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tablet/diskrowset-test.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/memrowset.h M src/kudu/tablet/mock-rowsets.h M src/kudu/tablet/rowset.cc M src/kudu/tablet/rowset.h M src/kudu/tablet/tablet-test.cc M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h M src/kudu/tablet/tablet_metadata-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h 24 files changed, 261 insertions(+), 57 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/68/6968/13 -- To view, visit http://gerrit.cloudera.org:8080/6968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 Gerrit-PatchSet: 13 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] cow object: lazily copy dirty state
Dan Burkert has posted comments on this change. Change subject: cow_object: lazily copy dirty state .. Patch Set 1: Code-Review+2 (3 comments) http://gerrit.cloudera.org:8080/#/c/7951/1/src/kudu/util/cow_object.h File src/kudu/util/cow_object.h: Line 73: CHECK(dirty_state_); maybe not worth removing, but this check is now redundant. PS1, Line 97: DCHECK_NOTNULL likewise PS1, Line 104: DCHECK_NOTNULL likewise -- To view, visit http://gerrit.cloudera.org:8080/7951 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic00c573717e7cb8df1942ba0760726fd0961e1d6 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] catalog manager: improve IsAlterInProgress performance
Dan Burkert has posted comments on this change. Change subject: catalog_manager: improve IsAlterInProgress performance .. Patch Set 2: (3 comments) What's your thoughts on test coverage, are you assuming we have enough coverage of this already? Only thing I might like to see is more debug-enabled 'sanity checks' that the versions map is consistent in certain scenarios. E.g. maybe add a check that if a tablet is dropped and the # of tablets is 0, the versions map is empty. http://gerrit.cloudera.org:8080/#/c/7950/2/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: Line 4381: if (it->first == TabletInfo::NOT_YET_REPORTED) { Can't this check be skipped? Line 4389: return it->first < version; I recommend casting version to an int64_t explicitly. It's ambiguous which one is getting cast right now, and I think it's brittle to have it cast to uint32_t, even though 'it->first' should be in range given the check above. http://gerrit.cloudera.org:8080/#/c/7950/2/src/kudu/master/catalog_manager.h File src/kudu/master/catalog_manager.h: Line 161: // Simple accessor for reported_schema_version_. Should this doc line be moved to be aove reported_schema_version()? -- To view, visit http://gerrit.cloudera.org:8080/7950 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id8c1f48c0febad038833edd555ee88f1db83249d Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] feat: add task interruption checking for RowIterator
Dan Burkert has posted comments on this change. Change subject: feat: add task interruption checking for RowIterator .. Patch Set 5: Yep, looks like it was just a flaky test. Thanks caiconghui! -- To view, visit http://gerrit.cloudera.org:8080/7753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0b4284f2c0a40cd7ba8cf2b76e0403592552c814 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: caiconghuiGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: caiconghui Gerrit-HasComments: No
[kudu-CR] feat: add task interruption checking for RowIterator
Dan Burkert has submitted this change and it was merged. Change subject: feat: add task interruption checking for RowIterator .. feat: add task interruption checking for RowIterator Once we the submit a spark job to spark cluster, we cannot cancel or stop the kudu-spark job immediately before checking task interruption, we can kill the job immediately by clicking the "kill" link on the spark Web UI. Change-Id: I0b4284f2c0a40cd7ba8cf2b76e0403592552c814 Reviewed-on: http://gerrit.cloudera.org:8080/7753 Tested-by: Kudu Jenkins Reviewed-by: Dan Burkert--- M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala 1 file changed, 3 insertions(+), 0 deletions(-) Approvals: Dan Burkert: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/7753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I0b4284f2c0a40cd7ba8cf2b76e0403592552c814 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: caiconghui Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: caiconghui
[kudu-CR] feat: add task interruption checking for RowIterator
Dan Burkert has posted comments on this change. Change subject: feat: add task interruption checking for RowIterator .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7753 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I0b4284f2c0a40cd7ba8cf2b76e0403592552c814 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: caiconghuiGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: caiconghui Gerrit-HasComments: No
[kudu-CR] KUDU-1755 Part 2: Improve tablet on-disk size metric
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6968 to look at the new patch set (#12). Change subject: KUDU-1755 Part 2: Improve tablet on-disk size metric .. KUDU-1755 Part 2: Improve tablet on-disk size metric This adds WAL segment and cmeta sizes to the tablet on-disk metric. It also fixes the tablet superblock on disk size, which was being underestimated because the size was measured as the PB size instead of counting the container file the PB is stored in. It also exposes a second metric, Tablet Data Size On Disk, which measures just the size of data blocks and does not include metadata or WAL segments. Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 --- M src/kudu/consensus/consensus_meta-test.cc M src/kudu/consensus/consensus_meta.cc M src/kudu/consensus/consensus_meta.h M src/kudu/consensus/log-test.cc M src/kudu/consensus/log.cc M src/kudu/consensus/log.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tablet/diskrowset-test.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/memrowset.h M src/kudu/tablet/mock-rowsets.h M src/kudu/tablet/rowset.cc M src/kudu/tablet/rowset.h M src/kudu/tablet/tablet-test.cc M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h M src/kudu/tablet/tablet_metadata-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h 24 files changed, 259 insertions(+), 57 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/68/6968/12 -- To view, visit http://gerrit.cloudera.org:8080/6968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1755 Part 2: Improve tablet on-disk size metric
Will Berkeley has posted comments on this change. Change subject: KUDU-1755 Part 2: Improve tablet on-disk size metric .. Patch Set 11: Odd...I compiled and ran some tests just before pushing this to review. And TSAN worked. -- To view, visit http://gerrit.cloudera.org:8080/6968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1755 Part 2: Improve tablet on-disk size metric
Hello Adar Dembo, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6968 to look at the new patch set (#11). Change subject: KUDU-1755 Part 2: Improve tablet on-disk size metric .. KUDU-1755 Part 2: Improve tablet on-disk size metric This adds WAL segment and cmeta sizes to the tablet on-disk metric. It also fixes the tablet superblock on disk size, which was being underestimated because the size was measured as the PB size instead of counting the container file the PB is stored in. It also exposes a second metric, Tablet Data Size On Disk, which measures just the size of data blocks and does not include metadata or WAL segments. Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 --- M src/kudu/consensus/consensus_meta-test.cc M src/kudu/consensus/consensus_meta.cc M src/kudu/consensus/consensus_meta.h M src/kudu/consensus/log-test.cc M src/kudu/consensus/log.cc M src/kudu/consensus/log.h M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/integration-tests/tablet_copy-itest.cc M src/kudu/tablet/diskrowset-test.cc M src/kudu/tablet/diskrowset.cc M src/kudu/tablet/diskrowset.h M src/kudu/tablet/memrowset.h M src/kudu/tablet/mock-rowsets.h M src/kudu/tablet/rowset.cc M src/kudu/tablet/rowset.h M src/kudu/tablet/tablet-test.cc M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h M src/kudu/tablet/tablet_metadata-test.cc M src/kudu/tablet/tablet_metadata.cc M src/kudu/tablet/tablet_metadata.h M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h 24 files changed, 258 insertions(+), 56 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/68/6968/11 -- To view, visit http://gerrit.cloudera.org:8080/6968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1755 Part 2: Improve tablet on-disk size metric
Will Berkeley has posted comments on this change. Change subject: KUDU-1755 Part 2: Improve tablet on-disk size metric .. Patch Set 10: (11 comments) http://gerrit.cloudera.org:8080/#/c/6968/10/src/kudu/consensus/consensus_meta.h File src/kudu/consensus/consensus_meta.h: Line 153: return on_disk_size_; > Can we say this is thread-safe and use an atomic? Done PS10, Line 241: uint64_t > how about atomic and int64_t because there is just no way we are going to n Done http://gerrit.cloudera.org:8080/#/c/6968/10/src/kudu/consensus/log.cc File src/kudu/consensus/log.cc: PS10, Line 922: CHECK_OK > Do we need this CHECK? Also, can we make this safe to run even if log_ is s Done http://gerrit.cloudera.org:8080/#/c/6968/10/src/kudu/consensus/log.h File src/kudu/consensus/log.h: PS10, Line 201: TotalSize > maybe name this OnDiskSize() for consistency? Done http://gerrit.cloudera.org:8080/#/c/6968/10/src/kudu/consensus/raft_consensus.cc File src/kudu/consensus/raft_consensus.cc: Line 2645: UniqueLock lock(lock_); > If we make cmeta_->on_disk_size() thread-safe, then we can avoid taking the Done http://gerrit.cloudera.org:8080/#/c/6968/10/src/kudu/tablet/diskrowset.cc File src/kudu/tablet/diskrowset.cc: Line 690: shared_lock l(component_lock_); > 1. Can we just take a ref to base_data_ instead of holding the component_lo 1. I think you are right, but the pattern in the class is to use component_lock_ whenever doing anything with base_data_. Is there a reason to prefer taking a ref over acquiring the lock briefly? 2. Done. http://gerrit.cloudera.org:8080/#/c/6968/10/src/kudu/tablet/rowset.h File src/kudu/tablet/rowset.h: Line 115: virtual Status DebugDump(std::vector *lines = NULL) = 0; > warning: default arguments on virtual or override methods are prohibited [g Pre-existing; ignored. http://gerrit.cloudera.org:8080/#/c/6968/10/src/kudu/tablet/tablet_replica.cc File src/kudu/tablet/tablet_replica.cc: PS10, Line 377: OnDiskSizeNoCMetaUnlocked > why the "no cmeta" version? could use a comment I think that was a mistake, a leftover from a previous version. There's no need for that method at all now. PS10, Line 645: auto consensus = consensus_; : if (consensus) { > because consensus_ is set-once, this can be written as: Adar had misgivings about this form. The comment was in PS10 but I'll reproduce it here for convenience: "This isn't safe; you need to take a local ref of consensus_ and then test it for nullitude. Or you need to make the call with lock_ held, which guarantees that consensus_ isn't modified. Okay, I went and read the comments in TabletReplica which talk about how consensus_ is a "set-once" object. That seems bogus to me, though admittedly I'm not an expert on this code. TabletReplica::Shutdown() resets consensus_ with lock_ held. Can we guarantee that when Shutdown() is called, no other thread (besides the one calling Shutdown()) will invoke a TabletReplica function? I don't see how we can (hence the fragility). And if we can't, then consensus_ access needs to be atomic, either by taking a local ref then operating on it, or by acquiring lock_ first." PS10, Line 656: size_t > Can we use int64_t instead of size_t? We are never going to have 64-bit len Done PS10, Line 660: if (tablet_) { : ret += tablet_->OnDiskSize(); : } : // WAL segments. : if (state_ == RUNNING) { : ret += log_->TotalSize(); : } > Instead of doing this work while holding TabletReplica::lock_, can this be Done -- To view, visit http://gerrit.cloudera.org:8080/6968 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ia83f8aae0a544abbd19f5a34520b4828b89b6315 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] KUDU-1807: improve IsCreateInProgress performance
Hello Dan Burkert, Todd Lipcon, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7957 to review the following change. Change subject: KUDU-1807: improve IsCreateInProgress performance .. KUDU-1807: improve IsCreateInProgress performance For some inexplicable reason, IsCreateInProgress is invoked when servicing a GetTableSchema RPC. While it's attractive to change that, it's also untenable without affecting backwards compatibility. So instead, here's a patch that adds in-memory caching to the master so that IsCreateInProgress needn't acquire N tablet locks. Just like schema_version_counts_, the cached state is a map of all tablet states to the number of tablets currently in that state. For this particular problem a single count of "number of creating tablets" might suffice, but I liked the consistency with schema_version_counts_ and I found it easier to handle the various corner cases when accounting for all states. Because the map contents reflect persistent state, the locking semantics differ somewhat from schema_version_counts_: 1. Tablet state changes are usually wrapped in a tablet metadata lock, so care must be taken to only update the state if the lock commits. 2. Further, the count must be updated with the tablet metadata lock held. Why? So that count updates are serialized in the same order as state changes themselves. Consider an example where two threads are trying to update a tablet's state (currently X) to Y and Z respectively. The count of state X begins at 1 (C_X=1) and the counts of states Y and Z are 0 (C_Y=C_Z=0). There's no defined order so whichever state change happens second "wins". However, if the count updates were allowed outside the metadata locks, it would be possible for the order of operations to be X->Y, Y->Z, (C_Y-- C_Z++), (C_X-- C_Y++), causing C_Y to momentarily go negative. If count updates are restricted to inside the metadata locks, a state change order of X->Y, Y->Z forces the count update order to be (C_X-- C_Y++), (C_Y--, C_Z++). To accommodate these semantics, the ScopedTabletInfoCommitter was modified to support tracking state changes. There are two notable (and confusing) exceptions which are documented in the code. Change-Id: Idca379bb1249e54c0b7fbb0df408ed829ab9fbd2 --- M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h 2 files changed, 224 insertions(+), 68 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/57/7957/1 -- To view, visit http://gerrit.cloudera.org:8080/7957 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Idca379bb1249e54c0b7fbb0df408ed829ab9fbd2 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Todd Lipcon