[kudu-CR] KUDU-2068: pass --gcc-toolchain into clang codegen build
Dan Burkert has posted comments on this change. Change subject: KUDU-2068: pass --gcc-toolchain into clang codegen build .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/7463 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Id3ccb129432e3915401ca7f1f815ddb355faf597 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-2068: pass --gcc-toolchain into clang codegen build
Adar Dembo has uploaded a new patch set (#2). Change subject: KUDU-2068: pass --gcc-toolchain into clang codegen build .. KUDU-2068: pass --gcc-toolchain into clang codegen build As a brief refresher: Kudu can be built using various versions of gcc and clang, but the codegen build always uses clang from thirdparty. Without this patch, we delegate finding the system header/library prefix to clang. The problem is that the result isn't guaranteed to match the prefix used by the compiler building the rest of Kudu, which may lead to obscure runtime errors due to std:: class layout mismatches. Kudu consumers using custom toolchains have been forced to work around this issue [1]. There were no build issues in the following environments: - gcc 5.4 system compiler on Ubuntu 16 (--prefix=/usr). - gcc 4.9 devtoolset-3 compiler on CentOS 6.6 (--prefix=/opt/rh/devtoolset-3/root/usr). - clang 4.0 thirdparty compiler on Ubuntu 16 (no --prefix). Additionally, I built Kudu in the following environments: - gcc 4.8 system compiler on CentOS 7.3 (--prefix=/usr) with devtoolset-3 also installed. - gcc 6.2 devtoolset-6 compiler on CentOS 7.3 (--prefix=/opt/rh/devtoolset-6/root/usr). In these cases codegen-test crashed without the patch, and stopped crashing after the patch was applied. 1. https://github.com/cloudera/native-toolchain/blob/f0105fd35527f416799acb4aa92a887cbf511ce5/source/kudu/build.sh Change-Id: Id3ccb129432e3915401ca7f1f815ddb355faf597 --- M CMakeLists.txt M cmake_modules/CompilerInfo.cmake M src/kudu/codegen/CMakeLists.txt 3 files changed, 21 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/7463/2 -- To view, visit http://gerrit.cloudera.org:8080/7463 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id3ccb129432e3915401ca7f1f815ddb355faf597 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] disk failure: coordinate error handling
Mike Percy has posted comments on this change. Change subject: disk failure: coordinate error handling .. Patch Set 20: (2 comments) http://gerrit.cloudera.org:8080/#/c/7029/20/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: Line 1072: void TSTabletManager::FailTabletsInDataDir(const string& uuid) { Why don't we just have this be a thunk that says "something failed" and then have the TSTabletManager do something like this: for (const auto& tablet_id : dd_mgr->GetFailedTablets()) { Tablet* t = Lookup(tablet_id); t->Shutdown(); } Maybe I'm missing something for this part, but we need a lock somewhere to track which data dirs are "broken" so that we can ensure that asynchronous things like tablet copy will also abort if it's in the process of copying a tablet that has blocks now on a failed disk. Line 1081: LOG(INFO) << "Shutting down tablet (not implemented in this patch): " << tablet_id; If this doesn't do anything, why not just have an empty body of the function? -- To view, visit http://gerrit.cloudera.org:8080/7029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib3169deada702c527b70fabf9d2223364f9a9ea6 Gerrit-PatchSet: 20 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR](gh-pages) Add a note in the YCSB post
Todd Lipcon has posted comments on this change. Change subject: Add a note in the YCSB post .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7396/1/_posts/2016-04-26-ycsb.md File _posts/2016-04-26-ycsb.md: PS1, Line 369: changing their value isn't warranted "changing their value" is not clear. Maybe something like: As of Kudu 0.10.0, the default configuration was changed based on the results of the above exploration. We recommend against modifying these configuration variables in Kudu 1.0 or later. -- To view, visit http://gerrit.cloudera.org:8080/7396 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ife414e34388cff6b139580d76d44a2c3477b9abe Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-Owner: Jean-Daniel CryansGerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2065: Support cancellation for outbound RPC call
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7455 to look at the new patch set (#2). Change subject: KUDU-2065: Support cancellation for outbound RPC call .. KUDU-2065: Support cancellation for outbound RPC call This change implements a new interface Proxy::Cancel() which takes a RpcController* as argument and cancels any pending OutboundCall associated with it. Proxy::Cancel() queues a cancellation task scheduled on the reactor thread for that outbound call. Once the task is run, it will cancel the outbound call right away if the RPC hasn't started sending yet or if it has already sent the request and waiting for a response. If cancellation happens when the RPC request is being sent, the RPC will be cancelled only after the RPC has finished sending the request. If the RPC is finished, the cancellation will be a no-op. Change-Id: Iaf53c5b113de10d573bd32fb9b2293572e806fbf --- M java/kudu-client/src/main/java/org/apache/kudu/client/Status.java M src/kudu/common/wire_protocol.cc M src/kudu/common/wire_protocol.proto M src/kudu/rpc/connection.cc M src/kudu/rpc/connection.h M src/kudu/rpc/messenger.cc M src/kudu/rpc/messenger.h M src/kudu/rpc/outbound_call.cc M src/kudu/rpc/outbound_call.h M src/kudu/rpc/proxy.cc M src/kudu/rpc/proxy.h M src/kudu/rpc/reactor.cc M src/kudu/rpc/reactor.h M src/kudu/rpc/rpc-test-base.h M src/kudu/rpc/rpc-test.cc M src/kudu/rpc/rpc_introspection.proto M src/kudu/util/status.cc M src/kudu/util/status.h 18 files changed, 341 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/7455/2 -- To view, visit http://gerrit.cloudera.org:8080/7455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iaf53c5b113de10d573bd32fb9b2293572e806fbf Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tidy Bot
[kudu-CR] thirdparty: use ninja when possible
Dan Burkert has posted comments on this change. Change subject: thirdparty: use ninja when possible .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7461/1/thirdparty/build-definitions.sh File thirdparty/build-definitions.sh: Line 56: if which ninja > /dev/null ; then RHEL installs ninja as ninja-build, perhaps you should add a case for that. -- To view, visit http://gerrit.cloudera.org:8080/7461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idacd6d8bc9ad371c3c925f7c628ad69785bb870e Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] KUDU-2066. Add experimental Gradle build support
Todd Lipcon has posted comments on this change. Change subject: KUDU-2066. Add experimental Gradle build support .. Patch Set 7: (7 comments) http://gerrit.cloudera.org:8080/#/c/7258/7/java/gradle/shadow.gradle File java/gradle/shadow.gradle: Line 21: tasks.remove(knows) // Remove "easter egg" knows task. what's this? tried googling but couldn't find it PS7, Line 45: project.conf2ScopeMappings.addMapping( : MavenPlugin.COMPILE_PRIORITY + 1, : configurations.compileUnshaded, : Conf2ScopeMappingContainer.COMPILE : ) not understanding this part at all http://gerrit.cloudera.org:8080/#/c/7258/7/java/gradle/wrapper/gradle-wrapper.jar File java/gradle/wrapper/gradle-wrapper.jar: the ASF is against checking in binary jars http://gerrit.cloudera.org:8080/#/c/7258/7/java/gradlew.bat File java/gradlew.bat: Line 1: @if "%DEBUG%" == "" @echo off can we add these files to the RAT excludes file? http://gerrit.cloudera.org:8080/#/c/7258/7/java/kudu-client/src/test/java/org/apache/kudu/client/TestTimeouts.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestTimeouts.java: Line 35: @Ignore why is this change in here? if you want to disable a test please do it in a separate unrelated commit. http://gerrit.cloudera.org:8080/#/c/7258/7/java/kudu-spark/build.gradle File java/kudu-spark/build.gradle: Line 40: } else { is it possible to be explicit about == "2" and then give a build error if it is anything else? Line 50: } else { same -- To view, visit http://gerrit.cloudera.org:8080/7258 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib257cdd019a1f383c886b9238bb47d96576c4421 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Grant HenkeGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] thirdparty: use ninja when possible
Adar Dembo has posted comments on this change. Change subject: thirdparty: use ninja when possible .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/7461/1//COMMIT_MSG Commit Message: PS1, Line 10: less ugly output. What's the difference here? And is it still less ugly when outputting to something that's not a real terminal? http://gerrit.cloudera.org:8080/#/c/7461/1/thirdparty/build-definitions.sh File thirdparty/build-definitions.sh: Line 54: # If ninja is on the path, use that instead of make for a slightly Would prefer if this happened in build-thirdparty.sh; this file only defines functions, it doesn't actually run anything when sourced. Line 97: build_cmake() { What about here? If it can't work, add a comment explaining why? -- To view, visit http://gerrit.cloudera.org:8080/7461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Idacd6d8bc9ad371c3c925f7c628ad69785bb870e Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] disk failure: coordinate error handling
Mike Percy has posted comments on this change. Change subject: disk failure: coordinate error handling .. Patch Set 20: (5 comments) http://gerrit.cloudera.org:8080/#/c/7029/20/src/kudu/fs/data_dirs.h File src/kudu/fs/data_dirs.h: Line 297: uint16_t* GetUuidIdxForUuid(const std::string& uuid) { I looked through here and I believe this is safe. That said, this API is a bit scary at first glance. What lifetime guarantees do we make about the pointee here? (To me it looks like the ptr is good as long as the DataDirManager has been opened and was not yet destroyed.) We should at least document them. Still, I think it would be a little more intuitive if this returned a bool or a Status and took a uint16_t* out-parameter. http://gerrit.cloudera.org:8080/#/c/7029/20/src/kudu/fs/error_manager.h File src/kudu/fs/error_manager.h: Line 34: typedef CallbackErrorNotificationCb; doc callback argument Line 57: void RunErrorNotificationCb(const std::string& uuid) const { nit: arguments need docs Line 58: notify_cb_.Run(uuid); It makes me a little nervous not to submit this to a thread pool. We'd have to be careful to release any locks held which may not be easy to do at a low level if we are calling back into a higher level class. Although I guess if we don't need it then "YAGNI" Line 61: void RunErrorNotificationCb(const DataDir* dir) const { Can we remove this one? -- To view, visit http://gerrit.cloudera.org:8080/7029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib3169deada702c527b70fabf9d2223364f9a9ea6 Gerrit-PatchSet: 20 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR](branch-1.3.x) [security] fixed shortened TSK validity interval
Todd Lipcon has posted comments on this change. Change subject: [security] fixed shortened TSK validity interval .. Patch Set 1: Hey Alexey. That link isn't publicly accessible, maybe worth explaining what the problem is? IIRC this patch basically fixes it so that the default is the proper 7 days instead of an accidental 6 days. I don't see a big benefit to making this backport, since the real fix is to do the automatic renewal which is added in later versions, but maybe I'm missing something. -- To view, visit http://gerrit.cloudera.org:8080/7411 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84f9789276c4b48a3ba5274393fe30c8bf3ea85d Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: branch-1.3.x Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Jean-Daniel Cryans Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] thirdparty: use ninja when possible
Hello Dan Burkert, Adar Dembo, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7461 to review the following change. Change subject: thirdparty: use ninja when possible .. thirdparty: use ninja when possible Using ninja to build LLVM, etc, is a bit faster than using make, and produces less ugly output. Change-Id: Idacd6d8bc9ad371c3c925f7c628ad69785bb870e --- M thirdparty/build-definitions.sh 1 file changed, 23 insertions(+), 6 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/7461/1 -- To view, visit http://gerrit.cloudera.org:8080/7461 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Idacd6d8bc9ad371c3c925f7c628ad69785bb870e Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert
[kudu-CR] Upgrade to cmake 3.9.0
Todd Lipcon has posted comments on this change. Change subject: Upgrade to cmake 3.9.0 .. Patch Set 1: Yea the main reason is that it fixes false dependencies on compilation between modules due to this bug I reported 2 years ago: https://cmake.org/Bug/view.php?id=1 Here are some timings building kudu on an 88-core box using ninja: hot ccache (no real difference) --- cmake 3.6: 151.56user 39.47system 0:12.58elapsed 1518%CPU (0avgtext+0avgdata 271912maxresident)k 256inputs+4433488outputs (1major+6657138minor)pagefaults 0swaps cmake 3.9: 158.52user 38.55system 0:12.78elapsed 1541%CPU (0avgtext+0avgdata 271732maxresident)k 0inputs+4464520outputs (0major+6715240minor)pagefaults 0swaps cold ccache (big speedup with cmake 3.9) cmake 3.6: 2742.24user 192.17system 1:31.36elapsed 3211%CPU (0avgtext+0avgdata 466932maxresident)k 0inputs+14842640outputs (0major+58352987minor)pagefaults 0swaps cmake 3.9: 3461.34user 219.52system 0:57.90elapsed 6356%CPU (0avgtext+0avgdata 471408maxresident)k 0inputs+14838200outputs (0major+60222932minor)pagefaults 0swaps -- To view, visit http://gerrit.cloudera.org:8080/7460 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e5504bcb136e70a5e5c83ee95c98df3cad4d3bd Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Upgrade to cmake 3.9.0
Adar Dembo has posted comments on this change. Change subject: Upgrade to cmake 3.9.0 .. Patch Set 1: > Any particular reason to want the new one? Particularly, if a future patch introduces a dependency on a cmake feature new in 3.9, we should also update cmake_minimum_required() in CMakeLists.txt -- To view, visit http://gerrit.cloudera.org:8080/7460 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e5504bcb136e70a5e5c83ee95c98df3cad4d3bd Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] Upgrade to cmake 3.9.0
Dan Burkert has posted comments on this change. Change subject: Upgrade to cmake 3.9.0 .. Patch Set 1: Code-Review+2 Any particular reason to want the new one? -- To view, visit http://gerrit.cloudera.org:8080/7460 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I6e5504bcb136e70a5e5c83ee95c98df3cad4d3bd Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] Upgrade to cmake 3.9.0
Hello Dan Burkert, Adar Dembo, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/7460 to review the following change. Change subject: Upgrade to cmake 3.9.0 .. Upgrade to cmake 3.9.0 Change-Id: I6e5504bcb136e70a5e5c83ee95c98df3cad4d3bd --- M thirdparty/vars.sh 1 file changed, 1 insertion(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/7460/1 -- To view, visit http://gerrit.cloudera.org:8080/7460 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I6e5504bcb136e70a5e5c83ee95c98df3cad4d3bd Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert
[kudu-CR] KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover.
Mike Percy has posted comments on this change. Change subject: KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover. .. Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/7456/2//COMMIT_MSG Commit Message: Line 7: KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover. nit: Avoid period at end of subject line in a commit message http://gerrit.cloudera.org:8080/#/c/7456/2/java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java File java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java: PS2, Line 351: Integer Can we return an int instead of an Integer here and below? I can't see any reason to use a boxed primitive here. http://gerrit.cloudera.org:8080/#/c/7456/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestLeaderFailover.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestLeaderFailover.java: Line 42:* This test writes 3 rows. Then in a loop, it kills the leader, then tries to write inner_row rows. Verifying nit: line length should be <= 100 chars Line 49: public void testFailover() throws Exception { Can we keep the existing simple test and add an additional test that uses this "stress test" approach? Line 51: int startRows = 3; style: use final and all caps for constants Line 67: for (int j = i; j < i + innerRows; j++) { I find this logic a bit difficult to follow. I'd suggest keeping a counter like currentRows and just increment that each time you insert a row, then use simple i and j counters as needed to control the number of iterations per loop. something like: int currentRows = 0; for (int i = 0; i < startRows; i++) { session.apply(createBasicSchemaInsert(table, i)); currentRows++; } for (int i = 0; i < numIterations; i++) { // ... for (int j = 0; j < rowsPerIteration; j++) { session.apply(...); currentRows++; } } // ... assertEquals(totalRowsToInsert, countRowsInScan(scanner)); Line 77: assertEquals(i + innerRows, countRowsInScan(scanner)); these assertions should probably be assert eventually -- To view, visit http://gerrit.cloudera.org:8080/7456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie93a15084a4c4c0748dc74c005b8313f443a5ba9 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Edward FancherGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: Yes
[kudu-CR] KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover.
Edward Fancher has uploaded a new patch set (#2). Change subject: KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover. .. KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover. TestLeaderFailover is made more robust by adding a write loop in which the tablet leader is killed and restarted. The main idea is to check that the writes complete and can be read after the leader is killed. Done in a loop so we can be sure that we stay stable through multiple stops and restarts of the leader tablet. Change-Id: Ie93a15084a4c4c0748dc74c005b8313f443a5ba9 --- M java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestLeaderFailover.java 2 files changed, 29 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/56/7456/2 -- To view, visit http://gerrit.cloudera.org:8080/7456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie93a15084a4c4c0748dc74c005b8313f443a5ba9 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Edward FancherGerrit-Reviewer: Kudu Jenkins
[kudu-CR] KUDU-2033. Add a 'torture' scenario to verify Java client's behavior during fail-over
Edward Fancher has uploaded a new change for review. http://gerrit.cloudera.org:8080/7456 Change subject: KUDU-2033. Add a 'torture' scenario to verify Java client's behavior during fail-over .. KUDU-2033. Add a 'torture' scenario to verify Java client's behavior during fail-over TestLeaderFailover is made more robust by adding a write loop in which the tablet leader is killed and restarted. The main idea is to check that the writes complete and can be read after the leader is killed. Done in a loop so we can be sure that we stay stable through multiple stops and restarts of the leader tablet. Change-Id: Ie93a15084a4c4c0748dc74c005b8313f443a5ba9 --- M java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestLeaderFailover.java 2 files changed, 29 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/56/7456/1 -- To view, visit http://gerrit.cloudera.org:8080/7456 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ie93a15084a4c4c0748dc74c005b8313f443a5ba9 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Edward Fancher
[kudu-CR] kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import par
Jean-Daniel Cryans has posted comments on this change. Change subject: kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files. .. Patch Set 2: (29 comments) http://gerrit.cloudera.org:8080/#/c/7421/2//COMMIT_MSG Commit Message: Line 7: kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import parquet files. Please follow best practices for git commit messages, such as https://chris.beams.io/posts/git-commit/ http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java: PS2, Line 35: a CSV files. nit: is there one or many files? Line 39: public class ExportCsv extends Configured implements Tool { This needs a test. PS2, Line 50: The current configuration. I know the ImportCsv* classes is like this, but the javadoc format we use dictates starting @param line with a lower case and and not ending with a period. See http://www.oracle.com/technetwork/java/javase/documentation/index-137868.html#format No need to fix ImportCsv in this patch but you'd be welcome to contribute a separate patch for it :) PS2, Line 85: with columns nit: rephrase to something like "the given table and columns into..." PS2, Line 86: "T nit: mind putting this on a new line? Then the code looks like what the error would look like. PS2, Line 86: kudu nit: Kudu http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java: Line 32: nit: remove the extra blank line PS2, Line 48: (i.e., the parsing). That made more sense in ImportCsbMapper, I'd just remove this javadoc completely. Line 54: this.separator = conf.get(ExportCsv.SEPARATOR_CONF_KEY); single line: this.separator = conf.get(ExportCsv.SEPARATOR_CONF_KEY, ExportCsv.DEFAULT_SEPARATOR); PS2, Line 61: Insert nits: s/Convert/Converts s/Insert/RowResult also end the sentence with a period. Line 74:* converts RowResult $this.separator string Please revise this whole javadoc section. FWIW it might not be necessary, since it's a private method. Also it's not doing anything surprising. Line 114: default: Missing UNIXTIME_MICROS? http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java: PS2, Line 43: PARQUET nit: Apache Parquet Line 47: public class ImportParquet extends Configured implements Tool { This needs a test. Line 59:* @param conf The current configuration. Same comment as before regarding javadoc. Line 75: LOG.info(schema); Did you forget to remove this? Line 88: FileInputFormat.setInputPaths(job, inputDir); You could run some pre-flight checks like making sure that the columns match. PS2, Line 105: PARQUET Apache Parquet http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java: PS2, Line 39: PARQUET lines and turns them into Kudu Inserts. some comments as before regarding PARQUET and Inserts Line 101: case DOUBLE: UNIXTIME_MICROS would be recognized but not supported, someone might have TIMESTAMP and think it's compatible, maybe you can catch that before like in createSubmittableJob? http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java File java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java: Line 27: * Created by sany on 26/06/2017 AD. Remove. PS2, Line 29: nit Line 37: Set counts = context.getKeyValueMetadata().get("my.count"); What's this about? Line 38: // assertTrue("counts: " + counts, counts.size() > 0); Remove. http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala File java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala: Line 1: remove this blank line Line 18: package org.apache.kudu.spark.tools add a blank line Line 24: import org.slf4j.{Logger, LoggerFactory} please re-order this Line 29: object ImportExportKudu { I'm less familiar with scala but at least we'll need a test. -- To view, visit
[kudu-CR] disk failure: coordinate error handling
Adar Dembo has posted comments on this change. Change subject: disk failure: coordinate error handling .. Patch Set 16: (3 comments) http://gerrit.cloudera.org:8080/#/c/7029/16/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: Line 1067: uint16_t* uuid_idx = fs_manager_->dd_manager()->GetUuidIdxForUuid(uuid); > Maybe pull out fs_manager_->dd_manager() into a local variable so you don't Did you miss this? http://gerrit.cloudera.org:8080/#/c/7029/20/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: PS20, Line 1074: (const st Didn't think this case was possible. http://gerrit.cloudera.org:8080/#/c/7029/20/src/kudu/tserver/ts_tablet_manager.h File src/kudu/tserver/ts_tablet_manager.h: Line 188: I still think it'd be OK to make this private and wire/unwire in Init/Shutdown (as opposed to the ctor/dtor). Did Todd object to that specific kind of wiring? I thought the 'weirdness' he felt had to do with side effects in the ctor. -- To view, visit http://gerrit.cloudera.org:8080/7029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ib3169deada702c527b70fabf9d2223364f9a9ea6 Gerrit-PatchSet: 16 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-2065: Support cancellation for outbound RPC call
Michael Ho has posted comments on this change. Change subject: KUDU-2065: Support cancellation for outbound RPC call .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/7455/1/src/kudu/rpc/outbound_call.cc File src/kudu/rpc/outbound_call.cc: PS1, Line 377: delete [] header_buf_.release(); : sidecars_.clear(); Consider doing the same thing for other Set*() functions or moving it to CallCallback() ? May also wanna set controller_ field to null as it may be freed by the callback. -- To view, visit http://gerrit.cloudera.org:8080/7455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iaf53c5b113de10d573bd32fb9b2293572e806fbf Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Michael HoGerrit-Reviewer: Michael Ho Gerrit-HasComments: Yes
[kudu-CR] KUDU-2065: Support cancellation for outbound RPC call
Michael Ho has uploaded a new change for review. http://gerrit.cloudera.org:8080/7455 Change subject: KUDU-2065: Support cancellation for outbound RPC call .. KUDU-2065: Support cancellation for outbound RPC call This change implements a new interface Proxy::Cancel() which takes a RpcController* as argument and cancels any pending OutboundCall associated with it. Proxy::Cancel() queues a cancellation task scheduled on the reactor thread for that outbound call. Once the task is run, it will cancel the outbound call right away if the RPC hasn't started sending yet or if it has already sent the request and waiting for a response. If cancellation happens when the RPC request is being sent, the RPC will be cancelled only after the RPC has finished sending the request. If the RPC is finished, the cancellation will be a no-op. Change-Id: Iaf53c5b113de10d573bd32fb9b2293572e806fbf --- M java/kudu-client/src/main/java/org/apache/kudu/client/Status.java M src/kudu/common/wire_protocol.cc M src/kudu/common/wire_protocol.proto M src/kudu/rpc/connection.cc M src/kudu/rpc/connection.h M src/kudu/rpc/messenger.cc M src/kudu/rpc/messenger.h M src/kudu/rpc/outbound_call.cc M src/kudu/rpc/outbound_call.h M src/kudu/rpc/proxy.cc M src/kudu/rpc/proxy.h M src/kudu/rpc/reactor.cc M src/kudu/rpc/reactor.h M src/kudu/rpc/rpc-test-base.h M src/kudu/rpc/rpc-test.cc M src/kudu/rpc/rpc_introspection.proto M src/kudu/util/status.cc M src/kudu/util/status.h 18 files changed, 350 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/7455/1 -- To view, visit http://gerrit.cloudera.org:8080/7455 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Iaf53c5b113de10d573bd32fb9b2293572e806fbf Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Michael Ho
[kudu-CR] disk failure: tests for disk failure recovery
Andrew Wong has posted comments on this change. Change subject: disk failure: tests for disk failure recovery .. Patch Set 7: (34 comments) http://gerrit.cloudera.org:8080/#/c/7031/3//COMMIT_MSG Commit Message: PS3, Line 16: exercises > typo Done http://gerrit.cloudera.org:8080/#/c/7031/3/src/kudu/integration-tests/disk_failure-itest.cc File src/kudu/integration-tests/disk_failure-itest.cc: Line 26: #include "kudu/gutil/map-util.h" > warning: #includes are not sorted properly [llvm-include-order] Done Line 38: > warning: using decl 'map' is unused [misc-unused-using-decls] Done PS3, Line 61: > nit: indent These have been removed to use TestWorkload instead PS3, Line 62: > nit: using... same Line 81: NO_FATALS(CreateTable(table_id)); > warning: redundant 'test_info_' declaration [readability-redundant-declarat seems harmless? PS3, Line 82: WaitForTSAndReplicas(table_id); : } : } : : // Sets up a cluster with three servers with two disks each. : vectorSetupDefaultCluster() { : FLAGS_num_tablet_servers = 3; : CreateCluster("survivable_cluster", kDefaultFlags, {}, /* num_data_dirs */ 2); : CreateClient(_); : vector tservers; : AppendValuesFromMap(tablet_servers_, ); : CHECK_EQ(3, tservers.size()); : vector ext_tservers; : for (auto* details : tservers) { : ext_tservers.push_back(cluster_->tablet_server_by_uuid(details->uuid())); : } : return ext_tservers; > nit move this comment to before the test declaration, or at least give an i Done PS3, Line 150: if (counts_after - counts_before > 0) { : dirs_written->push_back(e.first); : } else { : dirs_not_written->push_back(e.first); : } : } : } > this is a bit funky (depending on the number of files) any way we can be mo Done. I've changed it to count the files before and compare it with the counts after running whatever function. PS3, Line 214: d on tw > tablet servers Done http://gerrit.cloudera.org:8080/#/c/7031/6/src/kudu/integration-tests/disk_failure-itest.cc File src/kudu/integration-tests/disk_failure-itest.cc: Line 28: #include "kudu/integration-tests/test_workload.h" > warning: #includes are not sorted properly [llvm-include-order] Done Line 66: string GlobForBlockFileInDataDir(const string& data_dir) { > warning: invalid case style for global constant 'ts_flags' [readability-ide Done Line 101: void SetServerSurvivalFlags(vector & ext_tservers) { > warning: non-const reference parameter 'ext_tservers', make it const or use Done Line 126: const std::function & f, > warning: the parameter 'f' is copied for each invocation but only used as a Done Line 165: void GetDataDirsWrittenToByWorkload(const vector ext_tservers, > warning: the const qualified parameter 'ext_tservers' is copied for each in Done Line 166: TestWorkload* workload, > warning: non-const reference parameter 'workload', make it const or use a p Done http://gerrit.cloudera.org:8080/#/c/7031/7/src/kudu/integration-tests/disk_failure-itest.cc File src/kudu/integration-tests/disk_failure-itest.cc: Line 30: #include "kudu/util/path_util.h" > warning: #includes are not sorted properly [llvm-include-order] Done PS7, Line 51: uint32_t > Simple int would be fine here, I think. Done Line 67: if (FLAGS_block_manager == "file") { > Would it be useful to parameterize these tests and run them on both block m Done PS7, Line 68: ** > Isn't this middle entry also '??'? I thought block paths were data/ab/cd/ab Done Line 91: vector tservers; > Maybe omit this and iterate on the tablet_servers_ map directly? Done Line 101: void SetServerSurvivalFlags(vector & ext_tservers) { > warning: non-const reference parameter 'ext_tservers', make it const or use Done Line 150: if (counts_after - counts_before > 0) { > Would if counts_after > counts_before be simpler? Done Line 165: void GetDataDirsWrittenToByWorkload(const vector ext_tservers, > warning: the const qualified parameter 'ext_tservers' is copied for each in Done Line 176: workload->StopAndJoin(); > IIUC, we're writing just enough to figure out who is writing where, and the Yes, exactly Line 184: ext_tserver->GetInt64Metric(_ENTITY_server, nullptr, _data_dirs_failed, > Need ASSERT_OK here too? Done Line 200: ASSERT_OK(row->SetInt32(0, i +
[kudu-CR] disk failure: tests for disk failure recovery
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7031 to look at the new patch set (#10). Change subject: disk failure: tests for disk failure recovery .. disk failure: tests for disk failure recovery This patch adds an EMC test that spawns three servers and triggers EIOs on two of them to fail two different tablets. With improper disk-failure-handling, this scenario alone would have been enough to leave the server with only a single copy of data, as the two servers with EIOs would have been shut down entirely. With proper disk-failure handling, this scenario would be salvageable, and data would be replicated on the remaining disks. This exercises the FlushMRS codepath. Tests are also added to test behavior during FlushDMS calls and during scans, ensuring the servers return to a normal state. All of these tests are parameterized to run with both the LBM and FBM. Change-Id: I8ff63ec71ab718866484b9f3ec7264bc72ecfe97 --- M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/disk_failure-itest.cc 2 files changed, 382 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/31/7031/10 -- To view, visit http://gerrit.cloudera.org:8080/7031 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8ff63ec71ab718866484b9f3ec7264bc72ecfe97 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] disk failure: coordinate error handling
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7029 to look at the new patch set (#20). Change subject: disk failure: coordinate error handling .. disk failure: coordinate error handling I/O errors are spawned when calls to env functions fail. The response to these errors may vary and may span many layers of abstraction. For instance, if an EIO were triggered while writing a block, one response might be to shut down the tablet whose operation just failed. In order to do this without messing up the layering, a new entity, the ErrorManager is added to coordinate error handling. Callbacks are registered with the ErrorManager as responses to errors, and callers to at-risk functions trigger these callbacks as errors are spawned. This patch includes the ErrorManager and its placement w.r.t. the BlockManager, FsManager, TSTabletManager, etc. Actual handling and testing will come in a later patch. This is a part of a series of patches to handle disk failure. See section 2.2 of: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing Change-Id: Ib3169deada702c527b70fabf9d2223364f9a9ea6 --- M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/block_manager-test.cc M src/kudu/fs/block_manager.h M src/kudu/fs/data_dirs.h A src/kudu/fs/error_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/file_block_manager.h M src/kudu/fs/fs_manager.cc M src/kudu/fs/fs_manager.h M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/fs/log_block_manager.h M src/kudu/tserver/tablet_server.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h 15 files changed, 188 insertions(+), 24 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/29/7029/20 -- To view, visit http://gerrit.cloudera.org:8080/7029 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib3169deada702c527b70fabf9d2223364f9a9ea6 Gerrit-PatchSet: 20 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP disk failure: forced shutdown of replicas
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7439 to look at the new patch set (#4). Change subject: WIP disk failure: forced shutdown of replicas .. WIP disk failure: forced shutdown of replicas When a disk fails, replicas with data on the failed disk must shut down. Normally, when a replica is shutdown, it must wait for all transactions to finish. However, transactions that fail due to disk failure will not complete. This patch adds the ability to Cancel transactions. Unlike Aborts, which cannot end a transaction that is in the APPLYING state (must be RESERVED), Cancels can end any transaction. This is only "safe" because the tablet to whom the MvccManager belongs is expected to shutdown. In addition to this, a new tablet state is added: FAILED_AND_SHUTDOWN. Like QUIESCING and SHUTDOWN, TabletReplica::Shutdown() can wait on FAILED_AND_SHUTDOWN. This is needed to shut down failed tablets that need to be replicated: calling Shutdown() cannot leave the replica in the FAILED state, and the SHUTDOWN state cannot itself indicate the need for a tablet copy. WIP because there is no testing and I'm not sure this is the best design for what I'm trying to accomplish. This is part of a series of patches to address disk failure. To see how this patch fits in, see section 2.3 of: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit Change-Id: I983620f27e7226806a2cca253db7619731914d42 --- M src/kudu/tablet/metadata.proto M src/kudu/tablet/mvcc.cc M src/kudu/tablet/mvcc.h M src/kudu/tablet/tablet_replica.cc M src/kudu/tablet/tablet_replica.h M src/kudu/tablet/transactions/alter_schema_transaction.h M src/kudu/tablet/transactions/transaction.h M src/kudu/tablet/transactions/transaction_driver.h M src/kudu/tablet/transactions/transaction_tracker-test.cc M src/kudu/tablet/transactions/transaction_tracker.cc M src/kudu/tablet/transactions/transaction_tracker.h M src/kudu/tablet/transactions/write_transaction.cc M src/kudu/tablet/transactions/write_transaction.h M src/kudu/tserver/ts_tablet_manager.cc 14 files changed, 88 insertions(+), 22 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/39/7439/4 -- To view, visit http://gerrit.cloudera.org:8080/7439 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I983620f27e7226806a2cca253db7619731914d42 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] disk failure: reassign tablets due to disk failure
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7440 to look at the new patch set (#3). Change subject: disk failure: reassign tablets due to disk failure .. disk failure: reassign tablets due to disk failure Tablets that are marked FAILED_AND_SHUTDOWN should eventually be replicated, as these tablets have been shut down due to disk failure. If a tablet is in this state and the tserver is responding to a request, no response will be given, as to spur eviction. This is useful, as a disk failure immediately marks a tablet as FAILED (during which the tablet will be seen as TABLET_NOT_RUNNING) and eventually shuts it down forcefully, leaving it in the state FAILED_AND_SHUTDOWN (after which the tablet will be seen as TABLET_FAILED and responses will not update the communication time). This patch is a part of a series of patches to handle disk failure. See section 2.5 in this doc: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 --- M src/kudu/client/scanner-internal.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/tserver/tablet_service.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/tserver.proto 6 files changed, 48 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/40/7440/3 -- To view, visit http://gerrit.cloudera.org:8080/7440 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] WIP disk failure: handle disk failures in blocks
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7030 to look at the new patch set (#14). Change subject: WIP disk failure: handle disk failures in blocks .. WIP disk failure: handle disk failures in blocks This patch adds the logic required to prevent a crash on disk failure during I/O to blocks. Block- and container-level functions that call env functions (e.g. ReadV) that may result in disk failure can now run callbacks to fail and shutdown tablets in the failed directory. Failures elswhere (e.g. when reading instance or tablet metadata) will be handled in separate patches. WIP: As failure handling relies on the addition of a handful of other features, end-to-end testing of handling disk failure will come in a later patch. This is part of a series of patches to handle disk failure. See section 2.2 of: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing Change-Id: Ia03bfb711a1b022d7516f4adb37fe9fb28ec949c --- M src/kudu/fs/data_dirs.h M src/kudu/fs/error_manager.h M src/kudu/fs/file_block_manager.cc M src/kudu/fs/log_block_manager.cc M src/kudu/tablet/mvcc.cc M src/kudu/tserver/ts_tablet_manager.cc M src/kudu/tserver/ts_tablet_manager.h M src/kudu/util/status.h 8 files changed, 239 insertions(+), 68 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/7030/14 -- To view, visit http://gerrit.cloudera.org:8080/7030 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ia03bfb711a1b022d7516f4adb37fe9fb28ec949c Gerrit-PatchSet: 14 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] disk failure: tests for disk failure recovery
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7031 to look at the new patch set (#9). Change subject: disk failure: tests for disk failure recovery .. disk failure: tests for disk failure recovery This patch adds an EMC test that spawns three servers and triggers EIOs on two of them to fail two different tablets. With improper disk-failure-handling, this scenario alone would have been enough to leave the server with only a single copy of data, as the two servers with EIOs would have been shut down entirely. With proper disk-failure handling, this scenario would be salvageable, and data would be replicated on the remaining disks. This exercises the FlushMRS codepath. Tests are also added to test behavior during FlushDMS calls and during scans, ensuring the servers return to a normal state. All of these tests are parameterized to run with both the LBM and FBM. Change-Id: I8ff63ec71ab718866484b9f3ec7264bc72ecfe97 --- M src/kudu/integration-tests/CMakeLists.txt A src/kudu/integration-tests/disk_failure-itest.cc 2 files changed, 380 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/31/7031/9 -- To view, visit http://gerrit.cloudera.org:8080/7031 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8ff63ec71ab718866484b9f3ec7264bc72ecfe97 Gerrit-PatchSet: 9 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] disk failure: don't fail CHECKs for disk failures
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7442 to look at the new patch set (#3). Change subject: disk failure: don't fail CHECKs for disk failures .. disk failure: don't fail CHECKs for disk failures Disk failures are a special case of errors that will be handled. Certain code paths pass along disk failure Statuses until they eventually hit a CHECK and crash Kudu. With this patch, these code paths will instead allow Kudu to continue running, under the assumption that the errors are handled. A small test is added to tablet-test to insert some data and fail. Rather than crashing, the end-state of the tablet must indicated a disk failure. This is a part of a series of patches to handle disk failure. To see how this patch fits in, see section 2.4 of: https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit Change-Id: I109635a54268b9db741b2ae9ea3e9f1fe072d0a8 --- M src/kudu/tablet/delta_tracker.cc M src/kudu/tablet/local_tablet_writer.h M src/kudu/tablet/tablet-test-base.h M src/kudu/tablet/tablet-test.cc M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h M src/kudu/tablet/tablet_replica_mm_ops.cc M src/kudu/tablet/transactions/transaction_driver.cc M src/kudu/tablet/transactions/write_transaction.cc M src/kudu/tserver/ts_tablet_manager.cc 10 files changed, 130 insertions(+), 29 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/42/7442/3 -- To view, visit http://gerrit.cloudera.org:8080/7442 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I109635a54268b9db741b2ae9ea3e9f1fe072d0a8 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [tools] Add a 'kudu tablet relocate' tool
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/7444 to look at the new patch set (#4). Change subject: [tools] Add a 'kudu tablet relocate' tool .. [tools] Add a 'kudu tablet relocate' tool This patch adds a relocate tool that moves a tablet replica from one tablet server to another. Usage: kudu tablet change_config relocate_replica It works by adding to the configuration, waiting for it to tablet copy, then removing . In the typical case, this means the number of replicas will change 3 -> 4 -> 3, and therefore the number of tolerable faults is 1 while the new tablet bootstraps. As a result of this extra fragility, the tool requires the tablet to be in "perfect health" when it runs, meaning ksck returns no errors for the tablet, and also requires the same after the copy is complete but before removing a replica. This probably limits the usefulness of the tool to rebalancing replicas within a healthy tablet. Once pre-voters are implemented, it should be safe to allow the tool to remove a replica just after adding one, without waiting for the copy to finish, assuming the tablet stays healthy. Additionally, this makes some minimal changes to ksck to allow it to print to other output streams besides stdout. The purpose was to allow the output to be suppressed when running the tool, since the use of ksck is an implementation detail, and the output is noisy. Change-Id: I8684e4826bfeaf36b31d297ec1e49897705867f1 Change-Id: I3b7a724ba6e6a3d6fce96b220224d6e38a84 --- M src/kudu/tools/ksck-test.cc M src/kudu/tools/ksck.cc M src/kudu/tools/ksck.h M src/kudu/tools/ksck_remote-test.cc M src/kudu/tools/kudu-admin-test.cc M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_tablet.cc 7 files changed, 388 insertions(+), 142 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/44/7444/4 -- To view, visit http://gerrit.cloudera.org:8080/7444 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3b7a724ba6e6a3d6fce96b220224d6e38a84 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley
[kudu-CR] [tools] Add a 'kudu tablet relocate' tool
Will Berkeley has posted comments on this change. Change subject: [tools] Add a 'kudu tablet relocate' tool .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/7444/1/src/kudu/tools/ksck.cc File src/kudu/tools/ksck.cc: Line 181: } else { > warning: do not use 'else' after 'return' [readability-else-after-return] Done Line 230: } else { > warning: do not use 'else' after 'return' [readability-else-after-return] Done Line 567: } else { > warning: do not use 'else' after 'return' [readability-else-after-return] Done http://gerrit.cloudera.org:8080/#/c/7444/1/src/kudu/tools/ksck_remote-test.cc File src/kudu/tools/ksck_remote-test.cc: Line 49: using strings::Substitute; > warning: using decl 'Substitute' is unused [misc-unused-using-decls] Done -- To view, visit http://gerrit.cloudera.org:8080/7444 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3b7a724ba6e6a3d6fce96b220224d6e38a84 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes