[kudu-CR] KUDU-2068: pass --gcc-toolchain into clang codegen build

2017-07-18 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: KUDU-2068: pass --gcc-toolchain into clang codegen build
..


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7463
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Id3ccb129432e3915401ca7f1f815ddb355faf597
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] KUDU-2068: pass --gcc-toolchain into clang codegen build

2017-07-18 Thread Adar Dembo (Code Review)
Adar Dembo has uploaded a new patch set (#2).

Change subject: KUDU-2068: pass --gcc-toolchain into clang codegen build
..

KUDU-2068: pass --gcc-toolchain into clang codegen build

As a brief refresher: Kudu can be built using various versions of gcc and
clang, but the codegen build always uses clang from thirdparty.

Without this patch, we delegate finding the system header/library prefix to
clang. The problem is that the result isn't guaranteed to match the prefix
used by the compiler building the rest of Kudu, which may lead to obscure
runtime errors due to std:: class layout mismatches. Kudu consumers using
custom toolchains have been forced to work around this issue [1].

There were no build issues in the following environments:
- gcc 5.4 system compiler on Ubuntu 16 (--prefix=/usr).
- gcc 4.9 devtoolset-3 compiler on CentOS 6.6
  (--prefix=/opt/rh/devtoolset-3/root/usr).
- clang 4.0 thirdparty compiler on Ubuntu 16 (no --prefix).

Additionally, I built Kudu in the following environments:
- gcc 4.8 system compiler on CentOS 7.3 (--prefix=/usr) with devtoolset-3
  also installed.
- gcc 6.2 devtoolset-6 compiler on CentOS 7.3
  (--prefix=/opt/rh/devtoolset-6/root/usr).

In these cases codegen-test crashed without the patch, and stopped crashing
after the patch was applied.

1. 
https://github.com/cloudera/native-toolchain/blob/f0105fd35527f416799acb4aa92a887cbf511ce5/source/kudu/build.sh

Change-Id: Id3ccb129432e3915401ca7f1f815ddb355faf597
---
M CMakeLists.txt
M cmake_modules/CompilerInfo.cmake
M src/kudu/codegen/CMakeLists.txt
3 files changed, 21 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/63/7463/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7463
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id3ccb129432e3915401ca7f1f815ddb355faf597
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] disk failure: coordinate error handling

2017-07-18 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: disk failure: coordinate error handling
..


Patch Set 20:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7029/20/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

Line 1072: void TSTabletManager::FailTabletsInDataDir(const string& uuid) {
Why don't we just have this be a thunk that says "something failed" and then 
have the TSTabletManager do something like this:

  for (const auto& tablet_id : dd_mgr->GetFailedTablets()) {
Tablet* t = Lookup(tablet_id);
t->Shutdown();
  }

Maybe I'm missing something for this part, but we need a lock somewhere to 
track which data dirs are "broken" so that we can ensure that asynchronous 
things like tablet copy will also abort if it's in the process of copying a 
tablet that has blocks now on a failed disk.


Line 1081: LOG(INFO) << "Shutting down tablet (not implemented in this 
patch): " << tablet_id;
If this doesn't do anything, why not just have an empty body of the function?


-- 
To view, visit http://gerrit.cloudera.org:8080/7029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3169deada702c527b70fabf9d2223364f9a9ea6
Gerrit-PatchSet: 20
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR](gh-pages) Add a note in the YCSB post

2017-07-18 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: Add a note in the YCSB post
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7396/1/_posts/2016-04-26-ycsb.md
File _posts/2016-04-26-ycsb.md:

PS1, Line 369: changing their value isn't warranted
"changing their value" is not clear. Maybe something like:

As of Kudu 0.10.0, the default configuration was changed based on the results 
of the above exploration. We recommend against modifying these configuration 
variables in Kudu 1.0 or later.


-- 
To view, visit http://gerrit.cloudera.org:8080/7396
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ife414e34388cff6b139580d76d44a2c3477b9abe
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-Owner: Jean-Daniel Cryans 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2065: Support cancellation for outbound RPC call

2017-07-18 Thread Michael Ho (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7455

to look at the new patch set (#2).

Change subject: KUDU-2065: Support cancellation for outbound RPC call
..

KUDU-2065: Support cancellation for outbound RPC call

This change implements a new interface Proxy::Cancel()
which takes a RpcController* as argument and cancels
any pending OutboundCall associated with it.

Proxy::Cancel() queues a cancellation task scheduled on
the reactor thread for that outbound call. Once the task
is run, it will cancel the outbound call right away if
the RPC hasn't started sending yet or if it has already
sent the request and waiting for a response. If cancellation
happens when the RPC request is being sent, the RPC will
be cancelled only after the RPC has finished sending the
request. If the RPC is finished, the cancellation will
be a no-op.

Change-Id: Iaf53c5b113de10d573bd32fb9b2293572e806fbf
---
M java/kudu-client/src/main/java/org/apache/kudu/client/Status.java
M src/kudu/common/wire_protocol.cc
M src/kudu/common/wire_protocol.proto
M src/kudu/rpc/connection.cc
M src/kudu/rpc/connection.h
M src/kudu/rpc/messenger.cc
M src/kudu/rpc/messenger.h
M src/kudu/rpc/outbound_call.cc
M src/kudu/rpc/outbound_call.h
M src/kudu/rpc/proxy.cc
M src/kudu/rpc/proxy.h
M src/kudu/rpc/reactor.cc
M src/kudu/rpc/reactor.h
M src/kudu/rpc/rpc-test-base.h
M src/kudu/rpc/rpc-test.cc
M src/kudu/rpc/rpc_introspection.proto
M src/kudu/util/status.cc
M src/kudu/util/status.h
18 files changed, 341 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/7455/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iaf53c5b113de10d573bd32fb9b2293572e806fbf
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tidy Bot


[kudu-CR] thirdparty: use ninja when possible

2017-07-18 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: thirdparty: use ninja when possible
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7461/1/thirdparty/build-definitions.sh
File thirdparty/build-definitions.sh:

Line 56: if which ninja > /dev/null ; then
RHEL installs ninja as ninja-build, perhaps you should add a case for that.


-- 
To view, visit http://gerrit.cloudera.org:8080/7461
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idacd6d8bc9ad371c3c925f7c628ad69785bb870e
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2066. Add experimental Gradle build support

2017-07-18 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: KUDU-2066. Add experimental Gradle build support
..


Patch Set 7:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7258/7/java/gradle/shadow.gradle
File java/gradle/shadow.gradle:

Line 21: tasks.remove(knows)  // Remove "easter egg" knows task.
what's this? tried googling but couldn't find it


PS7, Line 45: project.conf2ScopeMappings.addMapping(
: MavenPlugin.COMPILE_PRIORITY + 1,
: configurations.compileUnshaded,
: Conf2ScopeMappingContainer.COMPILE
: )
not understanding this part at all


http://gerrit.cloudera.org:8080/#/c/7258/7/java/gradle/wrapper/gradle-wrapper.jar
File java/gradle/wrapper/gradle-wrapper.jar:

the ASF is against checking in binary jars


http://gerrit.cloudera.org:8080/#/c/7258/7/java/gradlew.bat
File java/gradlew.bat:

Line 1: @if "%DEBUG%" == "" @echo off
can we add these files to the RAT excludes file?


http://gerrit.cloudera.org:8080/#/c/7258/7/java/kudu-client/src/test/java/org/apache/kudu/client/TestTimeouts.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestTimeouts.java:

Line 35:   @Ignore
why is this change in here? if you want to disable a test please do it in a 
separate unrelated commit.


http://gerrit.cloudera.org:8080/#/c/7258/7/java/kudu-spark/build.gradle
File java/kudu-spark/build.gradle:

Line 40:   } else {
is it possible to be explicit about == "2" and then give a build error if it is 
anything else?


Line 50: } else {
same


-- 
To view, visit http://gerrit.cloudera.org:8080/7258
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib257cdd019a1f383c886b9238bb47d96576c4421
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] thirdparty: use ninja when possible

2017-07-18 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: thirdparty: use ninja when possible
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7461/1//COMMIT_MSG
Commit Message:

PS1, Line 10: less ugly output.
What's the difference here? And is it still less ugly when outputting to 
something that's not a real terminal?


http://gerrit.cloudera.org:8080/#/c/7461/1/thirdparty/build-definitions.sh
File thirdparty/build-definitions.sh:

Line 54: # If ninja is on the path, use that instead of make for a slightly
Would prefer if this happened in build-thirdparty.sh; this file  only defines 
functions, it doesn't actually run anything when sourced.


Line 97: build_cmake() {
What about here? If it can't work, add a comment explaining why?


-- 
To view, visit http://gerrit.cloudera.org:8080/7461
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idacd6d8bc9ad371c3c925f7c628ad69785bb870e
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes


[kudu-CR] disk failure: coordinate error handling

2017-07-18 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: disk failure: coordinate error handling
..


Patch Set 20:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7029/20/src/kudu/fs/data_dirs.h
File src/kudu/fs/data_dirs.h:

Line 297:   uint16_t* GetUuidIdxForUuid(const std::string& uuid) {
I looked through here and I believe this is safe.

That said, this API is a bit scary at first glance. What lifetime guarantees do 
we make about the pointee here? (To me it looks like the ptr is good as long as 
the DataDirManager has been opened and was not yet destroyed.) We should at 
least document them.

Still, I think it would be a little more intuitive if this returned a bool or a 
Status and took a uint16_t* out-parameter.


http://gerrit.cloudera.org:8080/#/c/7029/20/src/kudu/fs/error_manager.h
File src/kudu/fs/error_manager.h:

Line 34: typedef Callback ErrorNotificationCb;
doc callback argument


Line 57:   void RunErrorNotificationCb(const std::string& uuid) const {
nit: arguments need docs


Line 58: notify_cb_.Run(uuid);
It makes me a little nervous not to submit this to a thread pool. We'd have to 
be careful to release any locks held which may not be easy to do at a low level 
if we are calling back into a higher level class. Although I guess if we don't 
need it then "YAGNI"


Line 61:   void RunErrorNotificationCb(const DataDir* dir) const {
Can we remove this one?


-- 
To view, visit http://gerrit.cloudera.org:8080/7029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3169deada702c527b70fabf9d2223364f9a9ea6
Gerrit-PatchSet: 20
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR](branch-1.3.x) [security] fixed shortened TSK validity interval

2017-07-18 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: [security] fixed shortened TSK validity interval
..


Patch Set 1:

Hey Alexey. That link isn't publicly accessible, maybe worth explaining what 
the problem is?

IIRC this patch basically fixes it so that the default is the proper 7 days 
instead of an accidental 6 days. I don't see a big benefit to making this 
backport, since the real fix is to do the automatic renewal which is added in 
later versions, but maybe I'm missing something.

-- 
To view, visit http://gerrit.cloudera.org:8080/7411
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I84f9789276c4b48a3ba5274393fe30c8bf3ea85d
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: branch-1.3.x
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] thirdparty: use ninja when possible

2017-07-18 Thread Todd Lipcon (Code Review)
Hello Dan Burkert, Adar Dembo,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/7461

to review the following change.

Change subject: thirdparty: use ninja when possible
..

thirdparty: use ninja when possible

Using ninja to build LLVM, etc, is a bit faster than using make, and
produces less ugly output.

Change-Id: Idacd6d8bc9ad371c3c925f7c628ad69785bb870e
---
M thirdparty/build-definitions.sh
1 file changed, 23 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/61/7461/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7461
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Idacd6d8bc9ad371c3c925f7c628ad69785bb870e
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 


[kudu-CR] Upgrade to cmake 3.9.0

2017-07-18 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change.

Change subject: Upgrade to cmake 3.9.0
..


Patch Set 1:

Yea the main reason is that it fixes false dependencies on compilation between 
modules due to this bug I reported 2 years ago: 
https://cmake.org/Bug/view.php?id=1

Here are some timings building kudu on an 88-core box using ninja:

hot ccache (no real difference)
---
cmake 3.6:
151.56user 39.47system 0:12.58elapsed 1518%CPU (0avgtext+0avgdata 
271912maxresident)k
256inputs+4433488outputs (1major+6657138minor)pagefaults 0swaps

cmake 3.9:
158.52user 38.55system 0:12.78elapsed 1541%CPU (0avgtext+0avgdata 
271732maxresident)k
0inputs+4464520outputs (0major+6715240minor)pagefaults 0swaps


cold ccache (big speedup with cmake 3.9)

cmake 3.6:
2742.24user 192.17system 1:31.36elapsed 3211%CPU (0avgtext+0avgdata 
466932maxresident)k
0inputs+14842640outputs (0major+58352987minor)pagefaults 0swaps

cmake 3.9:
3461.34user 219.52system 0:57.90elapsed 6356%CPU (0avgtext+0avgdata 
471408maxresident)k
0inputs+14838200outputs (0major+60222932minor)pagefaults 0swaps

-- 
To view, visit http://gerrit.cloudera.org:8080/7460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e5504bcb136e70a5e5c83ee95c98df3cad4d3bd
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: No


[kudu-CR] Upgrade to cmake 3.9.0

2017-07-18 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: Upgrade to cmake 3.9.0
..


Patch Set 1:

> Any particular reason to want the new one?

Particularly, if a future patch introduces a dependency on a cmake feature new 
in 3.9, we should also update cmake_minimum_required() in CMakeLists.txt

-- 
To view, visit http://gerrit.cloudera.org:8080/7460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e5504bcb136e70a5e5c83ee95c98df3cad4d3bd
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No


[kudu-CR] Upgrade to cmake 3.9.0

2017-07-18 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change.

Change subject: Upgrade to cmake 3.9.0
..


Patch Set 1: Code-Review+2

Any particular reason to want the new one?

-- 
To view, visit http://gerrit.cloudera.org:8080/7460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6e5504bcb136e70a5e5c83ee95c98df3cad4d3bd
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: No


[kudu-CR] Upgrade to cmake 3.9.0

2017-07-18 Thread Todd Lipcon (Code Review)
Hello Dan Burkert, Adar Dembo,

I'd like you to do a code review.  Please visit

http://gerrit.cloudera.org:8080/7460

to review the following change.

Change subject: Upgrade to cmake 3.9.0
..

Upgrade to cmake 3.9.0

Change-Id: I6e5504bcb136e70a5e5c83ee95c98df3cad4d3bd
---
M thirdparty/vars.sh
1 file changed, 1 insertion(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/60/7460/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7460
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I6e5504bcb136e70a5e5c83ee95c98df3cad4d3bd
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 


[kudu-CR] KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover.

2017-07-18 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change.

Change subject: KUDU-2033 (part 1). Add write and stop/start in a loop to 
TestLeaderFailover.
..


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/7456/2//COMMIT_MSG
Commit Message:

Line 7: KUDU-2033 (part 1). Add write and stop/start in a loop to 
TestLeaderFailover.
nit: Avoid period at end of subject line in a commit message


http://gerrit.cloudera.org:8080/#/c/7456/2/java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java
File java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java:

PS2, Line 351: Integer
Can we return an int instead of an Integer here and below?

I can't see any reason to use a boxed primitive here.


http://gerrit.cloudera.org:8080/#/c/7456/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestLeaderFailover.java
File 
java/kudu-client/src/test/java/org/apache/kudu/client/TestLeaderFailover.java:

Line 42:* This test writes 3 rows. Then in a loop, it kills the leader, 
then tries to write inner_row rows. Verifying
nit: line length should be <= 100 chars


Line 49:   public void testFailover() throws Exception {
Can we keep the existing simple test and add an additional test that uses this 
"stress test" approach?


Line 51: int startRows = 3;
style: use final and all caps for constants


Line 67:   for (int j = i; j < i + innerRows; j++) {
I find this logic a bit difficult to follow. I'd suggest keeping a counter like 
currentRows and just increment that each time you insert a row, then use simple 
i and j counters as needed to control the number of iterations per loop.

something like:

int currentRows = 0;
for (int i = 0; i < startRows; i++) {
  session.apply(createBasicSchemaInsert(table, i));
  currentRows++;
}

for (int i = 0; i < numIterations; i++) {
  // ...
  for (int j = 0; j < rowsPerIteration; j++) {
session.apply(...);
currentRows++;
  }
}

// ...
assertEquals(totalRowsToInsert, countRowsInScan(scanner));


Line 77:   assertEquals(i + innerRows, countRowsInScan(scanner));
these assertions should probably be assert eventually


-- 
To view, visit http://gerrit.cloudera.org:8080/7456
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ie93a15084a4c4c0748dc74c005b8313f443a5ba9
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Edward Fancher 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover.

2017-07-18 Thread Edward Fancher (Code Review)
Edward Fancher has uploaded a new patch set (#2).

Change subject: KUDU-2033 (part 1). Add write and stop/start in a loop to 
TestLeaderFailover.
..

KUDU-2033 (part 1). Add write and stop/start in a loop to TestLeaderFailover.

TestLeaderFailover is made more robust by adding a write loop in which
the tablet leader is killed and restarted. The main idea is to check
that the writes complete and can be read after the leader is killed.
Done in a loop so we can be sure that we stay stable through multiple
stops and restarts of the leader tablet.

Change-Id: Ie93a15084a4c4c0748dc74c005b8313f443a5ba9
---
M java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestLeaderFailover.java
2 files changed, 29 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/56/7456/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7456
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie93a15084a4c4c0748dc74c005b8313f443a5ba9
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Edward Fancher 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] KUDU-2033. Add a 'torture' scenario to verify Java client's behavior during fail-over

2017-07-18 Thread Edward Fancher (Code Review)
Edward Fancher has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7456

Change subject: KUDU-2033. Add a 'torture' scenario to verify Java client's 
behavior during fail-over
..

KUDU-2033. Add a 'torture' scenario to verify Java client's behavior
during fail-over

TestLeaderFailover is made more robust by adding a write loop in which
the tablet leader is killed and restarted. The main idea is to check
that the writes complete and can be read after the leader is killed.
Done in a loop so we can be sure that we stay stable through multiple
stops and restarts of the leader tablet.

Change-Id: Ie93a15084a4c4c0748dc74c005b8313f443a5ba9
---
M java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestLeaderFailover.java
2 files changed, 29 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/56/7456/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7456
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie93a15084a4c4c0748dc74c005b8313f443a5ba9
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Edward Fancher 


[kudu-CR] kudu-spark-tools: Spark tool for Import & Export different format of files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: mapreduced base export to csv and import par

2017-07-18 Thread Jean-Daniel Cryans (Code Review)
Jean-Daniel Cryans has posted comments on this change.

Change subject: kudu-spark-tools: Spark tool for Import & Export different 
format of files such as parquet,avro,csv in and to from kudu tables 
kudu-client-tools: mapreduced base export to csv and import parquet files.
..


Patch Set 2:

(29 comments)

http://gerrit.cloudera.org:8080/#/c/7421/2//COMMIT_MSG
Commit Message:

Line 7: kudu-spark-tools: Spark tool for Import & Export different format of 
files such as parquet,avro,csv in and to from kudu tables kudu-client-tools: 
mapreduced base export to csv and import parquet files.
Please follow best practices for git commit messages, such as 
https://chris.beams.io/posts/git-commit/


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java
File 
java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsv.java:

PS2, Line 35: a CSV files.
nit: is there one or many files?


Line 39: public class ExportCsv extends Configured implements Tool {
This needs a test.


PS2, Line 50: The current configuration.
I know the ImportCsv* classes is like this, but the javadoc format we use 
dictates starting @param line with a lower case and and not ending with a 
period. See 
http://www.oracle.com/technetwork/java/javase/documentation/index-137868.html#format

No need to fix ImportCsv in this patch but you'd be welcome to contribute a 
separate patch for it :)


PS2, Line 85: with columns
nit: rephrase to something like "the given table and columns into..."


PS2, Line 86: "T
nit: mind putting this on a new line? Then the code looks like what the error 
would look like.


PS2, Line 86: kudu
nit: Kudu


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java
File 
java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ExportCsvMapper.java:

Line 32: 
nit: remove the extra blank line


PS2, Line 48: (i.e., the parsing).
That made more sense in ImportCsbMapper, I'd just remove this javadoc 
completely.


Line 54: this.separator = conf.get(ExportCsv.SEPARATOR_CONF_KEY);
single line:

this.separator = conf.get(ExportCsv.SEPARATOR_CONF_KEY, 
ExportCsv.DEFAULT_SEPARATOR);


PS2, Line 61: Insert
nits:
s/Convert/Converts
s/Insert/RowResult
also end the sentence with a period.


Line 74:* converts RowResult $this.separator string
Please revise this whole javadoc section. FWIW it might not be necessary, since 
it's a private method. Also it's not doing anything surprising.


Line 114: default:
Missing UNIXTIME_MICROS?


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java
File 
java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquet.java:

PS2, Line 43: PARQUET
nit: Apache Parquet


Line 47: public class ImportParquet extends Configured implements Tool {
This needs a test.


Line 59:* @param conf The current configuration.
Same comment as before regarding javadoc.


Line 75: LOG.info(schema);
Did you forget to remove this?


Line 88: FileInputFormat.setInputPaths(job, inputDir);
You could run some pre-flight checks like making sure that the columns match.


PS2, Line 105: PARQUET
Apache Parquet


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java
File 
java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ImportParquetMapper.java:

PS2, Line 39: PARQUET lines and turns them into Kudu Inserts.
some comments as before regarding PARQUET and Inserts


Line 101:   case DOUBLE:
UNIXTIME_MICROS would be recognized but not supported, someone might have 
TIMESTAMP and think it's compatible, maybe you can catch that before like in 
createSubmittableJob?


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java
File 
java/kudu-client-tools/src/main/java/org/apache/kudu/mapreduce/tools/ParquetReadSupport.java:

Line 27:  * Created by sany on 26/06/2017 AD.
Remove.


PS2, Line 29:  
nit


Line 37: Set counts = context.getKeyValueMetadata().get("my.count");
What's this about?


Line 38: // assertTrue("counts: " + counts, counts.size() > 0);
Remove.


http://gerrit.cloudera.org:8080/#/c/7421/2/java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala
File 
java/kudu-spark-tools/src/main/scala/org/apache/kudu/spark/tools/ImportExportFiles.scala:

Line 1: 
remove this blank line


Line 18: package org.apache.kudu.spark.tools
add a blank line


Line 24: import org.slf4j.{Logger, LoggerFactory}
please re-order this


Line 29: object ImportExportKudu {
I'm less familiar with scala but at least we'll need a test.


-- 
To view, visit 

[kudu-CR] disk failure: coordinate error handling

2017-07-18 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change.

Change subject: disk failure: coordinate error handling
..


Patch Set 16:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7029/16/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

Line 1067:   uint16_t* uuid_idx = 
fs_manager_->dd_manager()->GetUuidIdxForUuid(uuid);
> Maybe pull out fs_manager_->dd_manager() into a local variable so you don't
Did you miss this?


http://gerrit.cloudera.org:8080/#/c/7029/20/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

PS20, Line 1074: (const st
Didn't think this case was possible.


http://gerrit.cloudera.org:8080/#/c/7029/20/src/kudu/tserver/ts_tablet_manager.h
File src/kudu/tserver/ts_tablet_manager.h:

Line 188: 
I still think it'd be OK to make this private and wire/unwire in Init/Shutdown 
(as opposed to the ctor/dtor). Did Todd object to that specific kind of wiring? 
I thought the 'weirdness' he felt had to do with side effects in the ctor.


-- 
To view, visit http://gerrit.cloudera.org:8080/7029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ib3169deada702c527b70fabf9d2223364f9a9ea6
Gerrit-PatchSet: 16
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2065: Support cancellation for outbound RPC call

2017-07-18 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change.

Change subject: KUDU-2065: Support cancellation for outbound RPC call
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7455/1/src/kudu/rpc/outbound_call.cc
File src/kudu/rpc/outbound_call.cc:

PS1, Line 377:   delete [] header_buf_.release();
 :   sidecars_.clear();
Consider doing the same thing for other Set*() functions or moving it to 
CallCallback() ? May also wanna set controller_ field to null as it may be 
freed by the callback.


-- 
To view, visit http://gerrit.cloudera.org:8080/7455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iaf53c5b113de10d573bd32fb9b2293572e806fbf
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Michael Ho 
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2065: Support cancellation for outbound RPC call

2017-07-18 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7455

Change subject: KUDU-2065: Support cancellation for outbound RPC call
..

KUDU-2065: Support cancellation for outbound RPC call

This change implements a new interface Proxy::Cancel()
which takes a RpcController* as argument and cancels
any pending OutboundCall associated with it.

Proxy::Cancel() queues a cancellation task scheduled on
the reactor thread for that outbound call. Once the task
is run, it will cancel the outbound call right away if
the RPC hasn't started sending yet or if it has already
sent the request and waiting for a response. If cancellation
happens when the RPC request is being sent, the RPC will
be cancelled only after the RPC has finished sending the
request. If the RPC is finished, the cancellation will
be a no-op.

Change-Id: Iaf53c5b113de10d573bd32fb9b2293572e806fbf
---
M java/kudu-client/src/main/java/org/apache/kudu/client/Status.java
M src/kudu/common/wire_protocol.cc
M src/kudu/common/wire_protocol.proto
M src/kudu/rpc/connection.cc
M src/kudu/rpc/connection.h
M src/kudu/rpc/messenger.cc
M src/kudu/rpc/messenger.h
M src/kudu/rpc/outbound_call.cc
M src/kudu/rpc/outbound_call.h
M src/kudu/rpc/proxy.cc
M src/kudu/rpc/proxy.h
M src/kudu/rpc/reactor.cc
M src/kudu/rpc/reactor.h
M src/kudu/rpc/rpc-test-base.h
M src/kudu/rpc/rpc-test.cc
M src/kudu/rpc/rpc_introspection.proto
M src/kudu/util/status.cc
M src/kudu/util/status.h
18 files changed, 350 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/7455/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7455
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Iaf53c5b113de10d573bd32fb9b2293572e806fbf
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Michael Ho 


[kudu-CR] disk failure: tests for disk failure recovery

2017-07-18 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change.

Change subject: disk failure: tests for disk failure recovery
..


Patch Set 7:

(34 comments)

http://gerrit.cloudera.org:8080/#/c/7031/3//COMMIT_MSG
Commit Message:

PS3, Line 16: exercises 
> typo
Done


http://gerrit.cloudera.org:8080/#/c/7031/3/src/kudu/integration-tests/disk_failure-itest.cc
File src/kudu/integration-tests/disk_failure-itest.cc:

Line 26: #include "kudu/gutil/map-util.h"
> warning: #includes are not sorted properly [llvm-include-order]
Done


Line 38: 
> warning: using decl 'map' is unused [misc-unused-using-decls]
Done


PS3, Line 61: 
> nit: indent
These have been removed to use TestWorkload instead


PS3, Line 62: 
> nit: using...
same


Line 81:   NO_FATALS(CreateTable(table_id));
> warning: redundant 'test_info_' declaration [readability-redundant-declarat
seems harmless?


PS3, Line 82:   WaitForTSAndReplicas(table_id);
: }
:   }
: 
:   // Sets up a cluster with three servers with two disks each.
:   vector SetupDefaultCluster() {
: FLAGS_num_tablet_servers = 3;
: CreateCluster("survivable_cluster", kDefaultFlags, {}, /* 
num_data_dirs */ 2);
: CreateClient(_);
: vector tservers;
: AppendValuesFromMap(tablet_servers_, );
: CHECK_EQ(3, tservers.size());
: vector ext_tservers;
: for (auto* details : tservers) {
:   
ext_tservers.push_back(cluster_->tablet_server_by_uuid(details->uuid()));
: }
: return ext_tservers;
> nit move this comment to before the test declaration, or at least give an i
Done


PS3, Line 150:   if (counts_after - counts_before > 0) {
 : dirs_written->push_back(e.first);
 :   } else {
 : dirs_not_written->push_back(e.first);
 :   }
 : }
 :   }
> this is a bit funky (depending on the number of files) any way we can be mo
Done. I've changed it to count the files before and compare it with the counts 
after running whatever function.


PS3, Line 214: d on tw
> tablet servers
Done


http://gerrit.cloudera.org:8080/#/c/7031/6/src/kudu/integration-tests/disk_failure-itest.cc
File src/kudu/integration-tests/disk_failure-itest.cc:

Line 28: #include "kudu/integration-tests/test_workload.h"
> warning: #includes are not sorted properly [llvm-include-order]
Done


Line 66: string GlobForBlockFileInDataDir(const string& data_dir) {
> warning: invalid case style for global constant 'ts_flags' [readability-ide
Done


Line 101:   void SetServerSurvivalFlags(vector& 
ext_tservers) {
> warning: non-const reference parameter 'ext_tservers', make it const or use
Done


Line 126:   const 
std::function& f,
> warning: the parameter 'f' is copied for each invocation but only used as a
Done


Line 165:   void GetDataDirsWrittenToByWorkload(const 
vector ext_tservers,
> warning: the const qualified parameter 'ext_tservers' is copied for each in
Done


Line 166:   TestWorkload* workload,
> warning: non-const reference parameter 'workload', make it const or use a p
Done


http://gerrit.cloudera.org:8080/#/c/7031/7/src/kudu/integration-tests/disk_failure-itest.cc
File src/kudu/integration-tests/disk_failure-itest.cc:

Line 30: #include "kudu/util/path_util.h"
> warning: #includes are not sorted properly [llvm-include-order]
Done


PS7, Line 51: uint32_t
> Simple int would be fine here, I think.
Done


Line 67:   if (FLAGS_block_manager == "file") {
> Would it be useful to parameterize these tests and run them on both block m
Done


PS7, Line 68: **
> Isn't this middle entry also '??'? I thought block paths were data/ab/cd/ab
Done


Line 91: vector tservers;
> Maybe omit this and iterate on the tablet_servers_ map directly?
Done


Line 101:   void SetServerSurvivalFlags(vector& 
ext_tservers) {
> warning: non-const reference parameter 'ext_tservers', make it const or use
Done


Line 150:   if (counts_after - counts_before > 0) {
> Would if counts_after > counts_before be simpler?
Done


Line 165:   void GetDataDirsWrittenToByWorkload(const 
vector ext_tservers,
> warning: the const qualified parameter 'ext_tservers' is copied for each in
Done


Line 176: workload->StopAndJoin();
> IIUC, we're writing just enough to figure out who is writing where, and the
Yes, exactly


Line 184:   ext_tserver->GetInt64Metric(_ENTITY_server, nullptr, 
_data_dirs_failed,
> Need ASSERT_OK here too?
Done


Line 200: ASSERT_OK(row->SetInt32(0, i + 

[kudu-CR] disk failure: tests for disk failure recovery

2017-07-18 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7031

to look at the new patch set (#10).

Change subject: disk failure: tests for disk failure recovery
..

disk failure: tests for disk failure recovery

This patch adds an EMC test that spawns three servers and triggers EIOs
on two of them to fail two different tablets. With improper
disk-failure-handling, this scenario alone would have been enough to
leave the server with only a single copy of data, as the two servers
with EIOs would have been shut down entirely.

With proper disk-failure handling, this scenario would be salvageable,
and data would be replicated on the remaining disks. This exercises the
FlushMRS codepath.

Tests are also added to test behavior during FlushDMS calls and during
scans, ensuring the servers return to a normal state. All of these tests
are parameterized to run with both the LBM and FBM.

Change-Id: I8ff63ec71ab718866484b9f3ec7264bc72ecfe97
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/disk_failure-itest.cc
2 files changed, 382 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/31/7031/10
-- 
To view, visit http://gerrit.cloudera.org:8080/7031
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8ff63ec71ab718866484b9f3ec7264bc72ecfe97
Gerrit-PatchSet: 10
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] disk failure: coordinate error handling

2017-07-18 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7029

to look at the new patch set (#20).

Change subject: disk failure: coordinate error handling
..

disk failure: coordinate error handling

I/O errors are spawned when calls to env functions fail. The response to
these errors may vary and may span many layers of abstraction.

For instance, if an EIO were triggered while writing a block, one
response might be to shut down the tablet whose operation just failed.
In order to do this without messing up the layering, a new entity, the
ErrorManager is added to coordinate error handling.

Callbacks are registered with the ErrorManager as responses to errors,
and callers to at-risk functions trigger these callbacks as errors are
spawned.

This patch includes the ErrorManager and its placement w.r.t. the
BlockManager, FsManager, TSTabletManager, etc. Actual handling and
testing will come in a later patch.

This is a part of a series of patches to handle disk failure. See
section 2.2 of:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: Ib3169deada702c527b70fabf9d2223364f9a9ea6
---
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/block_manager-test.cc
M src/kudu/fs/block_manager.h
M src/kudu/fs/data_dirs.h
A src/kudu/fs/error_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/file_block_manager.h
M src/kudu/fs/fs_manager.cc
M src/kudu/fs/fs_manager.h
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/fs/log_block_manager.h
M src/kudu/tserver/tablet_server.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
15 files changed, 188 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/29/7029/20
-- 
To view, visit http://gerrit.cloudera.org:8080/7029
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib3169deada702c527b70fabf9d2223364f9a9ea6
Gerrit-PatchSet: 20
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] WIP disk failure: forced shutdown of replicas

2017-07-18 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7439

to look at the new patch set (#4).

Change subject: WIP disk failure: forced shutdown of replicas
..

WIP disk failure: forced shutdown of replicas

When a disk fails, replicas with data on the failed disk must shut
down. Normally, when a replica is shutdown, it must wait for all
transactions to finish. However, transactions that fail due to disk
failure will not complete.

This patch adds the ability to Cancel transactions. Unlike Aborts, which
cannot end a transaction that is in the APPLYING state (must be
RESERVED), Cancels can end any transaction. This is only "safe" because
the tablet to whom the MvccManager belongs is expected to shutdown.

In addition to this, a new tablet state is added: FAILED_AND_SHUTDOWN.
Like QUIESCING and SHUTDOWN, TabletReplica::Shutdown() can wait on
FAILED_AND_SHUTDOWN. This is needed to shut down failed tablets that
need to be replicated: calling Shutdown() cannot leave the replica in
the FAILED state, and the SHUTDOWN state cannot itself indicate the
need for a tablet copy.

WIP because there is no testing and I'm not sure this is the best design
for what I'm trying to accomplish.

This is part of a series of patches to address disk failure. To see how
this patch fits in, see section 2.3 of:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit

Change-Id: I983620f27e7226806a2cca253db7619731914d42
---
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/mvcc.cc
M src/kudu/tablet/mvcc.h
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/transactions/alter_schema_transaction.h
M src/kudu/tablet/transactions/transaction.h
M src/kudu/tablet/transactions/transaction_driver.h
M src/kudu/tablet/transactions/transaction_tracker-test.cc
M src/kudu/tablet/transactions/transaction_tracker.cc
M src/kudu/tablet/transactions/transaction_tracker.h
M src/kudu/tablet/transactions/write_transaction.cc
M src/kudu/tablet/transactions/write_transaction.h
M src/kudu/tserver/ts_tablet_manager.cc
14 files changed, 88 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/39/7439/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I983620f27e7226806a2cca253db7619731914d42
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] disk failure: reassign tablets due to disk failure

2017-07-18 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7440

to look at the new patch set (#3).

Change subject: disk failure: reassign tablets due to disk failure
..

disk failure: reassign tablets due to disk failure

Tablets that are marked FAILED_AND_SHUTDOWN should eventually be
replicated, as these tablets have been shut down due to disk failure. If
a tablet is in this state and the tserver is responding to a request, no
response will be given, as to spur eviction.

This is useful, as a disk failure immediately marks a tablet as FAILED
(during which the tablet will be seen as TABLET_NOT_RUNNING) and
eventually shuts it down forcefully, leaving it in the state
FAILED_AND_SHUTDOWN (after which the tablet will be seen as
TABLET_FAILED and responses will not update the communication time).

This patch is a part of a series of patches to handle disk failure. See
section 2.5 in this doc:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit

Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345
---
M src/kudu/client/scanner-internal.cc
M src/kudu/consensus/consensus_peers.cc
M src/kudu/consensus/consensus_queue.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/tserver.proto
6 files changed, 48 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/40/7440/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7440
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5f61585b02fbe270d215bf7f49c0d390ceee3345
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] WIP disk failure: handle disk failures in blocks

2017-07-18 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7030

to look at the new patch set (#14).

Change subject: WIP disk failure: handle disk failures in blocks
..

WIP disk failure: handle disk failures in blocks

This patch adds the logic required to prevent a crash on disk failure
during I/O to blocks.

Block- and container-level functions that call env functions (e.g.
ReadV)  that may result in disk failure can now run callbacks to fail
and shutdown tablets in the failed directory.

Failures elswhere (e.g. when reading instance or tablet metadata) will
be handled in separate patches.

WIP: As failure handling relies on the addition of a handful of other
features, end-to-end testing of handling disk failure will come in a
later patch.

This is part of a series of patches to handle disk failure. See
section 2.2 of:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit?usp=sharing

Change-Id: Ia03bfb711a1b022d7516f4adb37fe9fb28ec949c
---
M src/kudu/fs/data_dirs.h
M src/kudu/fs/error_manager.h
M src/kudu/fs/file_block_manager.cc
M src/kudu/fs/log_block_manager.cc
M src/kudu/tablet/mvcc.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
M src/kudu/util/status.h
8 files changed, 239 insertions(+), 68 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/30/7030/14
-- 
To view, visit http://gerrit.cloudera.org:8080/7030
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia03bfb711a1b022d7516f4adb37fe9fb28ec949c
Gerrit-PatchSet: 14
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] disk failure: tests for disk failure recovery

2017-07-18 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7031

to look at the new patch set (#9).

Change subject: disk failure: tests for disk failure recovery
..

disk failure: tests for disk failure recovery

This patch adds an EMC test that spawns three servers and triggers EIOs
on two of them to fail two different tablets. With improper
disk-failure-handling, this scenario alone would have been enough to
leave the server with only a single copy of data, as the two servers
with EIOs would have been shut down entirely.

With proper disk-failure handling, this scenario would be salvageable,
and data would be replicated on the remaining disks. This exercises the
FlushMRS codepath.

Tests are also added to test behavior during FlushDMS calls and during
scans, ensuring the servers return to a normal state. All of these tests
are parameterized to run with both the LBM and FBM.

Change-Id: I8ff63ec71ab718866484b9f3ec7264bc72ecfe97
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/disk_failure-itest.cc
2 files changed, 380 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/31/7031/9
-- 
To view, visit http://gerrit.cloudera.org:8080/7031
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8ff63ec71ab718866484b9f3ec7264bc72ecfe97
Gerrit-PatchSet: 9
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] disk failure: don't fail CHECKs for disk failures

2017-07-18 Thread Andrew Wong (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7442

to look at the new patch set (#3).

Change subject: disk failure: don't fail CHECKs for disk failures
..

disk failure: don't fail CHECKs for disk failures

Disk failures are a special case of errors that will be handled. Certain
code paths pass along disk failure Statuses until they eventually hit a
CHECK and crash Kudu.

With this patch, these code paths will instead allow Kudu to continue
running, under the assumption that the errors are handled.

A small test is added to tablet-test to insert some data and fail.
Rather than crashing, the end-state of the tablet must indicated a disk
failure.

This is a part of a series of patches to handle disk failure. To see how
this patch fits in, see section 2.4 of:
https://docs.google.com/document/d/1zZk-vb_ETKUuePcZ9ZqoSK2oPvAAaEV1sjDXes8Pxgk/edit

Change-Id: I109635a54268b9db741b2ae9ea3e9f1fe072d0a8
---
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/local_tablet_writer.h
M src/kudu/tablet/tablet-test-base.h
M src/kudu/tablet/tablet-test.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica_mm_ops.cc
M src/kudu/tablet/transactions/transaction_driver.cc
M src/kudu/tablet/transactions/write_transaction.cc
M src/kudu/tserver/ts_tablet_manager.cc
10 files changed, 130 insertions(+), 29 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/42/7442/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7442
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I109635a54268b9db741b2ae9ea3e9f1fe072d0a8
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: David Ribeiro Alves 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [tools] Add a 'kudu tablet relocate' tool

2017-07-18 Thread Will Berkeley (Code Review)
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/7444

to look at the new patch set (#4).

Change subject: [tools] Add a 'kudu tablet relocate' tool
..

[tools] Add a 'kudu tablet relocate' tool

This patch adds a relocate tool that moves a tablet replica from
one tablet server to another. Usage:

kudu tablet change_config relocate_replica


It works by adding  to the configuration, waiting for it
to tablet copy, then removing . In the typical case,
this means the number of replicas will change 3 -> 4 -> 3, and
therefore the number of tolerable faults is 1 while the new
tablet bootstraps. As a result of this extra fragility, the tool
requires the tablet to be in "perfect health" when it runs,
meaning ksck returns no errors for the tablet, and also requires
the same after the copy is complete but before removing a
replica. This probably limits the usefulness of the tool to
rebalancing replicas within a healthy tablet.

Once pre-voters are implemented, it should be safe to allow the
tool to remove a replica just after adding one, without waiting
for the copy to finish, assuming the tablet stays healthy.

Additionally, this makes some minimal changes to ksck to allow
it to print to other output streams besides stdout. The purpose
was to allow the output to be suppressed when running the tool,
since the use of ksck is an implementation detail, and the
output is noisy. Change-Id:
I8684e4826bfeaf36b31d297ec1e49897705867f1

Change-Id: I3b7a724ba6e6a3d6fce96b220224d6e38a84
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_remote-test.cc
M src/kudu/tools/kudu-admin-test.cc
M src/kudu/tools/kudu-tool-test.cc
M src/kudu/tools/tool_action_tablet.cc
7 files changed, 388 insertions(+), 142 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/44/7444/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3b7a724ba6e6a3d6fce96b220224d6e38a84
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] [tools] Add a 'kudu tablet relocate' tool

2017-07-18 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change.

Change subject: [tools] Add a 'kudu tablet relocate' tool
..


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7444/1/src/kudu/tools/ksck.cc
File src/kudu/tools/ksck.cc:

Line 181:   } else {
> warning: do not use 'else' after 'return' [readability-else-after-return]
Done


Line 230:   } else {
> warning: do not use 'else' after 'return' [readability-else-after-return]
Done


Line 567:   } else {
> warning: do not use 'else' after 'return' [readability-else-after-return]
Done


http://gerrit.cloudera.org:8080/#/c/7444/1/src/kudu/tools/ksck_remote-test.cc
File src/kudu/tools/ksck_remote-test.cc:

Line 49: using strings::Substitute;
> warning: using decl 'Substitute' is unused [misc-unused-using-decls]
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7444
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3b7a724ba6e6a3d6fce96b220224d6e38a84
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley 
Gerrit-HasComments: Yes