[kudu-CR] Implement BloomFilter Predicate in server side.

2018-10-04 Thread Dan Burkert (Code Review)
Dan Burkert has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11100 )

Change subject: Implement BloomFilter Predicate in server side.
..


Patch Set 9:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate-test.cc
File src/kudu/common/column_predicate-test.cc:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate-test.cc@881
PS9, Line 881: InList
Should this be 'Equality'?


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h@230
PS9, Line 230:   bool EvaluateCellForBloomFilter(DataType type, const void* 
cell) const;
Is this used anywhere?  If it's test only please indicate that in the doc.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h@376
PS9, Line 376:   typedef typename DataTypeTraits::cpp_type 
cpp_type;
 :   size_t size = sizeof(cpp_type);
 :   const void* data = cell;
 :   if (PhysicalType == BINARY) {
 : const Slice *slice = reinterpret_cast(cell);
 : size = slice->size();
 : data = slice->data();
 :   }
 :   Slice cell_slice(reinterpret_cast(data), 
size);
 :   BloomKeyProbe probe(cell_slice, bf.hash_algorithm());
This entire portion could be hoisted outside the for loop, which saves 
recomputing the probe in the case of multiple filters.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h@386
PS9, Line 386:   if (!BloomFilter(bf.bloom_data(), 
bf.nhash()).MayContainKey(probe)) {
Once this for loop is simplified to just be this call, it should further 
simplify to just use std::all_of


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc
File src/kudu/common/column_predicate.cc:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@115
PS9, Line 115:   return ColumnPredicate(PredicateType::InBloomFilter, 
move(column), bfs, lower, upper);
I think this should be calling Simplify(), or if there's a reason not to please 
leave a comment.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@385
PS9, Line 385:   bloom_filters_.insert(bloom_filters_.end(), 
other.bloom_filters().begin(),
Pretty sure this can be simplified to a straight copy, eg:

bloom_filters_ = other.bloom_filters_;


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@387
PS9, Line 387:   if (other.lower_ != nullptr &&
This range portion of this code doesn't need to be duplicated if you move 
InBloomFilter above Range, and let it fall-through.  Make sure to annotate with 
FALLTHROUGH_INTENDED, though.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@588
PS9, Line 588:   if (new_values.empty()) {
Simplify() should already handle this, so I think you can reduce this to just 
the else clause.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@644
PS9, Line 644:   new_values.emplace_back(value);
can this use copy_if like above?  I think it's more elegant that way.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@648
PS9, Line 648: SetToNone();
likewise, I think this is already handled by the Simplify() call.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@661
PS9, Line 661:   if (other.lower_ != nullptr &&
likewise I think this can be simplified with the fallthrough trick.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@895
PS9, Line 895: case PredicateType::InBloomFilter: {
This could be simplified by moving it above Range and allowing it to fall 
through to check the range portion (after checking the annotations).  Annotate 
with FALLTHROUGH_INTENDED if you do that.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@899
PS9, Line 899:   auto bound_equal = [&] (const void* eleml, const void* 
elemr) {
This is nice, consider doing the same simplification for Range.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/common.proto
File src/kudu/common/common.proto:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/common.proto@347
PS9, Line 347: enum HashAlgorithmInBloomFilter {
I suggested this back in PS5 but I think there was some confusion.  I think you 
can call this just 'HashAlgorithm', and use it for both bloom filters as well 
as in HashBucketSchemaPB (where the existing one will be removed).  To do that 
compatibly the existing HashAlgorithm values have to remain, so it should be:

HashAlgorithm {
  UNKNOWN = 0;
  MURMUR_HASH_2 = 1;
  CITY_HASH = 

[kudu-CR] KUDU-2563: [spark] Use the scanner keepAlive API

2018-10-04 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11571 )

Change subject: KUDU-2563: [spark] Use the scanner keepAlive API
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11571/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala:

http://gerrit.cloudera.org:8080/#/c/11571/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala@35
PS3, Line 35:
> nit: remove.
Done



--
To view, visit http://gerrit.cloudera.org:8080/11571
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db
Gerrit-Change-Number: 11571
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Comment-Date: Fri, 05 Oct 2018 02:23:15 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2563: [spark] Use the scanner keepAlive API

2018-10-04 Thread Grant Henke (Code Review)
Grant Henke has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/11571 )

Change subject: KUDU-2563: [spark] Use the scanner keepAlive API
..

KUDU-2563: [spark] Use the scanner keepAlive API

Adds scheduled keepAlive calls to the scanner in the
KuduRDD RowIterator. The period in which the calls
are made is configurable via keepAlivePeriodMs and
has a default of 15 seconds (which is 1/4 the default
scanner ttl).

This implementation is similar to the Impala integration.
It checks if a call to the keepAlive API is needed as
it processes each row. Compared to a background
thread, this has the downside of being less consistently
scheduled and susceptible to scenarios in which a single
row takes longer to process than the ttl. However,
because the scanner is not thread safe, this is the most
straightforward solution and has been proven to work.

Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db
Reviewed-on: http://gerrit.cloudera.org:8080/11571
Reviewed-by: Grant Henke 
Tested-by: Grant Henke 
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala
6 files changed, 139 insertions(+), 11 deletions(-)

Approvals:
  Grant Henke: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/11571
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db
Gerrit-Change-Number: 11571
Gerrit-PatchSet: 5
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 


[kudu-CR] KUDU-2563: [spark] Use the scanner keepAlive API

2018-10-04 Thread Grant Henke (Code Review)
Hello Mike Percy, Dan Burkert, Kudu Jenkins, Adar Dembo, Hao Hao,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11571

to look at the new patch set (#4).

Change subject: KUDU-2563: [spark] Use the scanner keepAlive API
..

KUDU-2563: [spark] Use the scanner keepAlive API

Adds scheduled keepAlive calls to the scanner in the
KuduRDD RowIterator. The period in which the calls
are made is configurable via keepAlivePeriodMs and
has a default of 15 seconds (which is 1/4 the default
scanner ttl).

This implementation is similar to the Impala integration.
It checks if a call to the keepAlive API is needed as
it processes each row. Compared to a background
thread, this has the downside of being less consistently
scheduled and susceptible to scenarios in which a single
row takes longer to process than the ttl. However,
because the scanner is not thread safe, this is the most
straightforward solution and has been proven to work.

Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala
6 files changed, 139 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/71/11571/4
--
To view, visit http://gerrit.cloudera.org:8080/11571
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db
Gerrit-Change-Number: 11571
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 


[kudu-CR] KUDU-2563: [spark] Use the scanner keepAlive API

2018-10-04 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11571 )

Change subject: KUDU-2563: [spark] Use the scanner keepAlive API
..


Patch Set 4: Verified+1 Code-Review+2

Carrying the +2 forward through the doc nit change.


--
To view, visit http://gerrit.cloudera.org:8080/11571
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db
Gerrit-Change-Number: 11571
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Comment-Date: Fri, 05 Oct 2018 02:23:36 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2563: [spark] Use the scanner keepAlive API

2018-10-04 Thread Hao Hao (Code Review)
Hao Hao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11571 )

Change subject: KUDU-2563: [spark] Use the scanner keepAlive API
..


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11571/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala:

http://gerrit.cloudera.org:8080/#/c/11571/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala@35
PS3, Line 35: .
nit: remove.



--
To view, visit http://gerrit.cloudera.org:8080/11571
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db
Gerrit-Change-Number: 11571
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Comment-Date: Fri, 05 Oct 2018 00:53:09 +
Gerrit-HasComments: Yes


[kudu-CR] [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets

2018-10-04 Thread Will Berkeley (Code Review)
Hello Tidy Bot, Alexey Serbin, Attila Bukor, Kudu Jenkins, Andrew Wong, Adar 
Dembo, Grant Henke, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11554

to look at the new patch set (#11).

Change subject: [tools] KUDU-2179: Have ksck not use a single snapshot for all 
tablets
..

[tools] KUDU-2179: Have ksck not use a single snapshot for all tablets

ksck checksum scans allow the user to checksum with snapshot scans, so
that a checksum can be done even as tablets are mutated. It also allows
users to omit a snapshot timestamp. Previously, in this case, the
snapshot timestamp would be retrieved from some healthy tablet server at
the beginning of the checksum process, and used for every replica. This
didn't work well for checksumming large tables, because eventually the
snapshot timestamp fell before the ancient history mark, and subsequent
checksums scans would not be accepted by the tablet servers.

This changes how checksum scans work to address this problem:
1. A background process periodically updates timestamps from tablet
   servers.
2. The checksum process is reorganized so the replicas of one tablet
   are checksummed together.
3. When a tablet is about to be checksummed, and the checksum scan is a
   snapshot scan with no user-provided timestamp, the tablet is assigned
   an up-to-date timestamp from one of the tablet servers that hosts a
   replica. Every replica is then checksummed using this snapshot
   timestamp.
4. The original default timeout of 3600 seconds for a checksum scan is
   too low, but it didn't really matter because the default tablet
   history max age was 900 seconds. Now that checksum scans can continue
   for many hours, the default timeout is raised to 86400 seconds (1
   day), and a new idle timeout is added. If a checksum process does not
   checksum an additional row for this idle timeout (default 10
   minutes), it will idle time out.

Note that there is a new scheduling problem given #2: each tablet server
has a fixed number of slots for checksum scans, but every tablet server
hosting a replica must have a slot available before any replica's
checksum can start, so deciding in which order to checksum tablets and
how to find which are available to schedule is important. Given that the
bulk of the time in checksums is occupied waiting for tablet servers to
read lots of data off disk, materialize it as rows, and checksum it,
it's worth spending a lot of effort to make sure the cluster is fully
utilized given the scan concurrency constraints. So, the tool uses a
brute force approach and simply checks all tablets to see which can be
checksummed, any time a replica checksum finishes and frees a slot.
Tablets are considered in tablet id order. Since tablet ids are UUIDs,
there should be no correlation between a tablet's id and how its
replicas are distributed across tablet servers.

There are several tests added:
1. For the KUDU-2179 fix itself.
2. For the idle timeout.
3. For when a checksum finds mismatches. Yes, we didn't have a test for
   this before. After adding this test I saw that the output is a little
   confusing since it reported the number of replicas with mismatches
   rather than the number of tablets, so I altered the output to fix
   that.
4. A couple of tests exercising situations when all tablet servers are
   unavailable and when all peers of a tablet are unavailable.

I also checksummed a very large cluster with 500TB of data or so, across
about 37000 replicas. The checksum scan completed successfully after
more than 12 hours.

Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_checksum.cc
M src/kudu/tools/ksck_checksum.h
M src/kudu/tools/ksck_remote-test.cc
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
7 files changed, 869 insertions(+), 276 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/11554/11
--
To view, visit http://gerrit.cloudera.org:8080/11554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476
Gerrit-Change-Number: 11554
Gerrit-PatchSet: 11
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2563: [spark] Use the scanner keepAlive API

2018-10-04 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11571 )

Change subject: KUDU-2563: [spark] Use the scanner keepAlive API
..


Patch Set 3: Code-Review+2

Would be nice to get another review though.


--
To view, visit http://gerrit.cloudera.org:8080/11571
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db
Gerrit-Change-Number: 11571
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Comment-Date: Thu, 04 Oct 2018 23:57:28 +
Gerrit-HasComments: No


[kudu-CR] [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets

2018-10-04 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11554 )

Change subject: [tools] KUDU-2179: Have ksck not use a single snapshot for all 
tablets
..


Patch Set 9:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/11554/10/src/kudu/tools/ksck-test.cc
File src/kudu/tools/ksck-test.cc:

http://gerrit.cloudera.org:8080/#/c/11554/10/src/kudu/tools/ksck-test.cc@21
PS10, Line 21: #include 
> Unused? Or is it?
The 'timestamp_' member is an atomic. Also IWYU isn't complaining.


http://gerrit.cloudera.org:8080/#/c/11554/10/src/kudu/tools/ksck-test.cc@1588
PS10, Line 1588: new_uuid,
   :   
std::make_shared(new_uuid));
> nit: add a space
Done


http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc
File src/kudu/tools/ksck_checksum.cc:

http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@298
PS7, Line 298: aut
> Ah, thanks for doing that. I don't think I'm a fan of that; it's verbose an
Done


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc
File src/kudu/tools/ksck_checksum.cc:

http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@353
PS9, Line 353:  :
> nit: spacing
Done


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@354
PS9, Line 354: DCHECK_GE(*slots_open, 0);
 : DCHECK_LE(*slots_open, opts_.scan_concurrency);
> These are redundant now that we're actually using the values we already DCH
They're technically not redundant but they are probably unnecessary.


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc
File src/kudu/tools/ksck_remote-test.cc:

http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc@416
PS9, Line 416: TestChecksumSnapshotLastingLongerThanAHM
> Did you try to run this with --stress-cpu-threads=8 and many iterations usi
1000 times, no failures, DEBUG

http://dist-test.cloudera.org/job?job_id=wdberkeley.1538696623.61201



--
To view, visit http://gerrit.cloudera.org:8080/11554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476
Gerrit-Change-Number: 11554
Gerrit-PatchSet: 9
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Thu, 04 Oct 2018 23:52:48 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2563: [spark] Use the scanner keepAlive API

2018-10-04 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11571 )

Change subject: KUDU-2563: [spark] Use the scanner keepAlive API
..


Patch Set 2:

(7 comments)

> It would be good to double check that KuduScanner.keepAlive() is well behaved 
> when the scan is "in between" two tablets. Ideally that would do nothing, but 
> at the very least it shouldn't try to send a keep alive RPC into the ether 
> and then throw an exception when that fails.

This is already verified in the keepAlive API unit tests and given that there 
is no exposure to that here, I don't think it's worth testing here as well.

http://gerrit.cloudera.org:8080/#/c/11571/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala:

http://gerrit.cloudera.org:8080/#/c/11571/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@125
PS1, Line 125:  * @param scanner the wrapped scanner
 :  * @param kuduContext the kudu context
> Update this.
Done


http://gerrit.cloudera.org:8080/#/c/11571/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@141
PS1, Line 141:* Calls the keepAlive API on the current scanner if the 
keepAlivePeriodMs has passed.
> I wish I had remembered earlier, but this isn't safe; scanners aren't threa
Ah, right. Makes sense. I will switch to the read loop model then.


http://gerrit.cloudera.org:8080/#/c/11571/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@161
PS1, Line 161: }
> Don't we need to do this if we were interrupted too? So maybe wrap the whol
We wouldn't have needed to, given the interrupt only exists for shutting down. 
But I am removing the thread model anyway.


http://gerrit.cloudera.org:8080/#/c/11571/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala:

http://gerrit.cloudera.org:8080/#/c/11571/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@19
PS2, Line 19: import java.util.concurrent.Executors
> Can you double check that you still need all of these new imports?
Done


http://gerrit.cloudera.org:8080/#/c/11571/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@146
PS2, Line 146:   LOG.error(s"Calling keepAlive on ${scanner.currentTablet}")
> For debugging only?
yep. removed.


http://gerrit.cloudera.org:8080/#/c/11571/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@147
PS2, Line 147:   scanner.keepAlive()
> Is it OK if this fails and throws a KuduException?
yes, I would expect an exception to fail the job.


http://gerrit.cloudera.org:8080/#/c/11571/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala:

http://gerrit.cloudera.org:8080/#/c/11571/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala@53
PS1, Line 53:   val defaultKeepAlivePeriodMs: Long = 15000 // 25% of the 
default scanner ttl.
> Should doc the choice in value here, probably related to the default value
Done



--
To view, visit http://gerrit.cloudera.org:8080/11571
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db
Gerrit-Change-Number: 11571
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Comment-Date: Thu, 04 Oct 2018 23:41:16 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2563: [spark] Use the scanner keepAlive API

2018-10-04 Thread Grant Henke (Code Review)
Hello Mike Percy, Dan Burkert, Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11571

to look at the new patch set (#3).

Change subject: KUDU-2563: [spark] Use the scanner keepAlive API
..

KUDU-2563: [spark] Use the scanner keepAlive API

Adds scheduled keepAlive calls to the scanner in the
KuduRDD RowIterator. The period in which the calls
are made is configurable via keepAlivePeriodMs and
has a default of 15 seconds (which is 1/4 the default
scanner ttl).

This implementation is similar to the Impala integration.
It checks if a call to the keepAlive API is needed as
it processes each row. Compared to a background
thread, this has the downside of being less consistently
scheduled and susceptible to scenarios in which a single
row takes longer to process than the ttl. However,
because the scanner is not thread safe, this is the most
straightforward solution and has been proven to work.

Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala
6 files changed, 139 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/71/11571/3
--
To view, visit http://gerrit.cloudera.org:8080/11571
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db
Gerrit-Change-Number: 11571
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 


[kudu-CR] docs: add a note about RAID-0

2018-10-04 Thread Andrew Wong (Code Review)
Andrew Wong has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/11580 )

Change subject: docs: add a note about RAID-0
..

docs: add a note about RAID-0

Adds a note that a single RAID-0 device with multiple disks is not
preferable to specifying multiple fs directories across multiple disks.

Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71
Reviewed-on: http://gerrit.cloudera.org:8080/11580
Reviewed-by: Adar Dembo 
Tested-by: Kudu Jenkins
---
M docs/configuration.adoc
1 file changed, 5 insertions(+), 1 deletion(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/11580
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71
Gerrit-Change-Number: 11580
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] docs: add a note about RAID-0

2018-10-04 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11580 )

Change subject: docs: add a note about RAID-0
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11580
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71
Gerrit-Change-Number: 11580
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Thu, 04 Oct 2018 22:50:21 +
Gerrit-HasComments: No


[kudu-CR] docs: add a note about RAID-0

2018-10-04 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11580 )

Change subject: docs: add a note about RAID-0
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11580/2/docs/configuration.adoc
File docs/configuration.adoc:

http://gerrit.cloudera.org:8080/#/c/11580/2/docs/configuration.adoc@80
PS2, Line 80: If not specified, data blocks will
: be placed in the directory specified by `--fs_wal_dir`.
Nit: I'd put this ahead of the RAID note, since it deals with the semantics 
rather than the effects.



--
To view, visit http://gerrit.cloudera.org:8080/11580
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71
Gerrit-Change-Number: 11580
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Thu, 04 Oct 2018 22:33:34 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2563: [spark] Use the scanner keepAlive API

2018-10-04 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11571 )

Change subject: KUDU-2563: [spark] Use the scanner keepAlive API
..


Patch Set 2:

(3 comments)

It would be good to double check that KuduScanner.keepAlive() is well behaved 
when the scan is "in between" two tablets. Ideally that would do nothing, but 
at the very least it shouldn't try to send a keep alive RPC into the ether and 
then throw an exception when that fails.

http://gerrit.cloudera.org:8080/#/c/11571/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala:

http://gerrit.cloudera.org:8080/#/c/11571/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@19
PS2, Line 19: import java.util.concurrent.Executors
Can you double check that you still need all of these new imports?


http://gerrit.cloudera.org:8080/#/c/11571/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@146
PS2, Line 146:   LOG.error(s"Calling keepAlive on ${scanner.currentTablet}")
For debugging only?


http://gerrit.cloudera.org:8080/#/c/11571/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala@147
PS2, Line 147:   scanner.keepAlive()
Is it OK if this fails and throws a KuduException?



--
To view, visit http://gerrit.cloudera.org:8080/11571
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db
Gerrit-Change-Number: 11571
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 
Gerrit-Comment-Date: Thu, 04 Oct 2018 22:41:57 +
Gerrit-HasComments: Yes


[kudu-CR] docs: add a note about RAID-0

2018-10-04 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11580 )

Change subject: docs: add a note about RAID-0
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11580/2/docs/configuration.adoc
File docs/configuration.adoc:

http://gerrit.cloudera.org:8080/#/c/11580/2/docs/configuration.adoc@80
PS2, Line 80: es rather than delegating the
: striping to a RAID-0 array.
> Nit: I'd put this ahead of the RAID note, since it deals with the semantics
Done



--
To view, visit http://gerrit.cloudera.org:8080/11580
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71
Gerrit-Change-Number: 11580
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Thu, 04 Oct 2018 22:37:50 +
Gerrit-HasComments: Yes


[kudu-CR] docs: add a note about RAID-0

2018-10-04 Thread Andrew Wong (Code Review)
Hello Alex Rodoni, Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11580

to look at the new patch set (#3).

Change subject: docs: add a note about RAID-0
..

docs: add a note about RAID-0

Adds a note that a single RAID-0 device with multiple disks is not
preferable to specifying multiple fs directories across multiple disks.

Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71
---
M docs/configuration.adoc
1 file changed, 5 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/80/11580/3
--
To view, visit http://gerrit.cloudera.org:8080/11580
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71
Gerrit-Change-Number: 11580
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets

2018-10-04 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11554 )

Change subject: [tools] KUDU-2179: Have ksck not use a single snapshot for all 
tablets
..


Patch Set 10:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/11554/10/src/kudu/tools/ksck-test.cc
File src/kudu/tools/ksck-test.cc:

http://gerrit.cloudera.org:8080/#/c/11554/10/src/kudu/tools/ksck-test.cc@21
PS10, Line 21: #include 
Unused? Or is it?


http://gerrit.cloudera.org:8080/#/c/11554/10/src/kudu/tools/ksck-test.cc@1588
PS10, Line 1588: new_uuid,
   :   
std::make_shared(new_uuid));
nit: add a space


http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc
File src/kudu/tools/ksck_checksum.cc:

http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@298
PS7, Line 298: ol
> If I do that I need to cast 'FLAGS_...' to match whatever is inferred for '
Ah, thanks for doing that. I don't think I'm a fan of that; it's verbose and 
probably not worth the cycles to read/ponder as a developer coming back to this 
code. I'm in favor of reverting it if you agree


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc
File src/kudu/tools/ksck_checksum.cc:

http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@353
PS9, Line 353:
nit: spacing


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@354
PS9, Line 354: slot_counts_to_decrement.push_back(slots_open);
 :   }
These are redundant now that we're actually using the values we already DCHECKed



--
To view, visit http://gerrit.cloudera.org:8080/11554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476
Gerrit-Change-Number: 11554
Gerrit-PatchSet: 10
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Thu, 04 Oct 2018 21:20:06 +
Gerrit-HasComments: Yes


[kudu-CR] docs: update docs for CFile checksum handling

2018-10-04 Thread Andrew Wong (Code Review)
Andrew Wong has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/11581 )

Change subject: docs: update docs for CFile checksum handling
..

docs: update docs for CFile checksum handling

Notes that the behavior when encountering a CFile checksum has changed
in 1.8.0. I've kept around the manual steps, since they are still
valuable.

Change-Id: I11ecfe2739122f80894c5bbba13de853d962754a
Reviewed-on: http://gerrit.cloudera.org:8080/11581
Reviewed-by: Grant Henke 
Tested-by: Grant Henke 
Tested-by: Kudu Jenkins
---
M docs/troubleshooting.adoc
1 file changed, 7 insertions(+), 5 deletions(-)

Approvals:
  Grant Henke: Looks good to me, approved; Verified
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/11581
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I11ecfe2739122f80894c5bbba13de853d962754a
Gerrit-Change-Number: 11581
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] docs: add a note about RAID-0

2018-10-04 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11580 )

Change subject: docs: add a note about RAID-0
..


Patch Set 2: Verified+1

Overriding -1 since this is a docs patch


--
To view, visit http://gerrit.cloudera.org:8080/11580
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71
Gerrit-Change-Number: 11580
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Thu, 04 Oct 2018 20:44:18 +
Gerrit-HasComments: No


[kudu-CR] [master] extra tests for the placement policy

2018-10-04 Thread Alexey Serbin (Code Review)
Hello Will Berkeley, Kudu Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11562

to look at the new patch set (#3).

Change subject: [master] extra tests for the placement policy
..

[master] extra tests for the placement policy

Added a few more tests for the replica placement logic.  This is to
verify how an extra replica is placed in case of a tablet with RF=5.

RF=5 test scenarios are interesting in exercising the replica placement
logic vs RF=3 scenarios because placing two replicas into one location
does not constitute a violation of placement policy in case of RF=5.
So, those RF=5 scenarios allows to verify how the placement logic works
during initial replica placement and while adding an extra replica
when it's possible to place more than one replica into same location.

This is a follow-up to ebb2852d99ed27c26e65c3569d5cb515754b2937.

Change-Id: I49748eca01743fb94846ccd611c5f1c7ddb20c04
---
M src/kudu/master/placement_policy-test.cc
M src/kudu/master/placement_policy.h
2 files changed, 200 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/62/11562/3
--
To view, visit http://gerrit.cloudera.org:8080/11562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I49748eca01743fb94846ccd611c5f1c7ddb20c04
Gerrit-Change-Number: 11562
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] docs: add a note about RAID-0

2018-10-04 Thread Andrew Wong (Code Review)
Andrew Wong has removed a vote on this change.

Change subject: docs: add a note about RAID-0
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/11580
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71
Gerrit-Change-Number: 11580
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] [master] extra tests for the placement policy

2018-10-04 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11562 )

Change subject: [master] extra tests for the placement policy
..


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/11562/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11562/1//COMMIT_MSG@11
PS1, Line 11: The test scenario for RF=5 is chosen to exercise the logic where 
two
: replicas per a location does not constitute yet a placement policy
: violation.
> I'm not sure what this sentence says.
Done


http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc
File src/kudu/master/placement_policy-test.cc:

http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@598
PS1, Line 598: 3
> I think it's worth naming this as 'num_replicas', here and below.
Done


http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@601
PS1, Line 601: ASSERT_GE(1, m.count("A_ts0") + m.count("A_ts1"));
 : ASSERT_GE(1, m.count("B_ts0") + m.count("B_ts1"));
 : ASSERT_GE(1, m.count("C_ts0"));
 : ASSERT_GE(1, m.count("D_ts0"));
 : ASSERT_GE(1, m.count("E_ts0"));
 : ASSERT_EQ(3, m.count("A_ts0") + m.count("A_ts1") +
 :  m.count("B_ts0") + m.count("B_ts1") +
 :  m.count("C_ts0") + m.count("D_ts0") + 
m.count("E_ts0"));
> Could you write a quick note explaining what the intent of the checks is, i
Done


http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@625
PS1, Line 625: talbet
> tablet
Done


http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@697
PS1, Line 697: ASSERT_TRUE(
 :   extra_ts->permanent_uuid() == "A_ts2" ||
 :   extra_ts->permanent_uuid() == "C_ts0")
> Why is this expected?
Among the locations where an additional replica can be placed, location A and 
location C have the least load.  As for the  preferences within location A, the 
replica is placed using power-of-two choices among the available servers.

I added corresponding comment.


http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@704
PS1, Line 704: // This is similar to PlaceExtraTablet_L5_TS10_RF5 but 16 tablet 
servers instead
 : // of 10 and slightly different replica distribution.
> What specifically is this testing then?
Actually, this assumed to be a test that verifies how evenly the replicas are 
placed among the servers in locations A and B.  I updated the test.


http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@726
PS1, Line 726: ASSERT_TRUE(extra_ts->permanent_uuid() == "A_ts1" ||
 : extra_ts->permanent_uuid() == "A_ts2" ||
 : extra_ts->permanent_uuid() == "A_ts3" ||
 : extra_ts->permanent_uuid() == "B_ts1" ||
 : extra_ts->permanent_uuid() == "B_ts2" ||
 : extra_ts->permanent_uuid() == "B_ts3")
> Why is this pick expected?
Updated the overall description.



--
To view, visit http://gerrit.cloudera.org:8080/11562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I49748eca01743fb94846ccd611c5f1c7ddb20c04
Gerrit-Change-Number: 11562
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Thu, 04 Oct 2018 20:35:03 +
Gerrit-HasComments: Yes


[kudu-CR] [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets

2018-10-04 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11554 )

Change subject: [tools] KUDU-2179: Have ksck not use a single snapshot for all 
tablets
..


Patch Set 7:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/11554/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11554/9//COMMIT_MSG@15
PS9, Line 15: before
> nit: before --> behind ?
'before' is ok because we're talking about (something sort of like) time.


http://gerrit.cloudera.org:8080/#/c/11554/9//COMMIT_MSG@60
PS9, Line 60: y large c
> BTW, what happens if _some_ of the peers are unavailable?  Do we need to ad
It's not special.


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck.h
File src/kudu/tools/ksck.h:

http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck.h@353
PS9, Line 353: timestamp_ = timestamp;
 :   }
 :
> Is it used at all?
Done


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.h
File src/kudu/tools/ksck_checksum.h:

http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.h@256
PS9, Line 256:   // be checksummed based in available slots on tablet servers.
 :   gscoped_ptr find_tablets_t
> nit: why not to move into std::atomic with those as well?
Done


http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc
File src/kudu/tools/ksck_checksum.cc:

http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@358
PS7, Line 358:   shared_ptr messenger;
 :   rpc::MessengerBuilder builder("timestamp update");
 :   RETURN_NOT_OK(builder.Build());
> I never really saw the point of the class hierarchy here; we could have pok
Done


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc
File src/kudu/tools/ksck_checksum.cc:

http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@66
PS9, Line 66: DEFINE_int32(max_progress_report_wait_ms, 5000,
:  "Maximum number of milliseconds to wait between 
progress reports. "
:  "Used to speed up tests.");
: TAG_FLAG(max_progress_report_wait_ms, hidden);
> Could this be also useful while using the tool in the field?
I don't want to expose it because it's meant for tests. The updates are very 
short, there's really no reason to get them more or less often, and it just 
pollutes the option list with one nobody is really gonna use.


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@70
PS9, Line 70: timestamp_update_period_ms
> Does it make sense to keep it in milliseconds?  Maybe, a second-grain resol
I guess, but I don't even think it's worth changing.


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@86
PS9, Line 86: when the all the checksums for its replicas b
> nit: maybe, change to 'when the checksumming for all its replicas begins'
I don't think there's a meaningful difference there.


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@101
PS9, Line 101: listed in 'tservers'
> What about the reverse: is it OK to have a tablet server in 'tservers' that
Yes. It could be one with no replicas on it.


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@106
PS9, Line 106: CHECK(num_replicas);
> nit: maybe, DCHECK is enough, since it would crash anyway further down if i
I think the crash is cleaner if it's a CHECK failure. This isn't a hot path, 
anyway.


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@110
PS9, Line 110: tserver_uuid_set
> Consider using helper functions from map-util.h: those present more express
Done


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@187
PS9, Line 187: bool KsckChecksumManager::HasOpenTsSlotsUnlocked() {
> nit: add const specifier?
Done


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@383
PS9, Line 383: r (const auto& ts : 
> nit: use helper functions from map-util.h
Done


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc
File src/kudu/tools/ksck_remote-test.cc:

http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc@429
PS9, Line 429: or
> and
Done


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc@428
PS9, Line 428:   #if defined(THREAD_SANITIZER) || defined(ADDRESS_SANITIZER)
 : LOG(WARNING) << "test is skipped in TSAN or ASAN builds";
 : return;
 :   #endif
> nit: shift this left one indent
Why? That looks really funny.


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc@476
PS9, Line 476: MonoDelta::FromSeconds(timeout_sec),
 : MonoDelta::FromSeconds(timeout_sec),
 : scan_concurrency,
 

[kudu-CR] KUDU-2563: [spark] Use the scanner keepAlive API

2018-10-04 Thread Grant Henke (Code Review)
Hello Mike Percy, Dan Burkert, Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11571

to look at the new patch set (#2).

Change subject: KUDU-2563: [spark] Use the scanner keepAlive API
..

KUDU-2563: [spark] Use the scanner keepAlive API

Adds scheduled keepAlive calls to the scanner in the
KuduRDD RowIterator. The period in which the calls
are made is configurable via keepAlivePeriodMs and
has a default of 15 seconds (which is 1/4 the default
scanner ttl).

This implementation is similar to the Impala integration.
It checks if a call to the keepAlive API is needed as
it processes each row. Compared to a background
thread, this has the downside of being less consistently
scheduled and susceptible to scenarios in which a single
row takes longer to process than the ttl. However,
because the scanner is not thread safe, this is the most
straightforward solution and has been proven to work.

Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/KuduScanner.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduRDD.scala
M 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduReadOptions.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduRDDTest.scala
6 files changed, 152 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/71/11571/2
--
To view, visit http://gerrit.cloudera.org:8080/11571
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia7f26d6ab8deb24982055d247938a11e188c35db
Gerrit-Change-Number: 11571
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Dan Burkert 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy 


[kudu-CR] docs: add a note about RAID-0

2018-10-04 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11580 )

Change subject: docs: add a note about RAID-0
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11580/1/docs/configuration.adoc
File docs/configuration.adoc:

http://gerrit.cloudera.org:8080/#/c/11580/1/docs/configuration.adoc@76
PS1, Line 76: Note that while RAID-0
: devices with multiple drives may perform better than single 
drives, using a
: single RAID-0 device instead of specifying multiple data 
directories may reduce
: the amount of parallelism with which Kudu can operate.
Maybe: "Note that while a single data directory backed by a RAID-0 device will 
outperform a single data directory backed by a single device, it is better to 
let Kudu manage its own striping over multiple storage devices rather than 
delegating the striping to a RAID-0 array."



--
To view, visit http://gerrit.cloudera.org:8080/11580
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71
Gerrit-Change-Number: 11580
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Thu, 04 Oct 2018 19:56:10 +
Gerrit-HasComments: Yes


[kudu-CR] docs: update docs for CFile checksum handling

2018-10-04 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11581 )

Change subject: docs: update docs for CFile checksum handling
..


Patch Set 1: Verified+1 Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/11581
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I11ecfe2739122f80894c5bbba13de853d962754a
Gerrit-Change-Number: 11581
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Thu, 04 Oct 2018 20:08:23 +
Gerrit-HasComments: No


[kudu-CR] [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets

2018-10-04 Thread Will Berkeley (Code Review)
Hello Tidy Bot, Alexey Serbin, Attila Bukor, Kudu Jenkins, Andrew Wong, Adar 
Dembo, Grant Henke, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11554

to look at the new patch set (#10).

Change subject: [tools] KUDU-2179: Have ksck not use a single snapshot for all 
tablets
..

[tools] KUDU-2179: Have ksck not use a single snapshot for all tablets

ksck checksum scans allow the user to checksum with snapshot scans, so
that a checksum can be done even as tablets are mutated. It also allows
users to omit a snapshot timestamp. Previously, in this case, the
snapshot timestamp would be retrieved from some healthy tablet server at
the beginning of the checksum process, and used for every replica. This
didn't work well for checksumming large tables, because eventually the
snapshot timestamp fell before the ancient history mark, and subsequent
checksums scans would not be accepted by the tablet servers.

This changes how checksum scans work to address this problem:
1. A background process periodically updates timestamps from tablet
   servers.
2. The checksum process is reorganized so the replicas of one tablet
   are checksummed together.
3. When a tablet is about to be checksummed, and the checksum scan is a
   snapshot scan with no user-provided timestamp, the tablet is assigned
   an up-to-date timestamp from one of the tablet servers that hosts a
   replica. Every replica is then checksummed using this snapshot
   timestamp.
4. The original default timeout of 3600 seconds for a checksum scan is
   too low, but it didn't really matter because the default tablet
   history max age was 900 seconds. Now that checksum scans can continue
   for many hours, the default timeout is raised to 86400 seconds (1
   day), and a new idle timeout is added. If a checksum process does not
   checksum an additional row for this idle timeout (default 10
   minutes), it will idle time out.

Note that there is a new scheduling problem given #2: each tablet server
has a fixed number of slots for checksum scans, but every tablet server
hosting a replica must have a slot available before any replica's
checksum can start, so deciding in which order to checksum tablets and
how to find which are available to schedule is important. Given that the
bulk of the time in checksums is occupied waiting for tablet servers to
read lots of data off disk, materialize it as rows, and checksum it,
it's worth spending a lot of effort to make sure the cluster is fully
utilized given the scan concurrency constraints. So, the tool uses a
brute force approach and simply checks all tablets to see which can be
checksummed, any time a replica checksum finishes and frees a slot.
Tablets are considered in tablet id order. Since tablet ids are UUIDs,
there should be no correlation between a tablet's id and how its
replicas are distributed across tablet servers.

There are several tests added:
1. For the KUDU-2179 fix itself.
2. For the idle timeout.
3. For when a checksum finds mismatches. Yes, we didn't have a test for
   this before. After adding this test I saw that the output is a little
   confusing since it reported the number of replicas with mismatches
   rather than the number of tablets, so I altered the output to fix
   that.
4. A couple of tests exercising situations when all tablet servers are
   unavailable and when all peers of a tablet are unavailable.

I also checksummed a very large cluster with 500TB of data or so, across
about 37000 replicas. The checksum scan completed successfully after
more than 12 hours.

Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_checksum.cc
M src/kudu/tools/ksck_checksum.h
M src/kudu/tools/ksck_remote-test.cc
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
7 files changed, 871 insertions(+), 275 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/11554/10
--
To view, visit http://gerrit.cloudera.org:8080/11554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476
Gerrit-Change-Number: 11554
Gerrit-PatchSet: 10
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] docs: update docs for CFile checksum handling

2018-10-04 Thread Andrew Wong (Code Review)
Andrew Wong has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11581


Change subject: docs: update docs for CFile checksum handling
..

docs: update docs for CFile checksum handling

Notes that the behavior when encountering a CFile checksum has changed
in 1.8.0. I've kept around the manual steps, since they are still
valuable.

Change-Id: I11ecfe2739122f80894c5bbba13de853d962754a
---
M docs/troubleshooting.adoc
1 file changed, 7 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/81/11581/1
--
To view, visit http://gerrit.cloudera.org:8080/11581
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I11ecfe2739122f80894c5bbba13de853d962754a
Gerrit-Change-Number: 11581
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 


[kudu-CR] docs: add a note about RAID-0

2018-10-04 Thread Andrew Wong (Code Review)
Hello Alex Rodoni, Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11580

to look at the new patch set (#2).

Change subject: docs: add a note about RAID-0
..

docs: add a note about RAID-0

Adds a note that a single RAID-0 device with multiple disks is not
preferable to specifying multiple fs directories across multiple disks.

Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71
---
M docs/configuration.adoc
1 file changed, 6 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/80/11580/2
--
To view, visit http://gerrit.cloudera.org:8080/11580
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71
Gerrit-Change-Number: 11580
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] docs: add a note about RAID-0

2018-10-04 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11580 )

Change subject: docs: add a note about RAID-0
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11580/1/docs/configuration.adoc
File docs/configuration.adoc:

http://gerrit.cloudera.org:8080/#/c/11580/1/docs/configuration.adoc@77
PS1, Line 77: R
> the ones with
Went with a different sentence entirely.


http://gerrit.cloudera.org:8080/#/c/11580/1/docs/configuration.adoc@76
PS1, Line 76:
: Note that while a single data directory backed by a RAID-0 array 
will
: outperform a single data directory backed by a single storage 
device, it is
: better to let Kudu manage its own striping over multip
> Maybe: "Note that while a single data directory backed by a RAID-0 device w
Taken near-verbatim


http://gerrit.cloudera.org:8080/#/c/11580/1/docs/configuration.adoc@76
PS1, Line 76:
: Note that while a single data directory backed by a RAID-0 array 
will
: outperform a single data directory backed by a single storage 
device, it is
: better to let Kudu manage its own striping over multipl
> Maybe put this in a separate Note paragraph below?
I considered that, but I think here is more appropriate since we discuss 
striping already.



--
To view, visit http://gerrit.cloudera.org:8080/11580
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71
Gerrit-Change-Number: 11580
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Thu, 04 Oct 2018 20:01:14 +
Gerrit-HasComments: Yes


[kudu-CR] docs: add a note about RAID-0

2018-10-04 Thread Alex Rodoni (Code Review)
Alex Rodoni has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11580 )

Change subject: docs: add a note about RAID-0
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/11580/1/docs/configuration.adoc
File docs/configuration.adoc:

http://gerrit.cloudera.org:8080/#/c/11580/1/docs/configuration.adoc@77
PS1, Line 77:
the ones with


http://gerrit.cloudera.org:8080/#/c/11580/1/docs/configuration.adoc@76
PS1, Line 76: Note that while RAID-0
: devices with multiple drives may perform better than single 
drives, using a
: single RAID-0 device instead of specifying multiple data 
directories may reduce
: the amount of parallelism with which Kudu can operate.
Maybe put this in a separate Note paragraph below?



--
To view, visit http://gerrit.cloudera.org:8080/11580
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71
Gerrit-Change-Number: 11580
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Thu, 04 Oct 2018 19:59:07 +
Gerrit-HasComments: Yes


[kudu-CR] docs: add a note about RAID-0

2018-10-04 Thread Andrew Wong (Code Review)
Andrew Wong has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/11580


Change subject: docs: add a note about RAID-0
..

docs: add a note about RAID-0

Adds a note that a single RAID-0 device with multiple disks is not
preferable to specifying multiple fs directories across multiple disks.

Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71
---
M docs/configuration.adoc
1 file changed, 4 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/80/11580/1
--
To view, visit http://gerrit.cloudera.org:8080/11580
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ibb17fae9459be04211bd0a7db6eca8e94e443f71
Gerrit-Change-Number: 11580
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong 


[kudu-CR] [master] extra tests for the placement policy

2018-10-04 Thread Alexey Serbin (Code Review)
Hello Will Berkeley,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11562

to look at the new patch set (#2).

Change subject: [master] extra tests for the placement policy
..

[master] extra tests for the placement policy

Added a few more tests for the replica placement logic.  This is to
verify how an extra replica is placed in case of a tablet with RF=5.
The test scenario for RF=5 is chosen to exercise the logic where two
replicas per a location does not constitute yet a placement policy
violation.

This is a follow-up to ebb2852d99ed27c26e65c3569d5cb515754b2937.

Change-Id: I49748eca01743fb94846ccd611c5f1c7ddb20c04
---
M src/kudu/master/placement_policy-test.cc
M src/kudu/master/placement_policy.h
2 files changed, 200 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/62/11562/2
--
To view, visit http://gerrit.cloudera.org:8080/11562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I49748eca01743fb94846ccd611c5f1c7ddb20c04
Gerrit-Change-Number: 11562
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] Remove Pandas as a Python Client Requirement

2018-10-04 Thread Jordan Birdsell (Code Review)
Jordan Birdsell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11557 )

Change subject: Remove Pandas as a Python Client Requirement
..


Patch Set 5: Code-Review+1

I'm good with this change but would like someone else to provide the 2nd +1 
since there has been a lot of feedback here.


--
To view, visit http://gerrit.cloudera.org:8080/11557
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2cce7d529a804e712ccc32fb2774f5d0ee9dbdf8
Gerrit-Change-Number: 11557
Gerrit-PatchSet: 5
Gerrit-Owner: a...@phdata.io
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jordan Birdsell 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: a...@phdata.io
Gerrit-Comment-Date: Thu, 04 Oct 2018 18:02:01 +
Gerrit-HasComments: No


[kudu-CR] Remove Pandas as a Python Client Requirement

2018-10-04 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11557 )

Change subject: Remove Pandas as a Python Client Requirement
..


Patch Set 5: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11557/3/build-support/jenkins/build-and-test.sh
File build-support/jenkins/build-and-test.sh:

http://gerrit.cloudera.org:8080/#/c/11557/3/build-support/jenkins/build-and-test.sh@506
PS3, Line 506:   # We need to install Pandas manually since its not a required 
package but is needed
> Done
Missed its -> it's. But not a big deal.



--
To view, visit http://gerrit.cloudera.org:8080/11557
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2cce7d529a804e712ccc32fb2774f5d0ee9dbdf8
Gerrit-Change-Number: 11557
Gerrit-PatchSet: 5
Gerrit-Owner: a...@phdata.io
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jordan Birdsell 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: a...@phdata.io
Gerrit-Comment-Date: Thu, 04 Oct 2018 18:17:23 +
Gerrit-HasComments: Yes


[kudu-CR] Remove Pandas as a Python Client Requirement

2018-10-04 Thread Adar Dembo (Code Review)
Adar Dembo has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/11557 )

Change subject: Remove Pandas as a Python Client Requirement
..

Remove Pandas as a Python Client Requirement

This is to remove the pandas dependency for users.
We will put the pandas dependency in the jenkins script
build-and-test.sh as a workaround. This is based on the initial
work done here https://gerrit.cloudera.org/#/c/10809/4

Change-Id: I2cce7d529a804e712ccc32fb2774f5d0ee9dbdf8
Reviewed-on: http://gerrit.cloudera.org:8080/11557
Tested-by: Kudu Jenkins
Reviewed-by: Jordan Birdsell 
Reviewed-by: Adar Dembo 
---
M build-support/jenkins/build-and-test.sh
M python/kudu/tests/test_scanner.py
M python/requirements.txt
M python/setup.py
4 files changed, 23 insertions(+), 12 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Jordan Birdsell: Looks good to me, but someone else must approve
  Adar Dembo: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/11557
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I2cce7d529a804e712ccc32fb2774f5d0ee9dbdf8
Gerrit-Change-Number: 11557
Gerrit-PatchSet: 6
Gerrit-Owner: a...@phdata.io
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jordan Birdsell 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: a...@phdata.io


[kudu-CR] KUDU-1592 Make warnings about FBM more ominous

2018-10-04 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11579 )

Change subject: KUDU-1592 Make warnings about FBM more ominous
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11579/1/docs/troubleshooting.adoc
File docs/troubleshooting.adoc:

http://gerrit.cloudera.org:8080/#/c/11579/1/docs/troubleshooting.adoc@85
PS1, Line 85: This results in a very large number of file descriptors that
: the tablet server needs to open even with small amounts of data, 
which is a
: limited resource and the tablet servers can easily exhaust it 
with file block manager.
This isn't true; the FileCache will recycle file descriptors on an LRU basis in 
order to keep the total open fd count below a certain threshold.

That doesn't detract from the larger argument (that it's not as performant) 
though.


http://gerrit.cloudera.org:8080/#/c/11579/1/docs/troubleshooting.adoc@89
PS1, Line 89: One more reason why file block manager should be avoided is that 
it's
: *impossible to switch between block managers* without wiping and 
reinitializing
: the tablet servers.
Combine this into the previous paragraph.


http://gerrit.cloudera.org:8080/#/c/11579/1/src/kudu/fs/fs_manager.cc
File src/kudu/fs/fs_manager.cc:

http://gerrit.cloudera.org:8080/#/c/11579/1/src/kudu/fs/fs_manager.cc@70
PS1, Line 70: File
Nit: "The file block manager..."



--
To view, visit http://gerrit.cloudera.org:8080/11579
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15adf42dd7e6376a85cdb178ad460f239a4f87a1
Gerrit-Change-Number: 11579
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Jean-Daniel Cryans 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 04 Oct 2018 18:21:32 +
Gerrit-HasComments: Yes


[kudu-CR] [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets

2018-10-04 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11554 )

Change subject: [tools] KUDU-2179: Have ksck not use a single snapshot for all 
tablets
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc
File src/kudu/tools/ksck_checksum.cc:

http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@358
PS7, Line 358:   return true;
 : }
 :
> There is no Messenger in a KsckCluster, the base class this class works wit
I never really saw the point of the class hierarchy here; we could have poked 
enough holes in ksck to simplify testing without inheritance.

Anyway, I think your suggestion is reasonable. How shady it will end up sort of 
depends on whether rpc::PeriodicTimer needs to be used by whatever tests use a 
MockKsckCluster. If it isn't used, you could get away with returning nullptr in 
MockKsckCluster::messenger(), which would ease the stinging.



--
To view, visit http://gerrit.cloudera.org:8080/11554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476
Gerrit-Change-Number: 11554
Gerrit-PatchSet: 9
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Thu, 04 Oct 2018 18:27:34 +
Gerrit-HasComments: Yes


[kudu-CR] [master] extra tests for the placement policy

2018-10-04 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11562 )

Change subject: [master] extra tests for the placement policy
..


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/11562/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11562/1//COMMIT_MSG@11
PS1, Line 11: The test scenario for RF=5 is chosen to exercise the logic where 
two
: replicas per a location does not constitute yet a placement policy
: violation.
I'm not sure what this sentence says.


http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc
File src/kudu/master/placement_policy-test.cc:

http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@598
PS1, Line 598: 3
I think it's worth naming this as 'num_replicas', here and below.


http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@601
PS1, Line 601: ASSERT_GE(1, m.count("A_ts0") + m.count("A_ts1"));
 : ASSERT_GE(1, m.count("B_ts0") + m.count("B_ts1"));
 : ASSERT_GE(1, m.count("C_ts0"));
 : ASSERT_GE(1, m.count("D_ts0"));
 : ASSERT_GE(1, m.count("E_ts0"));
 : ASSERT_EQ(3, m.count("A_ts0") + m.count("A_ts1") +
 :  m.count("B_ts0") + m.count("B_ts1") +
 :  m.count("C_ts0") + m.count("D_ts0") + 
m.count("E_ts0"));
Could you write a quick note explaining what the intent of the checks is, i.e. 
"Make sure that X".


http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@625
PS1, Line 625: talbet
tablet


http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@697
PS1, Line 697: ASSERT_TRUE(
 :   extra_ts->permanent_uuid() == "A_ts2" ||
 :   extra_ts->permanent_uuid() == "C_ts0")
Why is this expected?


http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@704
PS1, Line 704: // This is similar to PlaceExtraTablet_L5_TS10_RF5 but 16 tablet 
servers instead
 : // of 10 and slightly different replica distribution.
What specifically is this testing then?


http://gerrit.cloudera.org:8080/#/c/11562/1/src/kudu/master/placement_policy-test.cc@726
PS1, Line 726: ASSERT_TRUE(extra_ts->permanent_uuid() == "A_ts1" ||
 : extra_ts->permanent_uuid() == "A_ts2" ||
 : extra_ts->permanent_uuid() == "A_ts3" ||
 : extra_ts->permanent_uuid() == "B_ts1" ||
 : extra_ts->permanent_uuid() == "B_ts2" ||
 : extra_ts->permanent_uuid() == "B_ts3")
Why is this pick expected?



--
To view, visit http://gerrit.cloudera.org:8080/11562
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I49748eca01743fb94846ccd611c5f1c7ddb20c04
Gerrit-Change-Number: 11562
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Thu, 04 Oct 2018 17:43:40 +
Gerrit-HasComments: Yes


[kudu-CR] [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets

2018-10-04 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11554 )

Change subject: [tools] KUDU-2179: Have ksck not use a single snapshot for all 
tablets
..


Patch Set 9:

(18 comments)

Skimmed trough for a first look, will take another look later today.

Overall looks pretty good, as of now mostly nits.

http://gerrit.cloudera.org:8080/#/c/11554/9//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11554/9//COMMIT_MSG@15
PS9, Line 15: before
nit: before --> behind ?


http://gerrit.cloudera.org:8080/#/c/11554/9//COMMIT_MSG@60
PS9, Line 60: all peers
BTW, what happens if _some_ of the peers are unavailable?  Do we need to add a 
test for that or there is nothing special about that?


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck.h
File src/kudu/tools/ksck.h:

http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck.h@353
PS9, Line 353:   void set_timestamp(uint64_t timestamp) {
 : timestamp_ = timestamp;
 :   }
Is it used at all?


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.h
File src/kudu/tools/ksck_checksum.h:

http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.h@256
PS9, Line 256:   AtomicInt rows_summed_;
 :   AtomicInt disk_bytes_summed_;
nit: why not to move into std::atomic with those as well?


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc
File src/kudu/tools/ksck_checksum.cc:

http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@66
PS9, Line 66: DEFINE_int32(max_progress_report_wait_ms, 5000,
:  "Maximum number of milliseconds to wait between 
progress reports. "
:  "Used to speed up tests.");
: TAG_FLAG(max_progress_report_wait_ms, hidden);
Could this be also useful while using the tool in the field?


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@70
PS9, Line 70: timestamp_update_period_ms
Does it make sense to keep it in milliseconds?  Maybe, a second-grain 
resolution would be enough?


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@86
PS9, Line 86: when all the checksums for its replicas begin
nit: maybe, change to 'when the checksumming for all its replicas begins'


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@101
PS9, Line 101: listed in 'tservers'
What about the reverse: is it OK to have a tablet server in 'tservers' that is 
not referenced by any tablet from the 'tablet_infos'?


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@106
PS9, Line 106: CHECK(num_replicas);
nit: maybe, DCHECK is enough, since it would crash anyway further down if it 
were null.


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@110
PS9, Line 110: tserver_uuid_set
Consider using helper functions from map-util.h: those present more expressive 
semantics.  E.g., here it's supposed to be InsertOrDie(_uuid_set, 
tserver->uuid()), right?


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@187
PS9, Line 187: bool KsckChecksumManager::HasOpenTsSlotsUnlocked() {
nit: add const specifier?


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_checksum.cc@383
PS9, Line 383: replica_uuids.insert
nit: use helper functions from map-util.h


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc
File src/kudu/tools/ksck_remote-test.cc:

http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc@416
PS9, Line 416: TestChecksumSnapshotLastingLongerThanAHM
Did you try to run this with --stress-cpu-threads=8 and many iterations using 
dist-test?


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc@429
PS9, Line 429: or
and


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc@428
PS9, Line 428:   #if defined(THREAD_SANITIZER) || defined(ADDRESS_SANITIZER)
 : LOG(WARNING) << "test is skipped in TSAN or ASAN builds";
 : return;
 :   #endif
nit: shift this left one indent


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote-test.cc@476
PS9, Line 476: MonoDelta::FromSeconds(timeout_sec),
 : MonoDelta::FromSeconds(timeout_sec),
 : scan_concurrency,
 : use_snapshot,
 : KsckChecksumOptions::kCurrentTimestamp)));
nit: indent should be 4 spaces from the outer scope's indent.


http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote.h
File src/kudu/tools/ksck_remote.h:

http://gerrit.cloudera.org:8080/#/c/11554/9/src/kudu/tools/ksck_remote.h@130
PS9, Line 130:
style nit: indent should be 4 spaces from the outer scope



[kudu-CR] Remove Pandas as a Python Client Requirement

2018-10-04 Thread Anonymous Coward (Code Review)
a...@phdata.io has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11557 )

Change subject: Remove Pandas as a Python Client Requirement
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11557/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11557/3//COMMIT_MSG@13
PS3, Line 13:
> initial
Done



--
To view, visit http://gerrit.cloudera.org:8080/11557
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2cce7d529a804e712ccc32fb2774f5d0ee9dbdf8
Gerrit-Change-Number: 11557
Gerrit-PatchSet: 5
Gerrit-Owner: a...@phdata.io
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jordan Birdsell 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: a...@phdata.io
Gerrit-Comment-Date: Thu, 04 Oct 2018 15:18:40 +
Gerrit-HasComments: Yes


[kudu-CR] Remove Pandas as a Python Client Requirement

2018-10-04 Thread Anonymous Coward (Code Review)
Hello Kudu Jenkins, Jordan Birdsell, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11557

to look at the new patch set (#5).

Change subject: Remove Pandas as a Python Client Requirement
..

Remove Pandas as a Python Client Requirement

This is to remove the pandas dependency for users.
We will put the pandas dependency in the jenkins script
build-and-test.sh as a workaround. This is based on the initial
work done here https://gerrit.cloudera.org/#/c/10809/4

Change-Id: I2cce7d529a804e712ccc32fb2774f5d0ee9dbdf8
---
M build-support/jenkins/build-and-test.sh
M python/kudu/tests/test_scanner.py
M python/requirements.txt
M python/setup.py
4 files changed, 23 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/57/11557/5
--
To view, visit http://gerrit.cloudera.org:8080/11557
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2cce7d529a804e712ccc32fb2774f5d0ee9dbdf8
Gerrit-Change-Number: 11557
Gerrit-PatchSet: 5
Gerrit-Owner: a...@phdata.io
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jordan Birdsell 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: a...@phdata.io


[kudu-CR] Remove Pandas as a Python Client Requirement

2018-10-04 Thread Anonymous Coward (Code Review)
a...@phdata.io has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11557 )

Change subject: Remove Pandas as a Python Client Requirement
..


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/11557/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/11557/3//COMMIT_MSG@12
PS3, Line 12: work done here https://gerrit.cloudera.org/#/c/10809/4
> dependency
Done


http://gerrit.cloudera.org:8080/#/c/11557/3/build-support/jenkins/build-and-test.sh
File build-support/jenkins/build-and-test.sh:

http://gerrit.cloudera.org:8080/#/c/11557/3/build-support/jenkins/build-and-test.sh@506
PS3, Line 506:   # We need to install Pandas manually since its not a required 
package but is needed
> Nit: it's
Done


http://gerrit.cloudera.org:8080/#/c/11557/3/build-support/jenkins/build-and-test.sh@508
PS3, Line 508:   # pandas 0.18 dropped support for python 2.6.
> Can you pin this to 0.18, which was the last version to support Python 2.6?
Done


http://gerrit.cloudera.org:8080/#/c/11557/3/build-support/jenkins/build-and-test.sh@567
PS3, Line 567:   #
> Nit: leading whitespace here.
Done


http://gerrit.cloudera.org:8080/#/c/11557/3/python/kudu/tests/test_scanner.py
File python/kudu/tests/test_scanner.py:

http://gerrit.cloudera.org:8080/#/c/11557/3/python/kudu/tests/test_scanner.py@385
PS3, Line 385: @pytest.mark.skipif((not(kudu.CLIENT_SUPPORTS_PANDAS) and
 : (not(kudu.CLIENT_SUPPORTS_DECIMAL))),
> Can this be formatted like this?
Done


http://gerrit.cloudera.org:8080/#/c/11557/3/python/setup.py
File python/setup.py:

http://gerrit.cloudera.org:8080/#/c/11557/3/python/setup.py@197
PS3, Line 197: # pytest 3.3 [1] and pytest-timeout 1.2.1 [2] dropped
 : # support for python 2.6.
 : #
 : # 1. https://docs.pytest.org/en/latest/changelog.html#id164
 : # 2. https://pypi.org/project/pytest-timeout/#id5
 : tests_require=['pytest >=2.8,<3.3',
> Update this to remove the pandas reference.
Done



--
To view, visit http://gerrit.cloudera.org:8080/11557
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2cce7d529a804e712ccc32fb2774f5d0ee9dbdf8
Gerrit-Change-Number: 11557
Gerrit-PatchSet: 4
Gerrit-Owner: a...@phdata.io
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jordan Birdsell 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: a...@phdata.io
Gerrit-Comment-Date: Thu, 04 Oct 2018 15:17:22 +
Gerrit-HasComments: Yes


[kudu-CR] Remove Pandas as a Python Client Requirement

2018-10-04 Thread Anonymous Coward (Code Review)
Hello Kudu Jenkins, Jordan Birdsell, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11557

to look at the new patch set (#4).

Change subject: Remove Pandas as a Python Client Requirement
..

Remove Pandas as a Python Client Requirement

This is to remove the pandas dependency for users.
We will put the pandas dependency in the jenkins script
build-and-test.sh as a workaround. This is based on the intial
work done here https://gerrit.cloudera.org/#/c/10809/4

Change-Id: I2cce7d529a804e712ccc32fb2774f5d0ee9dbdf8
---
M build-support/jenkins/build-and-test.sh
M python/kudu/tests/test_scanner.py
M python/requirements.txt
M python/setup.py
4 files changed, 23 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/57/11557/4
--
To view, visit http://gerrit.cloudera.org:8080/11557
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2cce7d529a804e712ccc32fb2774f5d0ee9dbdf8
Gerrit-Change-Number: 11557
Gerrit-PatchSet: 4
Gerrit-Owner: a...@phdata.io
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Jordan Birdsell 
Gerrit-Reviewer: Kudu Jenkins


[kudu-CR] [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets

2018-10-04 Thread Will Berkeley (Code Review)
Hello Tidy Bot, Alexey Serbin, Attila Bukor, Kudu Jenkins, Andrew Wong, Adar 
Dembo, Grant Henke, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11554

to look at the new patch set (#9).

Change subject: [tools] KUDU-2179: Have ksck not use a single snapshot for all 
tablets
..

[tools] KUDU-2179: Have ksck not use a single snapshot for all tablets

ksck checksum scans allow the user to checksum with snapshot scans, so
that a checksum can be done even as tablets are mutated. It also allows
users to omit a snapshot timestamp. Previously, in this case, the
snapshot timestamp would be retrieved from some healthy tablet server at
the beginning of the checksum process, and used for every replica. This
didn't work well for checksumming large tables, because eventually the
snapshot timestamp fell before the ancient history mark, and subsequent
checksums scans would not be accepted by the tablet servers.

This changes how checksum scans work to address this problem:
1. A background process periodically updates timestamps from tablet
   servers.
2. The checksum process is reorganized so the replicas of one tablet
   are checksummed together.
3. When a tablet is about to be checksummed, and the checksum scan is a
   snapshot scan with no user-provided timestamp, the tablet is assigned
   an up-to-date timestamp from one of the tablet servers that hosts a
   replica. Every replica is then checksummed using this snapshot
   timestamp.
4. The original default timeout of 3600 seconds for a checksum scan is
   too low, but it didn't really matter because the default tablet
   history max age was 900 seconds. Now that checksum scans can continue
   for many hours, the default timeout is raised to 86400 seconds (1
   day), and a new idle timeout is added. If a checksum process does not
   checksum an additional row for this idle timeout (default 10
   minutes), it will idle time out.

Note that there is a new scheduling problem given #2: each tablet server
has a fixed number of slots for checksum scans, but every tablet server
hosting a replica must have a slot available before any replica's
checksum can start, so deciding in which order to checksum tablets and
how to find which are available to schedule is important. Given that the
bulk of the time in checksums is occupied waiting for tablet servers to
read lots of data off disk, materialize it as rows, and checksum it,
it's worth spending a lot of effort to make sure the cluster is fully
utilized given the scan concurrency constraints. So, the tool uses a
brute force approach and simply checks all tablets to see which can be
checksummed, any time a replica checksum finishes and frees a slot.
Tablets are considered in tablet id order. Since tablet ids are UUIDs,
there should be no correlation between a tablet's id and how its
replicas are distributed across tablet servers.

There are several tests added:
1. For the KUDU-2179 fix itself.
2. For the idle timeout.
3. For when a checksum finds mismatches. Yes, we didn't have a test for
   this before. After adding this test I saw that the output is a little
   confusing since it reported the number of replicas with mismatches
   rather than the number of tablets, so I altered the output to fix
   that.
4. A couple of tests exercising situations when all tablet servers are
   unavailable and when all peers of a tablet are unavailable.

I also checksummed a very large cluster with 500TB of data or so, across
about 37000 replicas. The checksum scan completed successfully after
more than 12 hours.

Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_checksum.cc
M src/kudu/tools/ksck_checksum.h
M src/kudu/tools/ksck_remote-test.cc
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
7 files changed, 846 insertions(+), 269 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/11554/9
--
To view, visit http://gerrit.cloudera.org:8080/11554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476
Gerrit-Change-Number: 11554
Gerrit-PatchSet: 9
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-1592 Make warnings about FBM more ominous

2018-10-04 Thread Attila Bukor (Code Review)
Hello Jean-Daniel Cryans, Adar Dembo, Grant Henke,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/11579

to review the following change.


Change subject: KUDU-1592 Make warnings about FBM more ominous
..

KUDU-1592 Make warnings about FBM more ominous

The documentation warned about scalability issues with file block
manager, but it was rather lightly worded and so was the warning message
of a hole punch test failure.

This commit removes the actual flag necessary to enable FBM from the
hole punching tests' warning to force users look it up in the
documentation instead of blindly applying it, adds a note about it being
unsuitable for production use to the flag's description and expands the
warning in the troubleshooting section.

Change-Id: I15adf42dd7e6376a85cdb178ad460f239a4f87a1
---
M docs/troubleshooting.adoc
M src/kudu/fs/data_dirs.cc
M src/kudu/fs/fs_manager.cc
3 files changed, 32 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/11579/1
--
To view, visit http://gerrit.cloudera.org:8080/11579
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I15adf42dd7e6376a85cdb178ad460f239a4f87a1
Gerrit-Change-Number: 11579
Gerrit-PatchSet: 1
Gerrit-Owner: Attila Bukor 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Jean-Daniel Cryans 


[kudu-CR] [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets

2018-10-04 Thread Will Berkeley (Code Review)
Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11554 )

Change subject: [tools] KUDU-2179: Have ksck not use a single snapshot for all 
tablets
..


Patch Set 7:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.h
File src/kudu/tools/ksck_checksum.h:

http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.h@45
PS7, Line 45: }
> nit: add "// namespace rpc"
Done


http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.h@136
PS7, Line 136:   static Status New(KsckChecksumOptions opts,
> warning: function 'kudu::tools::KsckChecksumManager::New' has a definition
Done


http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.h@140
PS7, Line 140:   ~KsckChecksumManager() = default;
> Is this needed?
Done


http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.h@169
PS7, Line 169: const MonoDelta& timeout,
 :   const MonoDelta& idle_timeout,
> Why not use the `timeout` and `idle_timeout` in `opts_`?
Done


http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.h@211
PS7, Line 211:   // Initialize 'ts_open_slots_map_'.
 :   void InitializeTsSlotsMap();
> Does it matter whether this is called more than once? If not, maybe name it
It shouldn't be called more than once since it sets the number of slots to 
their initial values and other methods manage them from there.


http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.h@256
PS7, Line 256:  in
> nit: on
Done


http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc
File src/kudu/tools/ksck_checksum.cc:

http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@86
PS7, Line 86: when the all 
> nit: extra word
Done


http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@118
PS7, Line 118:
> nit: spacing
Done


http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@256
PS7, Line 256: int64_t delta_rows, int64_t delta_bytes
> Let's DCHECK that these are positive.
Done


http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@270
PS7, Line 270: auto& tablet_result = LookupOrInsert(_,
> We've got emplace-based functions in map-util.h; you should prefer them to
Done


http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@279
PS7, Line 279:   boost::bind(::StartTabletChecksums, 
this)),
> Prefer std::bind; I think std::function can implicitly cast to boost::funct
Done


http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@293
PS7, Line 293: MonoTime::Now()
> nit: reuse `start`?
Done


http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@298
PS7, Line 298: int
> No more `auto`?
If I do that I need to cast 'FLAGS_...' to match whatever is inferred for 
'rem_ms', or cast 'rem_ms' to be an int32_t, removing the point of auto. I'll 
change it to the former alternative and you can see.


http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@304
PS7, Line 304:
> nit: spacing
Done


http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@341
PS7, Line 341: auto& slots_open = FindOrDie(ts_slots_open_map_, 
replica->ts_uuid());
> nit: perhaps put these into a vector so you don't have to re-FindOrDie them
Done


http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@358
PS7, Line 358:   shared_ptr messenger;
 :   rpc::MessengerBuilder builder("timestamp update");
 :   RETURN_NOT_OK(builder.Build());
> Can't you reuse the Messenger that ksck uses for RPCs?
There is no Messenger in a KsckCluster, the base class this class works with. 
There's only a Messenger in a RemoteKsckCluster, and it's held privately. I 
guess I can add a virtual KsckCluster::messenger() getter that returns a 
shared_ptr to a KsckCluster's Messenger, which just constructs a new Messenger 
when called on a MockKsckCluster? That seems a little shady. Does it sound 
worth it to you?


http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@402
PS7, Line 402: VLOG(1) << LogPrefix(tablet_info.tablet->id()) << "Using 
timestamp "
 :   << timestamp_for_tablet;
> What should happen if `timestamp_for_tablet` is kCurrentTimestamp? I know w
In that case we probably ought to short-circuit the checksum with an error.


http://gerrit.cloudera.org:8080/#/c/11554/7/src/kudu/tools/ksck_checksum.cc@620
PS7, Line 620: tablet_infos,
 :  tablet_servers,
 :  ));
> nit: spacing
Done



--
To view, visit http://gerrit.cloudera.org:8080/11554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master

[kudu-CR] [tools] KUDU-2179: Have ksck not use a single snapshot for all tablets

2018-10-04 Thread Will Berkeley (Code Review)
Hello Tidy Bot, Alexey Serbin, Attila Bukor, Kudu Jenkins, Andrew Wong, Adar 
Dembo, Grant Henke, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/11554

to look at the new patch set (#8).

Change subject: [tools] KUDU-2179: Have ksck not use a single snapshot for all 
tablets
..

[tools] KUDU-2179: Have ksck not use a single snapshot for all tablets

ksck checksum scans allow the user to checksum with snapshot scans, so
that a checksum can be done even as tablets are mutated. It also allows
users to omit a snapshot timestamp. Previously, in this case, the
snapshot timestamp would be retrieved from some healthy tablet server at
the beginning of the checksum process, and used for every replica. This
didn't work well for checksumming large tables, because eventually the
snapshot timestamp fell before the ancient history mark, and subsequent
checksums scans would not be accepted by the tablet servers.

This changes how checksum scans work to address this problem:
1. A background process periodically updates timestamps from tablet
   servers.
2. The checksum process is reorganized so the replicas of one tablet
   are checksummed together.
3. When a tablet is about to be checksummed, and the checksum scan is a
   snapshot scan with no user-provided timestamp, the tablet is assigned
   an up-to-date timestamp from one of the tablet servers that hosts a
   replica. Every replica is then checksummed using this snapshot
   timestamp.
4. The original default timeout of 3600 seconds for a checksum scan is
   too low, but it didn't really matter because the default tablet
   history max age was 900 seconds. Now that checksum scans can continue
   for many hours, the default timeout is raised to 86400 seconds (1
   day), and a new idle timeout is added. If a checksum process does not
   checksum an additional row for this idle timeout (default 10
   minutes), it will idle time out.

Note that there is a new scheduling problem given #2: each tablet server
has a fixed number of slots for checksum scans, but every tablet server
hosting a replica must have a slot available before any replica's
checksum can start, so deciding in which order to checksum tablets and
how to find which are available to schedule is important. Given that the
bulk of the time in checksums is occupied waiting for tablet servers to
read lots of data off disk, materialize it as rows, and checksum it,
it's worth spending a lot of effort to make sure the cluster is fully
utilized given the scan concurrency constraints. So, the tool uses a
brute force approach and simply checks all tablets to see which can be
checksummed, any time a replica checksum finishes and frees a slot.
Tablets are considered in tablet id order. Since tablet ids are UUIDs,
there should be no correlation between a tablet's id and how its
replicas are distributed across tablet servers.

There are several tests added:
1. For the KUDU-2179 fix itself.
2. For the idle timeout.
3. For when a checksum finds mismatches. Yes, we didn't have a test for
   this before. After adding this test I saw that the output is a little
   confusing since it reported the number of replicas with mismatches
   rather than the number of tablets, so I altered the output to fix
   that.
4. A couple of tests exercising situations when all tablet servers are
   unavailable and when all peers of a tablet are unavailable.

I also checksummed a very large cluster with 500TB of data or so, across
about 37000 replicas. The checksum scan completed successfully after
more than 12 hours.

Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476
---
M src/kudu/tools/ksck-test.cc
M src/kudu/tools/ksck.h
M src/kudu/tools/ksck_checksum.cc
M src/kudu/tools/ksck_checksum.h
M src/kudu/tools/ksck_remote-test.cc
M src/kudu/tools/ksck_remote.cc
M src/kudu/tools/ksck_remote.h
7 files changed, 846 insertions(+), 269 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/54/11554/8
--
To view, visit http://gerrit.cloudera.org:8080/11554
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iff0905c2099e6f56ed1cb651611918acbaf75476
Gerrit-Change-Number: 11554
Gerrit-PatchSet: 8
Gerrit-Owner: Will Berkeley 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley