[kudu-CR] [backup] Add initial incremental backup/restore support

2019-04-15 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12879 )

Change subject: [backup] Add initial incremental backup/restore support
..


Patch Set 4: Code-Review+1

(9 comments)

Looks good.

http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala:

http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@33
PS4, Line 33:   private val adjacencyList = mutable.Map[Long, 
mutable.ListBuffer[BackupNode]]()
Please add a comment the describing key -> val mapping.


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@34
PS4, Line 34:   private val nodes = mutable.Set[Long]()
Needs comment. Is 'nodes' used for anything?


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@37
PS4, Line 37:   private val fullBackupFromMs = 0
style nit: this looks like a constant, so should be named with InitialCaps?


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@90
PS4, Line 90:   adjacencyList(fromMs).flatMap { node =>
: allPaths(node.metadata.getToMs, path ++ List(node))
Wow, Scala is fancy.


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@149
PS4, Line 149: represents what the graph would look like
 :* at the passed point in time.
suggestion: that represents the graph including only nodes with a ToTime equal 
to or less than the specified time


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@199
PS4, Line 199:   def pathString: String = 
backups.map(_.metadata.getFromMs).mkString(" -> ")
This is amazingly terse. Scala has built in method for everything.


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala:

http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@66
PS4, Line 66: } else {
We should have a --force-incremental to ensure that a backup administrator can 
guarantee some kind of SLA if they can't afford the resources to do a full 
backup and would rather get an error. OK to add this as a TODO item though.


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala:

http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@56
PS4, Line 56:   val graph = 
io.readBackupGraph(tableName).filterByTime(options.timestampMs)
What do you think about adding a TODO to add an option for an exact match on 
"to" timestamp instead of the best match based on timestamp?


http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-client/src/main/java/org/apache/kudu/Schema.java
File java/kudu-client/src/main/java/org/apache/kudu/Schema.java:

http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-client/src/main/java/org/apache/kudu/Schema.java@206
PS4, Line 206: Integer index = this.columnsByName.get(columnName);
 : return index != null;
nit: how about: return columnsByName.containsKey(columnName);



--
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Tue, 16 Apr 2019 04:07:28 +
Gerrit-HasComments: Yes


[kudu-CR] log block manager: reenable dead container deletion at runtime

2019-04-15 Thread Andrew Wong (Code Review)
Andrew Wong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13021 )

Change subject: log_block_manager: reenable dead container deletion at runtime
..


Patch Set 1: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13021/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13021/1//COMMIT_MSG@26
PS1, Line 26: 1. 
http://dist-test.cloudera.org:8080/download_log?c195c2b0-2845-11e9-a3f1-0242ac110002
Not sure about anyone else, but I get a 404 when I click this. This one works 
for me though: 
http://dist-test.cloudera.org:8080/diagnose?key=c195c2b0-2845-11e9-a3f1-0242ac110002



--
To view, visit http://gerrit.cloudera.org:8080/13021
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icd8742c150e1447e7838d9f86173536b5726cd40
Gerrit-Change-Number: 13021
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu 
Gerrit-Comment-Date: Tue, 16 Apr 2019 03:23:16 +
Gerrit-HasComments: Yes


[kudu-CR] WIP [docs] updated master-related scenarios w.r.t. CM

2019-04-15 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13022 )

Change subject: WIP [docs] updated master-related scenarios w.r.t. CM
..


Patch Set 1: Verified+1

unrelated test flake


--
To view, visit http://gerrit.cloudera.org:8080/13022
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I96bf3ec0de07a46151f947a18b9466da5f05adb3
Gerrit-Change-Number: 13022
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Tue, 16 Apr 2019 03:06:10 +
Gerrit-HasComments: No


[kudu-CR] WIP [docs] updated master-related scenarios w.r.t. CM

2019-04-15 Thread Alexey Serbin (Code Review)
Alexey Serbin has removed a vote on this change.

Change subject: WIP [docs] updated master-related scenarios w.r.t. CM
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/13022
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I96bf3ec0de07a46151f947a18b9466da5f05adb3
Gerrit-Change-Number: 13022
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Attila Bukor 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] remove outdated information from consensus.txt

2019-04-15 Thread Jia Hongchao (Code Review)
Jia Hongchao has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/13023


Change subject: remove outdated information from consensus.txt
..

remove outdated information from consensus.txt

It says "For the details of the write-ahead-log (WAL) message
format, see the README file in this directory.", however, there
is no README file there. I think it's better to remove outdated
information.

Change-Id: I4ae6b4dc92477d54f98ac652b35666a2c21a63a2
---
M src/kudu/consensus/consensus.txt
1 file changed, 1 insertion(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/13023/1
--
To view, visit http://gerrit.cloudera.org:8080/13023
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4ae6b4dc92477d54f98ac652b35666a2c21a63a2
Gerrit-Change-Number: 13023
Gerrit-PatchSet: 1
Gerrit-Owner: Jia Hongchao 


[kudu-CR] add document for KUDU-2080

2019-04-15 Thread Jia Hongchao (Code Review)
Jia Hongchao has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12774 )

Change subject: add document for KUDU-2080
..


Patch Set 2:

Sorry for the delay, it is my first time to use Gerrit.I thought there is 
nothing for me to do.


--
To view, visit http://gerrit.cloudera.org:8080/12774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a802a846ad5ec93ce4e0022ec279f1b4c6cc5db
Gerrit-Change-Number: 12774
Gerrit-PatchSet: 2
Gerrit-Owner: Jia Hongchao 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Greg Solovyev 
Gerrit-Reviewer: Jia Hongchao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 16 Apr 2019 02:27:05 +
Gerrit-HasComments: No


[kudu-CR] add document for KUDU-2080

2019-04-15 Thread Jia Hongchao (Code Review)
Hello Kudu Jenkins, Adar Dembo, Grant Henke, Greg Solovyev,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12774

to look at the new patch set (#2).

Change subject: add document for KUDU-2080
..

add document for KUDU-2080

Change-Id: I7a802a846ad5ec93ce4e0022ec279f1b4c6cc5db
---
M docs/known_issues.adoc
1 file changed, 2 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/12774/2
--
To view, visit http://gerrit.cloudera.org:8080/12774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a802a846ad5ec93ce4e0022ec279f1b4c6cc5db
Gerrit-Change-Number: 12774
Gerrit-PatchSet: 2
Gerrit-Owner: Jia Hongchao 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Greg Solovyev 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] WIP [docs] updated master-related scenarios w.r.t. CM

2019-04-15 Thread Alexey Serbin (Code Review)
Alexey Serbin has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/13022


Change subject: WIP [docs] updated master-related scenarios w.r.t. CM
..

WIP [docs] updated master-related scenarios w.r.t. CM

WIP: Do we need to mention CM in the upstream doc at all?
 Maybe, we should remove all CM-related references
 from this doc?

Change-Id: I96bf3ec0de07a46151f947a18b9466da5f05adb3
---
M docs/administration.adoc
1 file changed, 8 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/22/13022/1
--
To view, visit http://gerrit.cloudera.org:8080/13022
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I96bf3ec0de07a46151f947a18b9466da5f05adb3
Gerrit-Change-Number: 13022
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 


[kudu-CR] log block manager: reenable dead container deletion at runtime

2019-04-15 Thread Adar Dembo (Code Review)
Hello Andrew Wong, helifu, Hao Hao,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/13021

to review the following change.


Change subject: log_block_manager: reenable dead container deletion at runtime
..

log_block_manager: reenable dead container deletion at runtime

A quick timeline:
1. The feature was added and its behavior enabled.
2. Occasionally, block_manager-stress-test would SIGSEGV [1]. These failures
   showed up in the flaky test dashboard.
3. So, a feature flag was added in commit 8dc7904. The flag was false by
   default, only set to true for a handful of tests.
4. In commit 3e0fcc9, the flag was set to true in block_manager-stress-test
   so that we could continue to debug the issue.
5. Except that it hasn't surfaced since then; the last
   block_manager-stress-test failure was on 02/03/19.

It's a mystery as to why there haven't been any block_manager-stress-test
failures since #4. Maybe something isn't working quite right with the
feature flag, or maybe the failure was due to something else. Either way,
let's try to enable the feature again, since it's still early in the 1.10
release cycle.

1. 
http://dist-test.cloudera.org:8080/download_log?c195c2b0-2845-11e9-a3f1-0242ac110002

Change-Id: Icd8742c150e1447e7838d9f86173536b5726cd40
---
M src/kudu/fs/block_manager-stress-test.cc
M src/kudu/fs/log_block_manager-test.cc
M src/kudu/fs/log_block_manager.cc
3 files changed, 1 insertion(+), 15 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/13021/1
--
To view, visit http://gerrit.cloudera.org:8080/13021
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Icd8742c150e1447e7838d9f86173536b5726cd40
Gerrit-Change-Number: 13021
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: helifu 


[kudu-CR] add document for KUDU-2080

2019-04-15 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12774 )

Change subject: add document for KUDU-2080
..


Patch Set 1:

Jia, are you planning to address the feedback?


--
To view, visit http://gerrit.cloudera.org:8080/12774
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a802a846ad5ec93ce4e0022ec279f1b4c6cc5db
Gerrit-Change-Number: 12774
Gerrit-PatchSet: 1
Gerrit-Owner: Jia Hongchao 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Greg Solovyev 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 15 Apr 2019 23:45:58 +
Gerrit-HasComments: No


[kudu-CR] generic iterators: MergeIterator whole block copy optimization

2019-04-15 Thread Adar Dembo (Code Review)
Adar Dembo has removed Kudu Jenkins from this change.  ( 
http://gerrit.cloudera.org:8080/13011 )

Change subject: generic_iterators: MergeIterator whole block copy optimization
..


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/13011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d
Gerrit-Change-Number: 13011
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] generic iterators: MergeIterator whole block copy optimization

2019-04-15 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13011 )

Change subject: generic_iterators: MergeIterator whole block copy optimization
..


Patch Set 4: Verified+1

Overriding Jenkins, unrelated (probably flaky?) test failure.


--
To view, visit http://gerrit.cloudera.org:8080/13011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d
Gerrit-Change-Number: 13011
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 15 Apr 2019 23:21:33 +
Gerrit-HasComments: No


[kudu-CR] generic iterators: refactor MergeIterator for whole-block-copy optimization

2019-04-15 Thread Adar Dembo (Code Review)
Adar Dembo has removed a vote on this change.

Change subject: generic_iterators: refactor MergeIterator for whole-block-copy 
optimization
..


Removed Verified+1 by Adar Dembo 
--
To view, visit http://gerrit.cloudera.org:8080/13010
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I050116edc51bb3e91852e6286c221175770e6382
Gerrit-Change-Number: 13010
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] SelectionVector: pad extra bits with zeroes in constructor

2019-04-15 Thread Adar Dembo (Code Review)
Adar Dembo has removed Kudu Jenkins from this change.  ( 
http://gerrit.cloudera.org:8080/13009 )

Change subject: SelectionVector: pad extra bits with zeroes in constructor
..


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/13009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I74baf87c7f38b7eea0ad46825c2421c99f8f42a1
Gerrit-Change-Number: 13009
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] SelectionVector: pad extra bits with zeroes in constructor

2019-04-15 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13009 )

Change subject: SelectionVector: pad extra bits with zeroes in constructor
..


Patch Set 4: Verified+1

Overriding Jenkins, known flaky test failures.


--
To view, visit http://gerrit.cloudera.org:8080/13009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74baf87c7f38b7eea0ad46825c2421c99f8f42a1
Gerrit-Change-Number: 13009
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 15 Apr 2019 22:42:11 +
Gerrit-HasComments: No


[kudu-CR](gh-pages) [blog] a blogpost about location awareness in Kudu

2019-04-15 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12119 )

Change subject: [blog] a blogpost about location awareness in Kudu
..


Patch Set 10:

> Patch Set 9:
>
> Now that the design doc is merged the link can be updated.

Yes, indeed.  Thank you for kind reminder!


--
To view, visit http://gerrit.cloudera.org:8080/12119
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-MessageType: comment
Gerrit-Change-Id: I10b30a80d5661fb889a11285b8118cdea1a87cd2
Gerrit-Change-Number: 12119
Gerrit-PatchSet: 10
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Greg Solovyev 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Mon, 15 Apr 2019 22:22:21 +
Gerrit-HasComments: No


[kudu-CR](gh-pages) [blog] a blogpost about location awareness in Kudu

2019-04-15 Thread Alexey Serbin (Code Review)
Hello Will Berkeley, Greg Solovyev, Grant Henke, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12119

to look at the new patch set (#10).

Change subject: [blog] a blogpost about location awareness in Kudu
..

[blog] a blogpost about location awareness in Kudu

Change-Id: I10b30a80d5661fb889a11285b8118cdea1a87cd2
---
A _posts/2019-04-16-location-awareness.md
1 file changed, 372 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/19/12119/10
--
To view, visit http://gerrit.cloudera.org:8080/12119
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I10b30a80d5661fb889a11285b8118cdea1a87cd2
Gerrit-Change-Number: 12119
Gerrit-PatchSet: 10
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Greg Solovyev 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2466: implement three-heap merge algorithm

2019-04-15 Thread Adar Dembo (Code Review)
Adar Dembo has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/12947 )

Change subject: KUDU-2466: implement three-heap merge algorithm
..

KUDU-2466: implement three-heap merge algorithm

This patch implements a new algorithm for efficient merging in
MergeIterator. The algorithm is documented extensively in the code so
there's no point in recapping it here. There's also a high-level overview
with pseudocode available[1].

I microbenchmarked the old implementation against the new one, using both
overlapping and non-overlapping inputs with a varying number of lists and
1 rows per list (see generic_iterators-test). Here are the scan times,
averaged across five runs:

Parameters  | old  | new  | diff
+--+--+--
overlapping, 10 lists   | 0.015s   | 0.0522s  | +0.0372   (0.29x)
overlapping, 100 lists  | 1.0798s  | 1.1024s  | +0.0226   (0.98x)
overlapping, 1000 lists | 184.245s | 22.8156s | -161.4294 (8.07x)
non-overlapping, 10 lists   | 0.0126s  | 0.0196s  | +0.007(0.64x)
non-overlapping, 100 lists  | 0.5038s  | 0.129s   | -0.3748   (3.91x)
non-overlapping, 1000 lists | 89.8626s | 0.9874s  | -88.8752  (91x)
+--+--+--

The goal was to optimize ORDERED scans for large and mostly compacted
tablets, and the new algorithm does just that. With smaller input, the
overhead of using heaps becomes more pronounced. Overlapping input still
benefits from heap-based merging, but the overlapping defeats the hot vs.
cold optimization provided by the algorithm.

To see how it performed against real data, I tested the algorithm against a
mostly compacted "representative" 40G tablet with 1294 rowsets, all stored
on one disk. I used a new CLI tool that does a full tablet scan without
sending any data over the wire. I ran the tool three times: first with
UNORDERED scans, then with ORDERED scans using the old algorithm, and
finally with ORDERED scans using the new algorithm. Each run did six scans;
to account for any caching effects, I dropped the first scan. Results:

Parameters   | Average run time (s) | Peak RSS (kb)
-+--+--
UNORDERED:   | 232  | 710320
ORDERED, old | 33787| 4465532
ORDERED, new | 979  | 749440
-+--+--

Lastly, the new algorithm opens the door to another optimization: while
there's just one "hot" sub-iterator, we can copy data block-by-block instead
of row-by-row. I'll implement this in a follow-up.

1. 
https://docs.google.com/document/d/1uP0ubjM6ulnKVCRrXtwT_dqrTWjF9tlFSRk0JN2e_O0/edit#

Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
Reviewed-on: http://gerrit.cloudera.org:8080/12947
Reviewed-by: Mike Percy 
Tested-by: Kudu Jenkins
---
M src/kudu/common/encoded_key.cc
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/common/schema.cc
M src/kudu/common/schema.h
6 files changed, 424 insertions(+), 131 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/12947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
Gerrit-Change-Number: 12947
Gerrit-PatchSet: 11
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [backup] Add initial incremental backup/restore support

2019-04-15 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12879 )

Change subject: [backup] Add initial incremental backup/restore support
..


Patch Set 4: Code-Review+1

Would be good to get another set of eyes on this, preferably with Scala and/or 
Spark expertise.


--
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Mon, 15 Apr 2019 21:27:02 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2466: implement three-heap merge algorithm

2019-04-15 Thread Adar Dembo (Code Review)
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Grant Henke, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12947

to look at the new patch set (#10).

Change subject: KUDU-2466: implement three-heap merge algorithm
..

KUDU-2466: implement three-heap merge algorithm

This patch implements a new algorithm for efficient merging in
MergeIterator. The algorithm is documented extensively in the code so
there's no point in recapping it here. There's also a high-level overview
with pseudocode available[1].

I microbenchmarked the old implementation against the new one, using both
overlapping and non-overlapping inputs with a varying number of lists and
1 rows per list (see generic_iterators-test). Here are the scan times,
averaged across five runs:

Parameters  | old  | new  | diff
+--+--+--
overlapping, 10 lists   | 0.015s   | 0.0522s  | +0.0372   (0.29x)
overlapping, 100 lists  | 1.0798s  | 1.1024s  | +0.0226   (0.98x)
overlapping, 1000 lists | 184.245s | 22.8156s | -161.4294 (8.07x)
non-overlapping, 10 lists   | 0.0126s  | 0.0196s  | +0.007(0.64x)
non-overlapping, 100 lists  | 0.5038s  | 0.129s   | -0.3748   (3.91x)
non-overlapping, 1000 lists | 89.8626s | 0.9874s  | -88.8752  (91x)
+--+--+--

The goal was to optimize ORDERED scans for large and mostly compacted
tablets, and the new algorithm does just that. With smaller input, the
overhead of using heaps becomes more pronounced. Overlapping input still
benefits from heap-based merging, but the overlapping defeats the hot vs.
cold optimization provided by the algorithm.

To see how it performed against real data, I tested the algorithm against a
mostly compacted "representative" 40G tablet with 1294 rowsets, all stored
on one disk. I used a new CLI tool that does a full tablet scan without
sending any data over the wire. I ran the tool three times: first with
UNORDERED scans, then with ORDERED scans using the old algorithm, and
finally with ORDERED scans using the new algorithm. Each run did six scans;
to account for any caching effects, I dropped the first scan. Results:

Parameters   | Average run time (s) | Peak RSS (kb)
-+--+--
UNORDERED:   | 232  | 710320
ORDERED, old | 33787| 4465532
ORDERED, new | 979  | 749440
-+--+--

Lastly, the new algorithm opens the door to another optimization: while
there's just one "hot" sub-iterator, we can copy data block-by-block instead
of row-by-row. I'll implement this in a follow-up.

1. 
https://docs.google.com/document/d/1uP0ubjM6ulnKVCRrXtwT_dqrTWjF9tlFSRk0JN2e_O0/edit#

Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
---
M src/kudu/common/encoded_key.cc
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/common/schema.cc
M src/kudu/common/schema.h
6 files changed, 424 insertions(+), 131 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/47/12947/10
--
To view, visit http://gerrit.cloudera.org:8080/12947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
Gerrit-Change-Number: 12947
Gerrit-PatchSet: 10
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] bitmap: add copy operation

2019-04-15 Thread Adar Dembo (Code Review)
Hello Mike Percy, Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13007

to look at the new patch set (#3).

Change subject: bitmap: add copy operation
..

bitmap: add copy operation

This can be used to copy an arbitrary subsection of a bitmap into another
bitmap, overwriting its contents.

Change-Id: I9c1206306d7e06f6a3eb6854d1a8b4cc09600ca0
---
M src/kudu/common/partial_row.cc
M src/kudu/util/bitmap-test.cc
M src/kudu/util/bitmap.cc
M src/kudu/util/bitmap.h
4 files changed, 130 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/13007/3
--
To view, visit http://gerrit.cloudera.org:8080/13007
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9c1206306d7e06f6a3eb6854d1a8b4cc09600ca0
Gerrit-Change-Number: 13007
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] generic iterators: refactor MergeIterator for whole-block-copy optimization

2019-04-15 Thread Adar Dembo (Code Review)
Hello Mike Percy, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13010

to look at the new patch set (#4).

Change subject: generic_iterators: refactor MergeIterator for whole-block-copy 
optimization
..

generic_iterators: refactor MergeIterator for whole-block-copy optimization

This patch refactors MergeIterator::MaterializeBlock in preparation for the
whole-block-copy optimization. Specifically, the "copy next row" and
"advance iter and rebalance heaps" operations have been decomposed into
standalone functions so that it'll be easier to add "copy whole block" as
an equivalent function in the future.

Also of note: we no longer set the entire client selection vector up front,
as this won't necessarily be the case when we copy a block.

Change-Id: I050116edc51bb3e91852e6286c221175770e6382
---
M src/kudu/common/generic_iterators.cc
1 file changed, 135 insertions(+), 130 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/10/13010/4
--
To view, visit http://gerrit.cloudera.org:8080/13010
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I050116edc51bb3e91852e6286c221175770e6382
Gerrit-Change-Number: 13010
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] generic iterators: MergeIterator whole block copy optimization

2019-04-15 Thread Adar Dembo (Code Review)
Hello Mike Percy, Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13011

to look at the new patch set (#4).

Change subject: generic_iterators: MergeIterator whole block copy optimization
..

generic_iterators: MergeIterator whole block copy optimization

This patch adds the "whole block copy" optimization to the MergeIterator.
The idea is simple: whenever there's only one sub-iterator in the merge
window, we can copy the entire block out of that sub-iterator instead of
copying row-by-row.

The challenging aspect was properly handling the client's SelectionVector:
- When copying row-by-row, the MergeIterator must skip deselected rows in
  order to always return the next smallest row. The MergeIterState
  bookkeeping helped enforce this invariant.
- When copying block-by-block, skipping deselected rows is harder (and
  potentially less performant) than the simple bitmap copy we currently do.
  Plus it's not necessary for correctness; the scan endpoint will skip
  deselected rows when serializing the RowBlock.

So I opted to retain deselected rows in the block-by-block case, and updated
the MergeIterState bookkeeping to cope.

I also changed the default RowBlock sizes in various merge-related paths to
be a power of 2. This should increase the likelihood of hitting the
BitmapCopy "fast path" during a RowBlock copy, though it doesn't guarantee
it (e.g. consider a merge where two sub-iterators overlap for 63/128 rows,
then one sub-iterator has an additional 65 rows: the BitmapCopy operations
in the block copy could not be byte-aligned). It yielded a slight
improvement in microbenchmarks, though not in the macrobenchmark.

Finally, I tweaked the heuristic for whole-block-copying such that it only
happens when there's more than one row to copy. I don't totally buy the
rationale, but it did yield an improvement in both the micro and macro
benchmarks, so I must be onto something.

Below are the micro and macrobenchmark results. The microbenchmark was
generic_iterators-test's overlapping and non-overlapping MergeIterator tests
with 10 iterations, 1000 lists, and 1 rows per list, collecting the
wallclock time to scan and averaging it across the runs. The macrobenchmark
was running 'kudu perf tablet_scan' on a representative 40GB tablet six
times, dropping the first time (to reduce cache effects), and averaging the
remaining wallclock times.

no whole-block-copy:
- non-overlapping: 0.9437
- overlapping: 9.5113
- representative tablet: 786.732

whole-block-copy, no power-of-2 RowBlocks
- non-overlapping: 0.6301
- overlapping: 9.7518
- representative tablet: 751.297

whole-block-copy, power-of-2 RowBlocks
- non-overlapping: 0.5316
- overlapping: 9.5718
- representative tablet: 754.961

whole-block-copy, power-of-2 RowBlocks, only copy when >1 rows left:
- non-overlapping: 0.5247
- overlapping: 9.1287
- representative tablet: 720.534

Todd and I discussed another possible optimization which I want to note here
for posterity: rather than copying blocks, we could "attach" the data to the
client's RowbBlock and serialize it to the wire directly. The attachment
could be RowBlock-based by allowing RowBlocks to work like a refcounted
"iovec", or it could be done more deeply to enable direct serialization of
cached decoder data. Either way, anything that helps us avoid copying data
is likely to be a performance win.

Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d
---
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/rowblock.h
M src/kudu/tserver/tablet_service.cc
4 files changed, 129 insertions(+), 62 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/11/13011/4
--
To view, visit http://gerrit.cloudera.org:8080/13011
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d
Gerrit-Change-Number: 13011
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] rowblock: add copying functionality

2019-04-15 Thread Adar Dembo (Code Review)
Hello Mike Percy, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13008

to look at the new patch set (#4).

Change subject: rowblock: add copying functionality
..

rowblock: add copying functionality

This patch adds RowBlock::CopyTo, a function that enables copying of row
data between RowBlocks. It's a building block for the "whole block copy"
MergeIterator optimization, wherein part of a (or an entire) sub-iterator
RowBlock is copied to the client's RowBlock.

Change-Id: I735796f11e3a388ffc66e3d92f8c2097cdec3a91
---
M src/kudu/common/CMakeLists.txt
M src/kudu/common/columnblock-test.cc
A src/kudu/common/columnblock.cc
M src/kudu/common/columnblock.h
M src/kudu/common/rowblock.cc
M src/kudu/common/rowblock.h
6 files changed, 262 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/08/13008/4
--
To view, visit http://gerrit.cloudera.org:8080/13008
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I735796f11e3a388ffc66e3d92f8c2097cdec3a91
Gerrit-Change-Number: 13008
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] SelectionVector: pad extra bits with zeroes in constructor

2019-04-15 Thread Adar Dembo (Code Review)
Hello Mike Percy, Kudu Jenkins, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/13009

to look at the new patch set (#4).

Change subject: SelectionVector: pad extra bits with zeroes in constructor
..

SelectionVector: pad extra bits with zeroes in constructor

I found this after removing the MergeIterator's call to SetAllTrue(). It
turns out that if you don't call any functions that set all bytes en masse,
CountSelected() and AnySelected() will misbehave as they'll read garbage
data from any trailing bits beyond the logical end of the bitmap.

We can fix this in one of two ways:
- Modify CountSelected()/AnySelected() to ignore the trailing bits.
- Zero out the trailing bits in the constructor.

I opted for the second approach as I found it easier to implement, and I
suspect it's more performant than the first.

Change-Id: I74baf87c7f38b7eea0ad46825c2421c99f8f42a1
---
M src/kudu/common/rowblock-test.cc
M src/kudu/common/rowblock.cc
M src/kudu/common/rowblock.h
3 files changed, 41 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/09/13009/4
--
To view, visit http://gerrit.cloudera.org:8080/13009
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I74baf87c7f38b7eea0ad46825c2421c99f8f42a1
Gerrit-Change-Number: 13009
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] generic iterators: pass rowset bounds into grouping iterators

2019-04-15 Thread Adar Dembo (Code Review)
Adar Dembo has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/12946 )

Change subject: generic_iterators: pass rowset bounds into grouping iterators
..

generic_iterators: pass rowset bounds into grouping iterators

The rowset bounds will be used to reduce MergeIterator memory consumption by
restricting the "eager RowBlock fetching" behavior at Init time to only
those sub-iterators whose rowsets could conceivably participate in the
beginning of the merge.

Change-Id: Id1aebd84507f87c869d781e117225c37c8f15969
Reviewed-on: http://gerrit.cloudera.org:8080/12946
Reviewed-by: Mike Percy 
Tested-by: Kudu Jenkins
---
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/tablet/rowset.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
7 files changed, 171 insertions(+), 83 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/12946
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id1aebd84507f87c869d781e117225c37c8f15969
Gerrit-Change-Number: 12946
Gerrit-PatchSet: 8
Gerrit-Owner: Adar Dembo 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] [backup] Add initial incremental backup/restore support

2019-04-15 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12879 )

Change subject: [backup] Add initial incremental backup/restore support
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Mon, 15 Apr 2019 20:34:10 +
Gerrit-HasComments: No


[kudu-CR] [backup] Add initial incremental backup/restore support

2019-04-15 Thread Grant Henke (Code Review)
Grant Henke has removed a vote on this change.

Change subject: [backup] Add initial incremental backup/restore support
..


Removed Verified-1 by Kudu Jenkins (120)
--
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] [backup] Add initial incremental backup/restore support

2019-04-15 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12879 )

Change subject: [backup] Add initial incremental backup/restore support
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@100
PS2, Line 100:   // Get the operation type based on the change type 
column.
 :   // This will always be the last column in the row.
 :   val rowActionValue = row.getByte(row.length - 1)
> OK makes sense. Would like for design decisions like these to be documented
I added detailed documentation to the RowAction enum.



--
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Mon, 15 Apr 2019 20:05:55 +
Gerrit-HasComments: Yes


[kudu-CR] [backup] Add initial incremental backup/restore support

2019-04-15 Thread Grant Henke (Code Review)
Hello Will Berkeley, Mike Percy, Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12879

to look at the new patch set (#4).

Change subject: [backup] Add initial incremental backup/restore support
..

[backup] Add initial incremental backup/restore support

This patch adds initial support for incremental backup
and restore.

A high level overview of the changes in this patch is:
- Added a version to the TableMetadata for future use.
- Broke out io/layout logic to a SessionIO class so it
   could be easily shared.
- Unified the BackupOptions and RestoreOptions
   so common options could be shared.
- Introduced a BackupGraph class to handle chaining
   together backups for backup and restore jobs.
- Enhanced the BackupRDD to output an additional
   RowAction byte column on backup and restore.
- Enhanced the restore job to use the new RowAction
   column and translate them into operations for
   incremental restore jobs.
- Added the ability to restore to a given “time” on a
   per backup basis.
- Added example usage docs to the Backup and Restore
   classes.
- Added hasIsDeleted to RowResult and hasColumn to
   Schema.

Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
---
M java/kudu-backup/src/main/protobuf/backup.proto
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
D java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
D 
java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/RowAction.java
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala
A java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestBackupGraph.scala
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
M java/kudu-client/src/main/java/org/apache/kudu/Schema.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/RowConverter.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduTestSuite.scala
18 files changed, 1,127 insertions(+), 396 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/12879/4
--
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] [backup] Add initial incremental backup/restore support

2019-04-15 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12879 )

Change subject: [backup] Add initial incremental backup/restore support
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@100
PS2, Line 100:   // Generate an operation based on the row action.
 :   val operation = rowAction match {
 : case RowAction.UPSERT => table.newUpsert()
> > why bother converting is_deleted in KuduBackupRDD at all?
OK makes sense. Would like for design decisions like these to be documented in 
the code somewhere though.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@37
PS2, Line 37: tables: Seq[String],
: rootPath: String,
: kuduMasterAddresses: String = 
InetAddress.getLocalHost.getCanonicalHostName,
> The benefit of the trait is seen in SessionIO, I can use BackupOptions as a
I did indeed miss that, thanks.



--
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Mon, 15 Apr 2019 19:39:39 +
Gerrit-HasComments: Yes


[kudu-CR] [backup] Add initial incremental backup/restore support

2019-04-15 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12879 )

Change subject: [backup] Add initial incremental backup/restore support
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@100
PS2, Line 100:   // Get the operation type based on the change type 
column.
 :   // This will always be the last column in the row.
 :   val rowActionValue = row.getByte(row.length - 1)
> why bother converting is_deleted in KuduBackupRDD at all?

I could represent a boolean with a byte or an enum in that same byte of space, 
I decided to use an enum since it seemed more flexible for an on-disk format 
which we would need to support long term.

I imagine we could use RowAction to support INSERT in the future or perhaps 
UPDATE if we have full fidelity and sparse row backup support. If go the 
boolean route, any new feature would need a new column.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@37
PS2, Line 37: tables: Seq[String],
: rootPath: String,
: kuduMasterAddresses: String = 
InetAddress.getLocalHost.getCanonicalHostName,
> Hmm, not really seeing the point of the trait then, at least not in terms o
The benefit of the trait is seen in SessionIO, I can use BackupOptions as a 
parameter for functions that are used in both the Backup and Restore jobs.



--
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Mon, 15 Apr 2019 19:23:15 +
Gerrit-HasComments: Yes


[kudu-CR] [backup] Add initial incremental backup/restore support

2019-04-15 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12879 )

Change subject: [backup] Add initial incremental backup/restore support
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@100
PS2, Line 100:   // Generate an operation based on the row action.
 :   val operation = rowAction match {
 : case RowAction.UPSERT => table.newUpsert()
> > Does Scala allow returning tuples like in Python?
OK, I was missing two things here:
1. There's an extra layer of indirection between this and the iterator in 
KuduBackupRDD, and the internal row is the only mechanism to pass data from one 
to the other.
2. More importantly, KuduBackupRDD is the read-side of the equation while this 
is the write-side; there's a persistence step in between, and we need to 
serialize the row action in order for it to survive the persistence.

Next question: why bother converting is_deleted in KuduBackupRDD at all? Why 
not persist it as-is and convert it into a row action on-the-fly during 
restore? Then you won't need a persistence model for RowAction; it could be a 
simple enum, or maybe not exist at all (you could switch here on the value of 
the boolean is_deleted column).


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@37
PS2, Line 37: tables: Seq[String],
: rootPath: String,
: kuduMasterAddresses: String = 
InetAddress.getLocalHost.getCanonicalHostName,
> The CommonOptions trait is specifying that BackupOptions/RestoreOptions nee
Hmm, not really seeing the point of the trait then, at least not in terms of 
reducing LOC. Is it used somewhere else in the patch that I missed?



--
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Mon, 15 Apr 2019 19:17:10 +
Gerrit-HasComments: Yes


[kudu-CR] [backup] Add initial incremental backup/restore support

2019-04-15 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12879 )

Change subject: [backup] Add initial incremental backup/restore support
..


Patch Set 3:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/protobuf/backup.proto
File java/kudu-backup/src/main/protobuf/backup.proto:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/protobuf/backup.proto@118
PS2, Line 118:   PartitionMetadataPB partitions = 8;
> Skipped 8?
Done


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@163
PS2, Line 163:  * Node class to represent nodes in the backup graph.
> Curious why you went with the noun "vertex"; isn't "node" the more commonly
No specific reason, will change to Node.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@35
PS2, Line 35:  */
> Nit: Javadoc not quite formatted properly?
Done


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@58
PS2, Line 58:   // use the `to_ms` metadata as the `from_ms` time for this 
backup.
> Nit: too long?
Done


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@144
PS2, Line 144: val columnCount = if (incremental) fieldCount - 1 else 
fieldCount
> This function is called per-row, right? I wonder if it'd be more performant
We could assume the column count is static across the life of an RDD and 
memoize this calculation, but the existing implementation didn't do that, so I 
kept the behavior the same. The additional if check should be cheap relative to 
the other work in the method.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@39
PS2, Line 39:  */
> Formatting?
Done


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@54
PS2, Line 54: // TODO: Make parallel so each table isn't processed serially.
: options.tables.foreach { tableName =>
:   val graph = io.readBackupGraph(tableName).filterByTim
> Did you test what effect this has? If it may have a drastic impact on perfo
This wasn't meant to be included, I was experimenting with it. I will remove it.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@100
PS2, Line 100:   // Generate an operation based on the row action.
 :   val operation = rowAction match {
 : case RowAction.UPSERT => table.newUpsert()
> Does Scala allow returning tuples like in Python?

Yes it does

> Seems like it'd be more elegant to return the row action separately from the 
> row itself.

The Row abstraction is a result of the Spark framework. It expects Dataframes 
to work with Rows and is less flexible than the raw RDD API.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
File java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala:

http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@37
PS2, Line 37: tables: Seq[String],
: rootPath: String,
: kuduMasterAddresses: String = 
InetAddress.getLocalHost.getCanonicalHostName,
> Why do these three need to be listed here and in RestoreOptions if both ext
The CommonOptions trait is specifying that BackupOptions/RestoreOptions needs 
to implement a value function for tables, rootPath, and  kuduMasterAddresses. 
But it doesn't specify "how". BackupOptions/RestoreOptions implement those 
functions via a constructor.


http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@93
PS2, Line 93: // TODO: Document the limitations based on cluster 
configuration
> Nit: too long?
Done



[kudu-CR] [jenkins] fix collection of Java dist-test logs

2019-04-15 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13014 )

Change subject: [jenkins] fix collection of Java dist-test logs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13014/1/build-support/jenkins/build-and-test.sh
File build-support/jenkins/build-and-test.sh:

PS1:
> As a (cleaner?) alternative, how about renaming test-output.txt in run_dist
Yep, I also thought about that.  It would be nice to get some input from Grant 
on this.  Basically, I want to understand whether keeping that test-output.txt 
pattern was something crucial somewhere else in dist-testing Java test 
scenarios.  Or that was the easiest way to go forward here?

Thanks!



--
To view, visit http://gerrit.cloudera.org:8080/13014
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9c3b537e3592f768b0f14687aaadaa00444550f7
Gerrit-Change-Number: 13014
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 15 Apr 2019 17:46:40 +
Gerrit-HasComments: Yes


[kudu-CR] [jenkins] fix collection of Java dist-test logs

2019-04-15 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13014 )

Change subject: [jenkins] fix collection of Java dist-test logs
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13014/1/build-support/jenkins/build-and-test.sh
File build-support/jenkins/build-and-test.sh:

PS1:
> Yep, I also thought about that.  It would be nice to get some input from Gr
I don't think there is any specific reason it was named test-output.txt. It was 
likely just an easy default choice. Changing it should be fine.



--
To view, visit http://gerrit.cloudera.org:8080/13014
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9c3b537e3592f768b0f14687aaadaa00444550f7
Gerrit-Change-Number: 13014
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 15 Apr 2019 17:52:57 +
Gerrit-HasComments: Yes


[kudu-CR] [backup] Add initial incremental backup/restore support

2019-04-15 Thread Grant Henke (Code Review)
Hello Will Berkeley, Mike Percy, Kudu Jenkins, Adar Dembo,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12879

to look at the new patch set (#3).

Change subject: [backup] Add initial incremental backup/restore support
..

[backup] Add initial incremental backup/restore support

This patch adds initial support for incremental backup
and restore.

A high level overview of the changes in this patch is:
- Added a version to the TableMetadata for future use.
- Broke out io/layout logic to a SessionIO class so it
   could be easily shared.
- Unified the BackupOptions and RestoreOptions
   so common options could be shared.
- Introduced a BackupGraph class to handle chaining
   together backups for backup and restore jobs.
- Enhanced the BackupRDD to output an additional
   RowAction byte column on backup and restore.
- Enhanced the restore job to use the new RowAction
   column and translate them into operations for
   incremental restore jobs.
- Added the ability to restore to a given “time” on a
   per backup basis.
- Added example usage docs to the Backup and Restore
   classes.
- Added hasIsDeleted to RowResult and hasColumn to
   Schema.

Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
---
M java/kudu-backup/src/main/protobuf/backup.proto
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala
D java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala
D 
java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/RowAction.java
A java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala
M java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala
A java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestBackupGraph.scala
M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala
M java/kudu-client/src/main/java/org/apache/kudu/Schema.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java
M java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/RowConverter.scala
M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduTestSuite.scala
18 files changed, 1,112 insertions(+), 396 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/12879/3
--
To view, visit http://gerrit.cloudera.org:8080/12879
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890
Gerrit-Change-Number: 12879
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR](gh-pages) [blog] a blogpost about location awareness in Kudu

2019-04-15 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12119 )

Change subject: [blog] a blogpost about location awareness in Kudu
..


Patch Set 9:

Now that the design doc is merged the link can be updated.


--
To view, visit http://gerrit.cloudera.org:8080/12119
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: gh-pages
Gerrit-MessageType: comment
Gerrit-Change-Id: I10b30a80d5661fb889a11285b8118cdea1a87cd2
Gerrit-Change-Number: 12119
Gerrit-PatchSet: 9
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Greg Solovyev 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Mon, 15 Apr 2019 14:17:08 +
Gerrit-HasComments: No


[kudu-CR] [docs] updated docs w.r.t. collocation practices

2019-04-15 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12997 )

Change subject: [docs] updated docs w.r.t. collocation practices
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12997/4/docs/administration.adoc
File docs/administration.adoc:

http://gerrit.cloudera.org:8080/#/c/12997/4/docs/administration.adoc@821
PS4, Line 821: surface area for attacks using tablet servers' security 
vulnerabilities, if any.
> Both kudu-master and kudu-tserver currently runs under 'kudu' OS user.  Kud
Given the tserver and master share a large amount of code and dependencies, I 
suspect most, if not all, vulnerabilities found in the tserver would also apply 
to the master server.

I think we could suggest placing masters on a separate node as an option to 
reduce this risk, but I worry about  making it our default recommendation not 
to colocate masters and tservers. I am pretty sure this is the most common 
deployment today. For users deploying on physical machines, they would need 3 
extra physical nodes with very little load/utilization.



--
To view, visit http://gerrit.cloudera.org:8080/12997
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I88a38117751cfa436c1fd95598274fb8f01f04ea
Gerrit-Change-Number: 12997
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mitch Barnett 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Mon, 15 Apr 2019 14:03:55 +
Gerrit-HasComments: Yes


[kudu-CR] WIP master: use AuthzProvider to generate authz tokens

2019-04-15 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13013 )

Change subject: WIP master: use AuthzProvider to generate authz tokens
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/13013/4/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/13013/4/src/kudu/master/catalog_manager.cc@2762
PS4, Line 2762: RETURN_NOT_OK(token_signer->GenerateAuthzToken(
> Kudu signs tokens using SHA256 digest.  It's pretty fast, you can try to ru
Whoops, it seems it's too late and I need go to bed.  We use SHA256 just to 
create a digest we sign with private key, and that's cheap, sure.

As for actual signing, Kudu now uses 2048 bit RSA keys.  At my machine, below 
is the stats for 1 2.2 GHz Core i7 CPU core:

aserbin-MBP:kudu[java-dis-test-logs]$ openssl speed rsa2048
Doing 2048 bit private rsa's for 10s: 14318 2048 bit private RSA's in 9.91s
Doing 2048 bit public rsa's for 10s: 315288 2048 bit public RSA's in 9.97s
OpenSSL 1.0.2r  26 Feb 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) idea(int) 
blowfish(idx)
compiler: /usr/bin/clang -I. -I.. -I../include  -fPIC -fno-common -DOPENSSL_PIC 
-DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -arch x86_64 
-O3 -DL_ENDIAN -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT 
-DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM 
-DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM 
-DGHASH_ASM -DECP_NISTZ256_ASM
  signverifysign/s verify/s
rsa 2048 bits 0.000692s 0.32s   1444.8  31623.7

So, around 1.4K tokens per second can be signed utilizing 1 CPU core, and in 
total that's a bit over 4.5K tokens/second at all 4 CPU cores of my macbook 
laptop.



--
To view, visit http://gerrit.cloudera.org:8080/13013
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5404d6437699bc6c7c8bb0e530b202109e8f166
Gerrit-Change-Number: 13013
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Mon, 15 Apr 2019 08:37:43 +
Gerrit-HasComments: Yes


[kudu-CR] WIP master: use AuthzProvider to generate authz tokens

2019-04-15 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13013 )

Change subject: WIP master: use AuthzProvider to generate authz tokens
..


Patch Set 4:

(3 comments)

I took a quick look, will take another look tomorrow morning.  Overall looks 
good, I just want to get a better understanding of the coverage that's in the 
new test.

http://gerrit.cloudera.org:8080/#/c/13013/4/src/kudu/master/catalog_manager.cc
File src/kudu/master/catalog_manager.cc:

http://gerrit.cloudera.org:8080/#/c/13013/4/src/kudu/master/catalog_manager.cc@2755
PS4, Line 2755: Should we send back
  : // an error that the client can retry, e.g. if Sentry was 
down?
> Looking through some DDL authz code, seems like if Sentry isn't available,
Yep, Hao and I discussed that a bit in the context of 
https://gerrit.cloudera.org/c/12877/1/src/kudu/integration-tests/master-stress-test.cc#118
 comment.

I'm curious what's Hao stance on this.


http://gerrit.cloudera.org:8080/#/c/13013/4/src/kudu/master/catalog_manager.cc@2762
PS4, Line 2762: RETURN_NOT_OK(token_signer->GenerateAuthzToken(
> Stepping through the code, most of it is shuttling strings and ints around,
Kudu signs tokens using SHA256 digest.  It's pretty fast, you can try to run 
'openssl speed sha256' and see your numbers.  In my case (2.2 GHz Intel Core 
i7, 4 cores, 256KB L2 cache per core), assuming our tokens are of size 256 
bytes, gives a range roughly 2M tokens per CPU core signed per second.  So, my 
machine would be able to sign about 8M tokens (of size 256 bytes) per second.  
I think that's not bad for a macBook laptop.

I'm not sure it makes sense to cache those instead of creating new ones: it 
seems to be cheap.

aserbin-MBP:master[java-dis-test-logs]$ openssl speed sha256
Doing sha256 for 3s on 16 size blocks: 13543954 sha256's in 2.99s
Doing sha256 for 3s on 64 size blocks: 7386135 sha256's in 2.96s
Doing sha256 for 3s on 256 size blocks: 3480055 sha256's in 2.99s
Doing sha256 for 3s on 1024 size blocks: 1059385 sha256's in 2.99s
Doing sha256 for 3s on 8192 size blocks: 137842 sha256's in 2.97s
OpenSSL 1.0.2r  26 Feb 2019
built on: reproducible build, date unspecified
options:bn(64,64) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) idea(int) 
blowfish(idx)
compiler: /usr/bin/clang -I. -I.. -I../include  -fPIC -fno-common -DOPENSSL_PIC 
-DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -arch x86_64 
-O3 -DL_ENDIAN -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT 
-DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM 
-DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM 
-DGHASH_ASM -DECP_NISTZ256_ASM
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes256 bytes   1024 bytes   8192 bytes
sha256   72476.01k   159700.22k   297957.89k   362812.79k   380202.58k


http://gerrit.cloudera.org:8080/#/c/13013/4/src/kudu/util/random_util.h
File src/kudu/util/random_util.h:

http://gerrit.cloudera.org:8080/#/c/13013/4/src/kudu/util/random_util.h@59
PS4, Line 59: c.size() - min_to_return
nit: maybe, just have (c.size() + 1 - min_to_return) here and remove the 'if 
(min_to_return == c.size())' short-circuit above?  That way you don't need to 
think about the difference in the contract of this method and ReservoirSample 
that might be induced by the short-circuiting above.



--
To view, visit http://gerrit.cloudera.org:8080/13013
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5404d6437699bc6c7c8bb0e530b202109e8f166
Gerrit-Change-Number: 13013
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Mon, 15 Apr 2019 07:53:58 +
Gerrit-HasComments: Yes