[kudu-CR] [backup] Add initial incremental backup/restore support
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 ) Change subject: [backup] Add initial incremental backup/restore support .. Patch Set 4: Code-Review+1 (9 comments) Looks good. http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala: http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@33 PS4, Line 33: private val adjacencyList = mutable.Map[Long, mutable.ListBuffer[BackupNode]]() Please add a comment the describing key -> val mapping. http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@34 PS4, Line 34: private val nodes = mutable.Set[Long]() Needs comment. Is 'nodes' used for anything? http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@37 PS4, Line 37: private val fullBackupFromMs = 0 style nit: this looks like a constant, so should be named with InitialCaps? http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@90 PS4, Line 90: adjacencyList(fromMs).flatMap { node => : allPaths(node.metadata.getToMs, path ++ List(node)) Wow, Scala is fancy. http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@149 PS4, Line 149: represents what the graph would look like :* at the passed point in time. suggestion: that represents the graph including only nodes with a ToTime equal to or less than the specified time http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@199 PS4, Line 199: def pathString: String = backups.map(_.metadata.getFromMs).mkString(" -> ") This is amazingly terse. Scala has built in method for everything. http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala: http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@66 PS4, Line 66: } else { We should have a --force-incremental to ensure that a backup administrator can guarantee some kind of SLA if they can't afford the resources to do a full backup and would rather get an error. OK to add this as a TODO item though. http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala: http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@56 PS4, Line 56: val graph = io.readBackupGraph(tableName).filterByTime(options.timestampMs) What do you think about adding a TODO to add an option for an exact match on "to" timestamp instead of the best match based on timestamp? http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-client/src/main/java/org/apache/kudu/Schema.java File java/kudu-client/src/main/java/org/apache/kudu/Schema.java: http://gerrit.cloudera.org:8080/#/c/12879/4/java/kudu-client/src/main/java/org/apache/kudu/Schema.java@206 PS4, Line 206: Integer index = this.columnsByName.get(columnName); : return index != null; nit: how about: return columnsByName.containsKey(columnName); -- To view, visit http://gerrit.cloudera.org:8080/12879 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890 Gerrit-Change-Number: 12879 Gerrit-PatchSet: 4 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 16 Apr 2019 04:07:28 + Gerrit-HasComments: Yes
[kudu-CR] log block manager: reenable dead container deletion at runtime
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13021 ) Change subject: log_block_manager: reenable dead container deletion at runtime .. Patch Set 1: Code-Review+1 (1 comment) http://gerrit.cloudera.org:8080/#/c/13021/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13021/1//COMMIT_MSG@26 PS1, Line 26: 1. http://dist-test.cloudera.org:8080/download_log?c195c2b0-2845-11e9-a3f1-0242ac110002 Not sure about anyone else, but I get a 404 when I click this. This one works for me though: http://dist-test.cloudera.org:8080/diagnose?key=c195c2b0-2845-11e9-a3f1-0242ac110002 -- To view, visit http://gerrit.cloudera.org:8080/13021 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icd8742c150e1447e7838d9f86173536b5726cd40 Gerrit-Change-Number: 13021 Gerrit-PatchSet: 1 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: helifu Gerrit-Comment-Date: Tue, 16 Apr 2019 03:23:16 + Gerrit-HasComments: Yes
[kudu-CR] WIP [docs] updated master-related scenarios w.r.t. CM
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13022 ) Change subject: WIP [docs] updated master-related scenarios w.r.t. CM .. Patch Set 1: Verified+1 unrelated test flake -- To view, visit http://gerrit.cloudera.org:8080/13022 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I96bf3ec0de07a46151f947a18b9466da5f05adb3 Gerrit-Change-Number: 13022 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 16 Apr 2019 03:06:10 + Gerrit-HasComments: No
[kudu-CR] WIP [docs] updated master-related scenarios w.r.t. CM
Alexey Serbin has removed a vote on this change. Change subject: WIP [docs] updated master-related scenarios w.r.t. CM .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/13022 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I96bf3ec0de07a46151f947a18b9466da5f05adb3 Gerrit-Change-Number: 13022 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Attila Bukor Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Will Berkeley
[kudu-CR] remove outdated information from consensus.txt
Jia Hongchao has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13023 Change subject: remove outdated information from consensus.txt .. remove outdated information from consensus.txt It says "For the details of the write-ahead-log (WAL) message format, see the README file in this directory.", however, there is no README file there. I think it's better to remove outdated information. Change-Id: I4ae6b4dc92477d54f98ac652b35666a2c21a63a2 --- M src/kudu/consensus/consensus.txt 1 file changed, 1 insertion(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/23/13023/1 -- To view, visit http://gerrit.cloudera.org:8080/13023 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I4ae6b4dc92477d54f98ac652b35666a2c21a63a2 Gerrit-Change-Number: 13023 Gerrit-PatchSet: 1 Gerrit-Owner: Jia Hongchao
[kudu-CR] add document for KUDU-2080
Jia Hongchao has posted comments on this change. ( http://gerrit.cloudera.org:8080/12774 ) Change subject: add document for KUDU-2080 .. Patch Set 2: Sorry for the delay, it is my first time to use Gerrit.I thought there is nothing for me to do. -- To view, visit http://gerrit.cloudera.org:8080/12774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a802a846ad5ec93ce4e0022ec279f1b4c6cc5db Gerrit-Change-Number: 12774 Gerrit-PatchSet: 2 Gerrit-Owner: Jia Hongchao Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Greg Solovyev Gerrit-Reviewer: Jia Hongchao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 16 Apr 2019 02:27:05 + Gerrit-HasComments: No
[kudu-CR] add document for KUDU-2080
Hello Kudu Jenkins, Adar Dembo, Grant Henke, Greg Solovyev, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12774 to look at the new patch set (#2). Change subject: add document for KUDU-2080 .. add document for KUDU-2080 Change-Id: I7a802a846ad5ec93ce4e0022ec279f1b4c6cc5db --- M docs/known_issues.adoc 1 file changed, 2 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/74/12774/2 -- To view, visit http://gerrit.cloudera.org:8080/12774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7a802a846ad5ec93ce4e0022ec279f1b4c6cc5db Gerrit-Change-Number: 12774 Gerrit-PatchSet: 2 Gerrit-Owner: Jia Hongchao Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Greg Solovyev Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] WIP [docs] updated master-related scenarios w.r.t. CM
Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13022 Change subject: WIP [docs] updated master-related scenarios w.r.t. CM .. WIP [docs] updated master-related scenarios w.r.t. CM WIP: Do we need to mention CM in the upstream doc at all? Maybe, we should remove all CM-related references from this doc? Change-Id: I96bf3ec0de07a46151f947a18b9466da5f05adb3 --- M docs/administration.adoc 1 file changed, 8 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/22/13022/1 -- To view, visit http://gerrit.cloudera.org:8080/13022 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I96bf3ec0de07a46151f947a18b9466da5f05adb3 Gerrit-Change-Number: 13022 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin
[kudu-CR] log block manager: reenable dead container deletion at runtime
Hello Andrew Wong, helifu, Hao Hao, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/13021 to review the following change. Change subject: log_block_manager: reenable dead container deletion at runtime .. log_block_manager: reenable dead container deletion at runtime A quick timeline: 1. The feature was added and its behavior enabled. 2. Occasionally, block_manager-stress-test would SIGSEGV [1]. These failures showed up in the flaky test dashboard. 3. So, a feature flag was added in commit 8dc7904. The flag was false by default, only set to true for a handful of tests. 4. In commit 3e0fcc9, the flag was set to true in block_manager-stress-test so that we could continue to debug the issue. 5. Except that it hasn't surfaced since then; the last block_manager-stress-test failure was on 02/03/19. It's a mystery as to why there haven't been any block_manager-stress-test failures since #4. Maybe something isn't working quite right with the feature flag, or maybe the failure was due to something else. Either way, let's try to enable the feature again, since it's still early in the 1.10 release cycle. 1. http://dist-test.cloudera.org:8080/download_log?c195c2b0-2845-11e9-a3f1-0242ac110002 Change-Id: Icd8742c150e1447e7838d9f86173536b5726cd40 --- M src/kudu/fs/block_manager-stress-test.cc M src/kudu/fs/log_block_manager-test.cc M src/kudu/fs/log_block_manager.cc 3 files changed, 1 insertion(+), 15 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/13021/1 -- To view, visit http://gerrit.cloudera.org:8080/13021 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Icd8742c150e1447e7838d9f86173536b5726cd40 Gerrit-Change-Number: 13021 Gerrit-PatchSet: 1 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: helifu
[kudu-CR] add document for KUDU-2080
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12774 ) Change subject: add document for KUDU-2080 .. Patch Set 1: Jia, are you planning to address the feedback? -- To view, visit http://gerrit.cloudera.org:8080/12774 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a802a846ad5ec93ce4e0022ec279f1b4c6cc5db Gerrit-Change-Number: 12774 Gerrit-PatchSet: 1 Gerrit-Owner: Jia Hongchao Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Greg Solovyev Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Mon, 15 Apr 2019 23:45:58 + Gerrit-HasComments: No
[kudu-CR] generic iterators: MergeIterator whole block copy optimization
Adar Dembo has removed Kudu Jenkins from this change. ( http://gerrit.cloudera.org:8080/13011 ) Change subject: generic_iterators: MergeIterator whole block copy optimization .. Removed reviewer Kudu Jenkins with the following votes: * Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/13011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d Gerrit-Change-Number: 13011 Gerrit-PatchSet: 4 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] generic iterators: MergeIterator whole block copy optimization
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13011 ) Change subject: generic_iterators: MergeIterator whole block copy optimization .. Patch Set 4: Verified+1 Overriding Jenkins, unrelated (probably flaky?) test failure. -- To view, visit http://gerrit.cloudera.org:8080/13011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d Gerrit-Change-Number: 13011 Gerrit-PatchSet: 4 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 15 Apr 2019 23:21:33 + Gerrit-HasComments: No
[kudu-CR] generic iterators: refactor MergeIterator for whole-block-copy optimization
Adar Dembo has removed a vote on this change. Change subject: generic_iterators: refactor MergeIterator for whole-block-copy optimization .. Removed Verified+1 by Adar Dembo -- To view, visit http://gerrit.cloudera.org:8080/13010 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I050116edc51bb3e91852e6286c221175770e6382 Gerrit-Change-Number: 13010 Gerrit-PatchSet: 4 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] SelectionVector: pad extra bits with zeroes in constructor
Adar Dembo has removed Kudu Jenkins from this change. ( http://gerrit.cloudera.org:8080/13009 ) Change subject: SelectionVector: pad extra bits with zeroes in constructor .. Removed reviewer Kudu Jenkins with the following votes: * Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/13009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: I74baf87c7f38b7eea0ad46825c2421c99f8f42a1 Gerrit-Change-Number: 13009 Gerrit-PatchSet: 4 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] SelectionVector: pad extra bits with zeroes in constructor
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13009 ) Change subject: SelectionVector: pad extra bits with zeroes in constructor .. Patch Set 4: Verified+1 Overriding Jenkins, known flaky test failures. -- To view, visit http://gerrit.cloudera.org:8080/13009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74baf87c7f38b7eea0ad46825c2421c99f8f42a1 Gerrit-Change-Number: 13009 Gerrit-PatchSet: 4 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 15 Apr 2019 22:42:11 + Gerrit-HasComments: No
[kudu-CR](gh-pages) [blog] a blogpost about location awareness in Kudu
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/12119 ) Change subject: [blog] a blogpost about location awareness in Kudu .. Patch Set 10: > Patch Set 9: > > Now that the design doc is merged the link can be updated. Yes, indeed. Thank you for kind reminder! -- To view, visit http://gerrit.cloudera.org:8080/12119 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I10b30a80d5661fb889a11285b8118cdea1a87cd2 Gerrit-Change-Number: 12119 Gerrit-PatchSet: 10 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Greg Solovyev Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 15 Apr 2019 22:22:21 + Gerrit-HasComments: No
[kudu-CR](gh-pages) [blog] a blogpost about location awareness in Kudu
Hello Will Berkeley, Greg Solovyev, Grant Henke, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12119 to look at the new patch set (#10). Change subject: [blog] a blogpost about location awareness in Kudu .. [blog] a blogpost about location awareness in Kudu Change-Id: I10b30a80d5661fb889a11285b8118cdea1a87cd2 --- A _posts/2019-04-16-location-awareness.md 1 file changed, 372 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/19/12119/10 -- To view, visit http://gerrit.cloudera.org:8080/12119 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: newpatchset Gerrit-Change-Id: I10b30a80d5661fb889a11285b8118cdea1a87cd2 Gerrit-Change-Number: 12119 Gerrit-PatchSet: 10 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Greg Solovyev Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2466: implement three-heap merge algorithm
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12947 ) Change subject: KUDU-2466: implement three-heap merge algorithm .. KUDU-2466: implement three-heap merge algorithm This patch implements a new algorithm for efficient merging in MergeIterator. The algorithm is documented extensively in the code so there's no point in recapping it here. There's also a high-level overview with pseudocode available[1]. I microbenchmarked the old implementation against the new one, using both overlapping and non-overlapping inputs with a varying number of lists and 1 rows per list (see generic_iterators-test). Here are the scan times, averaged across five runs: Parameters | old | new | diff +--+--+-- overlapping, 10 lists | 0.015s | 0.0522s | +0.0372 (0.29x) overlapping, 100 lists | 1.0798s | 1.1024s | +0.0226 (0.98x) overlapping, 1000 lists | 184.245s | 22.8156s | -161.4294 (8.07x) non-overlapping, 10 lists | 0.0126s | 0.0196s | +0.007(0.64x) non-overlapping, 100 lists | 0.5038s | 0.129s | -0.3748 (3.91x) non-overlapping, 1000 lists | 89.8626s | 0.9874s | -88.8752 (91x) +--+--+-- The goal was to optimize ORDERED scans for large and mostly compacted tablets, and the new algorithm does just that. With smaller input, the overhead of using heaps becomes more pronounced. Overlapping input still benefits from heap-based merging, but the overlapping defeats the hot vs. cold optimization provided by the algorithm. To see how it performed against real data, I tested the algorithm against a mostly compacted "representative" 40G tablet with 1294 rowsets, all stored on one disk. I used a new CLI tool that does a full tablet scan without sending any data over the wire. I ran the tool three times: first with UNORDERED scans, then with ORDERED scans using the old algorithm, and finally with ORDERED scans using the new algorithm. Each run did six scans; to account for any caching effects, I dropped the first scan. Results: Parameters | Average run time (s) | Peak RSS (kb) -+--+-- UNORDERED: | 232 | 710320 ORDERED, old | 33787| 4465532 ORDERED, new | 979 | 749440 -+--+-- Lastly, the new algorithm opens the door to another optimization: while there's just one "hot" sub-iterator, we can copy data block-by-block instead of row-by-row. I'll implement this in a follow-up. 1. https://docs.google.com/document/d/1uP0ubjM6ulnKVCRrXtwT_dqrTWjF9tlFSRk0JN2e_O0/edit# Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d Reviewed-on: http://gerrit.cloudera.org:8080/12947 Reviewed-by: Mike Percy Tested-by: Kudu Jenkins --- M src/kudu/common/encoded_key.cc M src/kudu/common/generic_iterators-test.cc M src/kudu/common/generic_iterators.cc M src/kudu/common/generic_iterators.h M src/kudu/common/schema.cc M src/kudu/common/schema.h 6 files changed, 424 insertions(+), 131 deletions(-) Approvals: Mike Percy: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/12947 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d Gerrit-Change-Number: 12947 Gerrit-PatchSet: 11 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [backup] Add initial incremental backup/restore support
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 ) Change subject: [backup] Add initial incremental backup/restore support .. Patch Set 4: Code-Review+1 Would be good to get another set of eyes on this, preferably with Scala and/or Spark expertise. -- To view, visit http://gerrit.cloudera.org:8080/12879 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890 Gerrit-Change-Number: 12879 Gerrit-PatchSet: 4 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 15 Apr 2019 21:27:02 + Gerrit-HasComments: No
[kudu-CR] KUDU-2466: implement three-heap merge algorithm
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Grant Henke, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12947 to look at the new patch set (#10). Change subject: KUDU-2466: implement three-heap merge algorithm .. KUDU-2466: implement three-heap merge algorithm This patch implements a new algorithm for efficient merging in MergeIterator. The algorithm is documented extensively in the code so there's no point in recapping it here. There's also a high-level overview with pseudocode available[1]. I microbenchmarked the old implementation against the new one, using both overlapping and non-overlapping inputs with a varying number of lists and 1 rows per list (see generic_iterators-test). Here are the scan times, averaged across five runs: Parameters | old | new | diff +--+--+-- overlapping, 10 lists | 0.015s | 0.0522s | +0.0372 (0.29x) overlapping, 100 lists | 1.0798s | 1.1024s | +0.0226 (0.98x) overlapping, 1000 lists | 184.245s | 22.8156s | -161.4294 (8.07x) non-overlapping, 10 lists | 0.0126s | 0.0196s | +0.007(0.64x) non-overlapping, 100 lists | 0.5038s | 0.129s | -0.3748 (3.91x) non-overlapping, 1000 lists | 89.8626s | 0.9874s | -88.8752 (91x) +--+--+-- The goal was to optimize ORDERED scans for large and mostly compacted tablets, and the new algorithm does just that. With smaller input, the overhead of using heaps becomes more pronounced. Overlapping input still benefits from heap-based merging, but the overlapping defeats the hot vs. cold optimization provided by the algorithm. To see how it performed against real data, I tested the algorithm against a mostly compacted "representative" 40G tablet with 1294 rowsets, all stored on one disk. I used a new CLI tool that does a full tablet scan without sending any data over the wire. I ran the tool three times: first with UNORDERED scans, then with ORDERED scans using the old algorithm, and finally with ORDERED scans using the new algorithm. Each run did six scans; to account for any caching effects, I dropped the first scan. Results: Parameters | Average run time (s) | Peak RSS (kb) -+--+-- UNORDERED: | 232 | 710320 ORDERED, old | 33787| 4465532 ORDERED, new | 979 | 749440 -+--+-- Lastly, the new algorithm opens the door to another optimization: while there's just one "hot" sub-iterator, we can copy data block-by-block instead of row-by-row. I'll implement this in a follow-up. 1. https://docs.google.com/document/d/1uP0ubjM6ulnKVCRrXtwT_dqrTWjF9tlFSRk0JN2e_O0/edit# Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d --- M src/kudu/common/encoded_key.cc M src/kudu/common/generic_iterators-test.cc M src/kudu/common/generic_iterators.cc M src/kudu/common/generic_iterators.h M src/kudu/common/schema.cc M src/kudu/common/schema.h 6 files changed, 424 insertions(+), 131 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/47/12947/10 -- To view, visit http://gerrit.cloudera.org:8080/12947 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d Gerrit-Change-Number: 12947 Gerrit-PatchSet: 10 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon
[kudu-CR] bitmap: add copy operation
Hello Mike Percy, Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13007 to look at the new patch set (#3). Change subject: bitmap: add copy operation .. bitmap: add copy operation This can be used to copy an arbitrary subsection of a bitmap into another bitmap, overwriting its contents. Change-Id: I9c1206306d7e06f6a3eb6854d1a8b4cc09600ca0 --- M src/kudu/common/partial_row.cc M src/kudu/util/bitmap-test.cc M src/kudu/util/bitmap.cc M src/kudu/util/bitmap.h 4 files changed, 130 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/07/13007/3 -- To view, visit http://gerrit.cloudera.org:8080/13007 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9c1206306d7e06f6a3eb6854d1a8b4cc09600ca0 Gerrit-Change-Number: 13007 Gerrit-PatchSet: 3 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] generic iterators: refactor MergeIterator for whole-block-copy optimization
Hello Mike Percy, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13010 to look at the new patch set (#4). Change subject: generic_iterators: refactor MergeIterator for whole-block-copy optimization .. generic_iterators: refactor MergeIterator for whole-block-copy optimization This patch refactors MergeIterator::MaterializeBlock in preparation for the whole-block-copy optimization. Specifically, the "copy next row" and "advance iter and rebalance heaps" operations have been decomposed into standalone functions so that it'll be easier to add "copy whole block" as an equivalent function in the future. Also of note: we no longer set the entire client selection vector up front, as this won't necessarily be the case when we copy a block. Change-Id: I050116edc51bb3e91852e6286c221175770e6382 --- M src/kudu/common/generic_iterators.cc 1 file changed, 135 insertions(+), 130 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/10/13010/4 -- To view, visit http://gerrit.cloudera.org:8080/13010 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I050116edc51bb3e91852e6286c221175770e6382 Gerrit-Change-Number: 13010 Gerrit-PatchSet: 4 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] generic iterators: MergeIterator whole block copy optimization
Hello Mike Percy, Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13011 to look at the new patch set (#4). Change subject: generic_iterators: MergeIterator whole block copy optimization .. generic_iterators: MergeIterator whole block copy optimization This patch adds the "whole block copy" optimization to the MergeIterator. The idea is simple: whenever there's only one sub-iterator in the merge window, we can copy the entire block out of that sub-iterator instead of copying row-by-row. The challenging aspect was properly handling the client's SelectionVector: - When copying row-by-row, the MergeIterator must skip deselected rows in order to always return the next smallest row. The MergeIterState bookkeeping helped enforce this invariant. - When copying block-by-block, skipping deselected rows is harder (and potentially less performant) than the simple bitmap copy we currently do. Plus it's not necessary for correctness; the scan endpoint will skip deselected rows when serializing the RowBlock. So I opted to retain deselected rows in the block-by-block case, and updated the MergeIterState bookkeeping to cope. I also changed the default RowBlock sizes in various merge-related paths to be a power of 2. This should increase the likelihood of hitting the BitmapCopy "fast path" during a RowBlock copy, though it doesn't guarantee it (e.g. consider a merge where two sub-iterators overlap for 63/128 rows, then one sub-iterator has an additional 65 rows: the BitmapCopy operations in the block copy could not be byte-aligned). It yielded a slight improvement in microbenchmarks, though not in the macrobenchmark. Finally, I tweaked the heuristic for whole-block-copying such that it only happens when there's more than one row to copy. I don't totally buy the rationale, but it did yield an improvement in both the micro and macro benchmarks, so I must be onto something. Below are the micro and macrobenchmark results. The microbenchmark was generic_iterators-test's overlapping and non-overlapping MergeIterator tests with 10 iterations, 1000 lists, and 1 rows per list, collecting the wallclock time to scan and averaging it across the runs. The macrobenchmark was running 'kudu perf tablet_scan' on a representative 40GB tablet six times, dropping the first time (to reduce cache effects), and averaging the remaining wallclock times. no whole-block-copy: - non-overlapping: 0.9437 - overlapping: 9.5113 - representative tablet: 786.732 whole-block-copy, no power-of-2 RowBlocks - non-overlapping: 0.6301 - overlapping: 9.7518 - representative tablet: 751.297 whole-block-copy, power-of-2 RowBlocks - non-overlapping: 0.5316 - overlapping: 9.5718 - representative tablet: 754.961 whole-block-copy, power-of-2 RowBlocks, only copy when >1 rows left: - non-overlapping: 0.5247 - overlapping: 9.1287 - representative tablet: 720.534 Todd and I discussed another possible optimization which I want to note here for posterity: rather than copying blocks, we could "attach" the data to the client's RowbBlock and serialize it to the wire directly. The attachment could be RowBlock-based by allowing RowBlocks to work like a refcounted "iovec", or it could be done more deeply to enable direct serialization of cached decoder data. Either way, anything that helps us avoid copying data is likely to be a performance win. Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d --- M src/kudu/common/generic_iterators-test.cc M src/kudu/common/generic_iterators.cc M src/kudu/common/rowblock.h M src/kudu/tserver/tablet_service.cc 4 files changed, 129 insertions(+), 62 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/11/13011/4 -- To view, visit http://gerrit.cloudera.org:8080/13011 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3a4b56169644f35f1aa8aef27d4af0ce4ec0792d Gerrit-Change-Number: 13011 Gerrit-PatchSet: 4 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] rowblock: add copying functionality
Hello Mike Percy, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13008 to look at the new patch set (#4). Change subject: rowblock: add copying functionality .. rowblock: add copying functionality This patch adds RowBlock::CopyTo, a function that enables copying of row data between RowBlocks. It's a building block for the "whole block copy" MergeIterator optimization, wherein part of a (or an entire) sub-iterator RowBlock is copied to the client's RowBlock. Change-Id: I735796f11e3a388ffc66e3d92f8c2097cdec3a91 --- M src/kudu/common/CMakeLists.txt M src/kudu/common/columnblock-test.cc A src/kudu/common/columnblock.cc M src/kudu/common/columnblock.h M src/kudu/common/rowblock.cc M src/kudu/common/rowblock.h 6 files changed, 262 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/08/13008/4 -- To view, visit http://gerrit.cloudera.org:8080/13008 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I735796f11e3a388ffc66e3d92f8c2097cdec3a91 Gerrit-Change-Number: 13008 Gerrit-PatchSet: 4 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] SelectionVector: pad extra bits with zeroes in constructor
Hello Mike Percy, Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/13009 to look at the new patch set (#4). Change subject: SelectionVector: pad extra bits with zeroes in constructor .. SelectionVector: pad extra bits with zeroes in constructor I found this after removing the MergeIterator's call to SetAllTrue(). It turns out that if you don't call any functions that set all bytes en masse, CountSelected() and AnySelected() will misbehave as they'll read garbage data from any trailing bits beyond the logical end of the bitmap. We can fix this in one of two ways: - Modify CountSelected()/AnySelected() to ignore the trailing bits. - Zero out the trailing bits in the constructor. I opted for the second approach as I found it easier to implement, and I suspect it's more performant than the first. Change-Id: I74baf87c7f38b7eea0ad46825c2421c99f8f42a1 --- M src/kudu/common/rowblock-test.cc M src/kudu/common/rowblock.cc M src/kudu/common/rowblock.h 3 files changed, 41 insertions(+), 14 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/09/13009/4 -- To view, visit http://gerrit.cloudera.org:8080/13009 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I74baf87c7f38b7eea0ad46825c2421c99f8f42a1 Gerrit-Change-Number: 13009 Gerrit-PatchSet: 4 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] generic iterators: pass rowset bounds into grouping iterators
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12946 ) Change subject: generic_iterators: pass rowset bounds into grouping iterators .. generic_iterators: pass rowset bounds into grouping iterators The rowset bounds will be used to reduce MergeIterator memory consumption by restricting the "eager RowBlock fetching" behavior at Init time to only those sub-iterators whose rowsets could conceivably participate in the beginning of the merge. Change-Id: Id1aebd84507f87c869d781e117225c37c8f15969 Reviewed-on: http://gerrit.cloudera.org:8080/12946 Reviewed-by: Mike Percy Tested-by: Kudu Jenkins --- M src/kudu/common/generic_iterators-test.cc M src/kudu/common/generic_iterators.cc M src/kudu/common/generic_iterators.h M src/kudu/tablet/rowset.cc M src/kudu/tablet/rowset.h M src/kudu/tablet/tablet.cc M src/kudu/tablet/tablet.h 7 files changed, 171 insertions(+), 83 deletions(-) Approvals: Mike Percy: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/12946 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Id1aebd84507f87c869d781e117225c37c8f15969 Gerrit-Change-Number: 12946 Gerrit-PatchSet: 8 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [backup] Add initial incremental backup/restore support
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 ) Change subject: [backup] Add initial incremental backup/restore support .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/12879 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890 Gerrit-Change-Number: 12879 Gerrit-PatchSet: 4 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 15 Apr 2019 20:34:10 + Gerrit-HasComments: No
[kudu-CR] [backup] Add initial incremental backup/restore support
Grant Henke has removed a vote on this change. Change subject: [backup] Add initial incremental backup/restore support .. Removed Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/12879 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890 Gerrit-Change-Number: 12879 Gerrit-PatchSet: 4 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley
[kudu-CR] [backup] Add initial incremental backup/restore support
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 ) Change subject: [backup] Add initial incremental backup/restore support .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala: http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@100 PS2, Line 100: // Get the operation type based on the change type column. : // This will always be the last column in the row. : val rowActionValue = row.getByte(row.length - 1) > OK makes sense. Would like for design decisions like these to be documented I added detailed documentation to the RowAction enum. -- To view, visit http://gerrit.cloudera.org:8080/12879 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890 Gerrit-Change-Number: 12879 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 15 Apr 2019 20:05:55 + Gerrit-HasComments: Yes
[kudu-CR] [backup] Add initial incremental backup/restore support
Hello Will Berkeley, Mike Percy, Kudu Jenkins, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12879 to look at the new patch set (#4). Change subject: [backup] Add initial incremental backup/restore support .. [backup] Add initial incremental backup/restore support This patch adds initial support for incremental backup and restore. A high level overview of the changes in this patch is: - Added a version to the TableMetadata for future use. - Broke out io/layout logic to a SessionIO class so it could be easily shared. - Unified the BackupOptions and RestoreOptions so common options could be shared. - Introduced a BackupGraph class to handle chaining together backups for backup and restore jobs. - Enhanced the BackupRDD to output an additional RowAction byte column on backup and restore. - Enhanced the restore job to use the new RowAction column and translate them into operations for incremental restore jobs. - Added the ability to restore to a given “time” on a per backup basis. - Added example usage docs to the Backup and Restore classes. - Added hasIsDeleted to RowResult and hasColumn to Schema. Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890 --- M java/kudu-backup/src/main/protobuf/backup.proto A java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala D java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala D java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala A java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala A java/kudu-backup/src/main/scala/org/apache/kudu/backup/RowAction.java A java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala M java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala A java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestBackupGraph.scala M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala M java/kudu-client/src/main/java/org/apache/kudu/Schema.java M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java M java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/RowConverter.scala M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduTestSuite.scala 18 files changed, 1,127 insertions(+), 396 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/12879/4 -- To view, visit http://gerrit.cloudera.org:8080/12879 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890 Gerrit-Change-Number: 12879 Gerrit-PatchSet: 4 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley
[kudu-CR] [backup] Add initial incremental backup/restore support
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 ) Change subject: [backup] Add initial incremental backup/restore support .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala: http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@100 PS2, Line 100: // Generate an operation based on the row action. : val operation = rowAction match { : case RowAction.UPSERT => table.newUpsert() > > why bother converting is_deleted in KuduBackupRDD at all? OK makes sense. Would like for design decisions like these to be documented in the code somewhere though. http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala: http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@37 PS2, Line 37: tables: Seq[String], : rootPath: String, : kuduMasterAddresses: String = InetAddress.getLocalHost.getCanonicalHostName, > The benefit of the trait is seen in SessionIO, I can use BackupOptions as a I did indeed miss that, thanks. -- To view, visit http://gerrit.cloudera.org:8080/12879 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890 Gerrit-Change-Number: 12879 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 15 Apr 2019 19:39:39 + Gerrit-HasComments: Yes
[kudu-CR] [backup] Add initial incremental backup/restore support
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 ) Change subject: [backup] Add initial incremental backup/restore support .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala: http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@100 PS2, Line 100: // Get the operation type based on the change type column. : // This will always be the last column in the row. : val rowActionValue = row.getByte(row.length - 1) > why bother converting is_deleted in KuduBackupRDD at all? I could represent a boolean with a byte or an enum in that same byte of space, I decided to use an enum since it seemed more flexible for an on-disk format which we would need to support long term. I imagine we could use RowAction to support INSERT in the future or perhaps UPDATE if we have full fidelity and sparse row backup support. If go the boolean route, any new feature would need a new column. http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala: http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@37 PS2, Line 37: tables: Seq[String], : rootPath: String, : kuduMasterAddresses: String = InetAddress.getLocalHost.getCanonicalHostName, > Hmm, not really seeing the point of the trait then, at least not in terms o The benefit of the trait is seen in SessionIO, I can use BackupOptions as a parameter for functions that are used in both the Backup and Restore jobs. -- To view, visit http://gerrit.cloudera.org:8080/12879 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890 Gerrit-Change-Number: 12879 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 15 Apr 2019 19:23:15 + Gerrit-HasComments: Yes
[kudu-CR] [backup] Add initial incremental backup/restore support
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 ) Change subject: [backup] Add initial incremental backup/restore support .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala: http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@100 PS2, Line 100: // Generate an operation based on the row action. : val operation = rowAction match { : case RowAction.UPSERT => table.newUpsert() > > Does Scala allow returning tuples like in Python? OK, I was missing two things here: 1. There's an extra layer of indirection between this and the iterator in KuduBackupRDD, and the internal row is the only mechanism to pass data from one to the other. 2. More importantly, KuduBackupRDD is the read-side of the equation while this is the write-side; there's a persistence step in between, and we need to serialize the row action in order for it to survive the persistence. Next question: why bother converting is_deleted in KuduBackupRDD at all? Why not persist it as-is and convert it into a row action on-the-fly during restore? Then you won't need a persistence model for RowAction; it could be a simple enum, or maybe not exist at all (you could switch here on the value of the boolean is_deleted column). http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala: http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@37 PS2, Line 37: tables: Seq[String], : rootPath: String, : kuduMasterAddresses: String = InetAddress.getLocalHost.getCanonicalHostName, > The CommonOptions trait is specifying that BackupOptions/RestoreOptions nee Hmm, not really seeing the point of the trait then, at least not in terms of reducing LOC. Is it used somewhere else in the patch that I missed? -- To view, visit http://gerrit.cloudera.org:8080/12879 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890 Gerrit-Change-Number: 12879 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 15 Apr 2019 19:17:10 + Gerrit-HasComments: Yes
[kudu-CR] [backup] Add initial incremental backup/restore support
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12879 ) Change subject: [backup] Add initial incremental backup/restore support .. Patch Set 3: (11 comments) http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/protobuf/backup.proto File java/kudu-backup/src/main/protobuf/backup.proto: http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/protobuf/backup.proto@118 PS2, Line 118: PartitionMetadataPB partitions = 8; > Skipped 8? Done http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala: http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala@163 PS2, Line 163: * Node class to represent nodes in the backup graph. > Curious why you went with the noun "vertex"; isn't "node" the more commonly No specific reason, will change to Node. http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala: http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@35 PS2, Line 35: */ > Nit: Javadoc not quite formatted properly? Done http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala@58 PS2, Line 58: // use the `to_ms` metadata as the `from_ms` time for this backup. > Nit: too long? Done http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala: http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala@144 PS2, Line 144: val columnCount = if (incremental) fieldCount - 1 else fieldCount > This function is called per-row, right? I wonder if it'd be more performant We could assume the column count is static across the life of an RDD and memoize this calculation, but the existing implementation didn't do that, so I kept the behavior the same. The additional if check should be cheap relative to the other work in the method. http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala: http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@39 PS2, Line 39: */ > Formatting? Done http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@54 PS2, Line 54: // TODO: Make parallel so each table isn't processed serially. : options.tables.foreach { tableName => : val graph = io.readBackupGraph(tableName).filterByTim > Did you test what effect this has? If it may have a drastic impact on perfo This wasn't meant to be included, I was experimenting with it. I will remove it. http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala@100 PS2, Line 100: // Generate an operation based on the row action. : val operation = rowAction match { : case RowAction.UPSERT => table.newUpsert() > Does Scala allow returning tuples like in Python? Yes it does > Seems like it'd be more elegant to return the row action separately from the > row itself. The Row abstraction is a result of the Spark framework. It expects Dataframes to work with Rows and is less flexible than the raw RDD API. http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala: http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@37 PS2, Line 37: tables: Seq[String], : rootPath: String, : kuduMasterAddresses: String = InetAddress.getLocalHost.getCanonicalHostName, > Why do these three need to be listed here and in RestoreOptions if both ext The CommonOptions trait is specifying that BackupOptions/RestoreOptions needs to implement a value function for tables, rootPath, and kuduMasterAddresses. But it doesn't specify "how". BackupOptions/RestoreOptions implement those functions via a constructor. http://gerrit.cloudera.org:8080/#/c/12879/2/java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala@93 PS2, Line 93: // TODO: Document the limitations based on cluster configuration > Nit: too long? Done
[kudu-CR] [jenkins] fix collection of Java dist-test logs
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13014 ) Change subject: [jenkins] fix collection of Java dist-test logs .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/13014/1/build-support/jenkins/build-and-test.sh File build-support/jenkins/build-and-test.sh: PS1: > As a (cleaner?) alternative, how about renaming test-output.txt in run_dist Yep, I also thought about that. It would be nice to get some input from Grant on this. Basically, I want to understand whether keeping that test-output.txt pattern was something crucial somewhere else in dist-testing Java test scenarios. Or that was the easiest way to go forward here? Thanks! -- To view, visit http://gerrit.cloudera.org:8080/13014 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9c3b537e3592f768b0f14687aaadaa00444550f7 Gerrit-Change-Number: 13014 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Mon, 15 Apr 2019 17:46:40 + Gerrit-HasComments: Yes
[kudu-CR] [jenkins] fix collection of Java dist-test logs
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/13014 ) Change subject: [jenkins] fix collection of Java dist-test logs .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/13014/1/build-support/jenkins/build-and-test.sh File build-support/jenkins/build-and-test.sh: PS1: > Yep, I also thought about that. It would be nice to get some input from Gr I don't think there is any specific reason it was named test-output.txt. It was likely just an easy default choice. Changing it should be fine. -- To view, visit http://gerrit.cloudera.org:8080/13014 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9c3b537e3592f768b0f14687aaadaa00444550f7 Gerrit-Change-Number: 13014 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Mon, 15 Apr 2019 17:52:57 + Gerrit-HasComments: Yes
[kudu-CR] [backup] Add initial incremental backup/restore support
Hello Will Berkeley, Mike Percy, Kudu Jenkins, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12879 to look at the new patch set (#3). Change subject: [backup] Add initial incremental backup/restore support .. [backup] Add initial incremental backup/restore support This patch adds initial support for incremental backup and restore. A high level overview of the changes in this patch is: - Added a version to the TableMetadata for future use. - Broke out io/layout logic to a SessionIO class so it could be easily shared. - Unified the BackupOptions and RestoreOptions so common options could be shared. - Introduced a BackupGraph class to handle chaining together backups for backup and restore jobs. - Enhanced the BackupRDD to output an additional RowAction byte column on backup and restore. - Enhanced the restore job to use the new RowAction column and translate them into operations for incremental restore jobs. - Added the ability to restore to a given “time” on a per backup basis. - Added example usage docs to the Backup and Restore classes. - Added hasIsDeleted to RowResult and hasColumn to Schema. Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890 --- M java/kudu-backup/src/main/protobuf/backup.proto A java/kudu-backup/src/main/scala/org/apache/kudu/backup/BackupGraph.scala M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala D java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala M java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala D java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala A java/kudu-backup/src/main/scala/org/apache/kudu/backup/Options.scala A java/kudu-backup/src/main/scala/org/apache/kudu/backup/RowAction.java A java/kudu-backup/src/main/scala/org/apache/kudu/backup/SessionIO.scala M java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala A java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestBackupGraph.scala M java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala M java/kudu-client/src/main/java/org/apache/kudu/Schema.java M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduScanner.java M java/kudu-client/src/main/java/org/apache/kudu/client/RowResult.java M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/RowConverter.scala M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduTestSuite.scala 18 files changed, 1,112 insertions(+), 396 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/79/12879/3 -- To view, visit http://gerrit.cloudera.org:8080/12879 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I50b21d921fefbf4d7e8bd1766285258e8014d890 Gerrit-Change-Number: 12879 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley
[kudu-CR](gh-pages) [blog] a blogpost about location awareness in Kudu
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12119 ) Change subject: [blog] a blogpost about location awareness in Kudu .. Patch Set 9: Now that the design doc is merged the link can be updated. -- To view, visit http://gerrit.cloudera.org:8080/12119 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: gh-pages Gerrit-MessageType: comment Gerrit-Change-Id: I10b30a80d5661fb889a11285b8118cdea1a87cd2 Gerrit-Change-Number: 12119 Gerrit-PatchSet: 9 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Greg Solovyev Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 15 Apr 2019 14:17:08 + Gerrit-HasComments: No
[kudu-CR] [docs] updated docs w.r.t. collocation practices
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12997 ) Change subject: [docs] updated docs w.r.t. collocation practices .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/12997/4/docs/administration.adoc File docs/administration.adoc: http://gerrit.cloudera.org:8080/#/c/12997/4/docs/administration.adoc@821 PS4, Line 821: surface area for attacks using tablet servers' security vulnerabilities, if any. > Both kudu-master and kudu-tserver currently runs under 'kudu' OS user. Kud Given the tserver and master share a large amount of code and dependencies, I suspect most, if not all, vulnerabilities found in the tserver would also apply to the master server. I think we could suggest placing masters on a separate node as an option to reduce this risk, but I worry about making it our default recommendation not to colocate masters and tservers. I am pretty sure this is the most common deployment today. For users deploying on physical machines, they would need 3 extra physical nodes with very little load/utilization. -- To view, visit http://gerrit.cloudera.org:8080/12997 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I88a38117751cfa436c1fd95598274fb8f01f04ea Gerrit-Change-Number: 12997 Gerrit-PatchSet: 4 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mitch Barnett Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 15 Apr 2019 14:03:55 + Gerrit-HasComments: Yes
[kudu-CR] WIP master: use AuthzProvider to generate authz tokens
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13013 ) Change subject: WIP master: use AuthzProvider to generate authz tokens .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/13013/4/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/13013/4/src/kudu/master/catalog_manager.cc@2762 PS4, Line 2762: RETURN_NOT_OK(token_signer->GenerateAuthzToken( > Kudu signs tokens using SHA256 digest. It's pretty fast, you can try to ru Whoops, it seems it's too late and I need go to bed. We use SHA256 just to create a digest we sign with private key, and that's cheap, sure. As for actual signing, Kudu now uses 2048 bit RSA keys. At my machine, below is the stats for 1 2.2 GHz Core i7 CPU core: aserbin-MBP:kudu[java-dis-test-logs]$ openssl speed rsa2048 Doing 2048 bit private rsa's for 10s: 14318 2048 bit private RSA's in 9.91s Doing 2048 bit public rsa's for 10s: 315288 2048 bit public RSA's in 9.97s OpenSSL 1.0.2r 26 Feb 2019 built on: reproducible build, date unspecified options:bn(64,64) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: /usr/bin/clang -I. -I.. -I../include -fPIC -fno-common -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -arch x86_64 -O3 -DL_ENDIAN -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM signverifysign/s verify/s rsa 2048 bits 0.000692s 0.32s 1444.8 31623.7 So, around 1.4K tokens per second can be signed utilizing 1 CPU core, and in total that's a bit over 4.5K tokens/second at all 4 CPU cores of my macbook laptop. -- To view, visit http://gerrit.cloudera.org:8080/13013 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5404d6437699bc6c7c8bb0e530b202109e8f166 Gerrit-Change-Number: 13013 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Mon, 15 Apr 2019 08:37:43 + Gerrit-HasComments: Yes
[kudu-CR] WIP master: use AuthzProvider to generate authz tokens
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/13013 ) Change subject: WIP master: use AuthzProvider to generate authz tokens .. Patch Set 4: (3 comments) I took a quick look, will take another look tomorrow morning. Overall looks good, I just want to get a better understanding of the coverage that's in the new test. http://gerrit.cloudera.org:8080/#/c/13013/4/src/kudu/master/catalog_manager.cc File src/kudu/master/catalog_manager.cc: http://gerrit.cloudera.org:8080/#/c/13013/4/src/kudu/master/catalog_manager.cc@2755 PS4, Line 2755: Should we send back : // an error that the client can retry, e.g. if Sentry was down? > Looking through some DDL authz code, seems like if Sentry isn't available, Yep, Hao and I discussed that a bit in the context of https://gerrit.cloudera.org/c/12877/1/src/kudu/integration-tests/master-stress-test.cc#118 comment. I'm curious what's Hao stance on this. http://gerrit.cloudera.org:8080/#/c/13013/4/src/kudu/master/catalog_manager.cc@2762 PS4, Line 2762: RETURN_NOT_OK(token_signer->GenerateAuthzToken( > Stepping through the code, most of it is shuttling strings and ints around, Kudu signs tokens using SHA256 digest. It's pretty fast, you can try to run 'openssl speed sha256' and see your numbers. In my case (2.2 GHz Intel Core i7, 4 cores, 256KB L2 cache per core), assuming our tokens are of size 256 bytes, gives a range roughly 2M tokens per CPU core signed per second. So, my machine would be able to sign about 8M tokens (of size 256 bytes) per second. I think that's not bad for a macBook laptop. I'm not sure it makes sense to cache those instead of creating new ones: it seems to be cheap. aserbin-MBP:master[java-dis-test-logs]$ openssl speed sha256 Doing sha256 for 3s on 16 size blocks: 13543954 sha256's in 2.99s Doing sha256 for 3s on 64 size blocks: 7386135 sha256's in 2.96s Doing sha256 for 3s on 256 size blocks: 3480055 sha256's in 2.99s Doing sha256 for 3s on 1024 size blocks: 1059385 sha256's in 2.99s Doing sha256 for 3s on 8192 size blocks: 137842 sha256's in 2.97s OpenSSL 1.0.2r 26 Feb 2019 built on: reproducible build, date unspecified options:bn(64,64) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) idea(int) blowfish(idx) compiler: /usr/bin/clang -I. -I.. -I../include -fPIC -fno-common -DOPENSSL_PIC -DZLIB -DOPENSSL_THREADS -D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -arch x86_64 -O3 -DL_ENDIAN -Wall -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DMD5_ASM -DAES_ASM -DVPAES_ASM -DBSAES_ASM -DWHIRLPOOL_ASM -DGHASH_ASM -DECP_NISTZ256_ASM The 'numbers' are in 1000s of bytes per second processed. type 16 bytes 64 bytes256 bytes 1024 bytes 8192 bytes sha256 72476.01k 159700.22k 297957.89k 362812.79k 380202.58k http://gerrit.cloudera.org:8080/#/c/13013/4/src/kudu/util/random_util.h File src/kudu/util/random_util.h: http://gerrit.cloudera.org:8080/#/c/13013/4/src/kudu/util/random_util.h@59 PS4, Line 59: c.size() - min_to_return nit: maybe, just have (c.size() + 1 - min_to_return) here and remove the 'if (min_to_return == c.size())' short-circuit above? That way you don't need to think about the difference in the contract of this method and ReservoirSample that might be induced by the short-circuiting above. -- To view, visit http://gerrit.cloudera.org:8080/13013 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5404d6437699bc6c7c8bb0e530b202109e8f166 Gerrit-Change-Number: 13013 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Comment-Date: Mon, 15 Apr 2019 07:53:58 + Gerrit-HasComments: Yes