[kudu-CR] compaction policy: avoid O(n^2) calls to EstimateOnDiskSize
David Ribeiro Alves has posted comments on this change. Change subject: compaction_policy: avoid O(n^2) calls to EstimateOnDiskSize .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4191 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic2949218d7f5fd822571a7b14d1d0b4430aeee1d Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] tool: port kudu-fs dump, remove kudu-fs list (WIP)
Kudu Jenkins has posted comments on this change. Change subject: tool: port kudu-fs_dump, remove kudu-fs_list (WIP) .. Patch Set 3: Build Started http://104.196.14.100/job/kudu-gerrit/3253/ -- To view, visit http://gerrit.cloudera.org:8080/4305 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1ec628b65613011d8c48b6239c13762276425966 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh BhatGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] tool: port kudu-fs dump, remove kudu-fs list (WIP)
Hello Adar Dembo, Todd Lipcon, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4305 to look at the new patch set (#3). Change subject: tool: port kudu-fs_dump, remove kudu-fs_list (WIP) .. tool: port kudu-fs_dump, remove kudu-fs_list (WIP) This change also moves dump_cfile action under 'kudu fs dump cfile'. kudu-fs_list tool has been removed altogether and a 'kudu fs dump tree' has been added to dump the filesystem tree. Majority of the kudu-fs_dump sub-commands which were relying on tablet id as required parameter have been moved under 'kudu tablet'. Also added tests under kudu-tool-test to exercise each of these fs tools. Change-Id: I1ec628b65613011d8c48b6239c13762276425966 --- M docs/release_notes.adoc M src/kudu/tools/CMakeLists.txt D src/kudu/tools/fs_dump-tool.cc D src/kudu/tools/fs_list-tool.cc D src/kudu/tools/fs_tool.cc D src/kudu/tools/fs_tool.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action.h M src/kudu/tools/tool_action_common.cc M src/kudu/tools/tool_action_common.h M src/kudu/tools/tool_action_fs.cc M src/kudu/tools/tool_action_tablet.cc 12 files changed, 1,087 insertions(+), 1,146 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/4305/3 -- To view, visit http://gerrit.cloudera.org:8080/4305 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1ec628b65613011d8c48b6239c13762276425966 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh BhatGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Cleanup/refactor tracking of consensus watermarks
Kudu Jenkins has posted comments on this change. Change subject: Cleanup/refactor tracking of consensus watermarks .. Patch Set 6: Build Started http://104.196.14.100/job/kudu-gerrit/3252/ -- To view, visit http://gerrit.cloudera.org:8080/4133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Cleanup/refactor tracking of consensus watermarks
Hello David Ribeiro Alves, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4133 to look at the new patch set (#6). Change subject: Cleanup/refactor tracking of consensus watermarks .. Cleanup/refactor tracking of consensus watermarks This is a fairly invasive cleanup/refactor to consensus in preparation for propagating replication watermarks from the leader to the followers. The key item here is to address a long-standing TODO that the PeerMessageQueue should itself calculate the commit index, rather than delegate that job to ReplicaState. The flow used to be something like this on the leader upon receiving a response from a peer that it has replicated some new operations: - PeerMessageQueue::ResponseFromPeer - updates peer last_received - calls PeerMessageQueue::AdvanceQueueWatermark - updates PeerMessageQueue::majority_replicated to the new majority watermark - calls NotifyObserversOfMajorityReplOp --> submits an async task to the raft threadpool - for each observer (best I can tell there is currently always only one, but that's a separate issue) -- observer->UpdateMajorityReplicated() to grab committed index (the observer is always the RaftConsensus instance) --- RaftConsensus::UpdateMajorityReplicated ReplicaState::UpdateMajorityReplicatedUnlocked - this checks the condition that the majority-replicated watermark is within the leader's term before advancing the commit index. - calls AdvanceCommittedIndexUnlocked --- commits stuff, etc - sets *committed_index out-param -- PeerMessageQueue picks up this out-param and sets the new value within the PeerMessageQueue This was very difficult to follow, given that the "observer" in fact was passing data back to the "notifier" through an out-parameter. Moreover, it wasn't obvious to see which class was "authoritative" for the tracking and advancement of watermarks. The new design makes the PeerMessageQueue itself fully responsible for calculating when the commit index can advance. The flow is now the following, with '!' at the beginning of the line to denote where it is changed from before: - PeerMessageQueue::ResponseFromPeer - updates peer last_received - calls PeerMessageQueue::AdvanceQueueWatermark - updates PeerMessageQueue::majority_replicated to the new majority watermark. ! - If the majority-replicated watermark is within the current leader !term, advances the commit index (stored in QueueState) !- notifies observers of the new commit index ! - calls observer->NotifyCommitIndex() with the new commit index !- RaftConsensus::NotifyCommitIndex() ! - ReplicaState::AdvanceCommittedIndexUnlocked - commits stuff, etc. This required a fairly invasive surgery because the PeerMessageQueue itself doesn't remember the terms of pending operations, and thus the watermarks became indexes instead of OpIds. This is itself also a simplification -- we previously were very messy about using the word "index" when in fact we had an OpId type. In almost every case, we only used the index of those OpIds. In the Raft paper itself, these watermarks are just indexes and not OpIds, so this change brings us closer to the implementation described in the original design. I looped raft-consensus-itest.MultiThreadedInsertWithFailovers 800 times and the whole suite 500 times and got no failures. Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8 --- M docs/release_notes.adoc M src/kudu/consensus/consensus-test-util.h M src/kudu/consensus/consensus.proto M src/kudu/consensus/consensus_peers-test.cc M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h M src/kudu/consensus/raft_consensus-test.cc M src/kudu/consensus/raft_consensus.cc M src/kudu/consensus/raft_consensus.h M src/kudu/consensus/raft_consensus_quorum-test.cc M src/kudu/consensus/raft_consensus_state.cc M src/kudu/consensus/raft_consensus_state.h M src/kudu/integration-tests/raft_consensus-itest.cc 15 files changed, 463 insertions(+), 592 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/33/4133/6 -- To view, visit http://gerrit.cloudera.org:8080/4133 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I2aa294472f018013d88f36a9358e9ebd9d5ed8f8 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Inlined dispatch for predicate evaluation
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4164 to look at the new patch set (#7). Change subject: Inlined dispatch for predicate evaluation .. Inlined dispatch for predicate evaluation In order to evaluate a predicate, the correct comparator must first be determined. The default evaluation, which calls ColumnPredicate::Evaluate(), will make a function call to the correct comparator for every row. The evaluation itself gets split into batches, but each row in the batch still makes the function call, which slows performance. To remediate this, the batch evaluation has been templatized so there is only a single function call per batch. Additionally, when decoder-level evaluation is enabled, rather than occuring in batches, dispatch occurs for each cell, which leads to poorer performance. To remediate this, the dispatch has been templatized in hopes that the dispatching and branching are inlined. This figure shows the performance of dictionary encoding for decoder-level evaluation without this templating adjustment. The query selects a single value out of a dictionary-encoded column containing 1M unique strings with decoder-level evaluation enabled. https://raw.githubusercontent.com/anjuwong/kudu/300f1e8bdf05bc16bdacacca22482e579e49de67/docs/images/SELECT_OUTSIDE_OF_RANGE_non_inlined.png Compare the above with the plot below, which is the result of the same query, but with inlined dispatch. https://raw.githubusercontent.com/anjuwong/kudu/300f1e8bdf05bc16bdacacca22482e579e49de67/docs/images/SELECT_OUTSIDE_OF_RANGE_inlined.png Change-Id: Iccfac9bc899362b442337050795b5ca8c4101268 --- M src/kudu/cfile/binary_plain_block.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/common/column_predicate.cc M src/kudu/common/column_predicate.h 4 files changed, 57 insertions(+), 41 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/4164/7 -- To view, visit http://gerrit.cloudera.org:8080/4164 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iccfac9bc899362b442337050795b5ca8c4101268 Gerrit-PatchSet: 7 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Predicate evaluation pushdown
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3990 to look at the new patch set (#12). Change subject: Predicate evaluation pushdown .. Predicate evaluation pushdown The premise of this patch is to avoid the excessive use of CPU when evaluating column predicates in specific cases. Dictionary blocks, for instance, can evaluate predicates by checking whether a string's codeword matches with the predicate, rather than by doing string comparisons. This patch uses a bitset to represent the set of codewords that match a given predicate on dictionary-encoded columns. Certain decoders now have the ability to evaluate a predicate without eagerly copying all of their underlying data into a buffer first. Rather, the decoders can first evaluate the predicate and copy only when needed. Since dictionary encoding relies on plain encoding when a dictionary gets too large, plain encoding also supports decoder-level evaluation. While lacking the benefit from reduced string comparisons, this optimization still improves scan speeds by avoiding excessive copies. See the performance doc for a look into the performance differences for dictionary encoding and plain encoding: https://github.com/anjuwong/kudu/blob/pred-pushdown/docs/decoder-eval-perf.md See the design-doc for predicate-eval-pushdown for a brief overview of the considered implementations: https://github.com/anjuwong/kudu/blob/sorted-dict-block/docs/design-docs/predicate-eval-pushdown.md More in-depth analysis and benchmarking in upcoming blog post. Change-Id: I31e4cce21e99f63b089d7c84410af8ed914cb576 --- M src/kudu/cfile/binary_dict_block.cc M src/kudu/cfile/binary_dict_block.h M src/kudu/cfile/binary_plain_block.cc M src/kudu/cfile/binary_plain_block.h M src/kudu/cfile/binary_prefix_block.h M src/kudu/cfile/block_encodings.h M src/kudu/cfile/cfile-test-base.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_util.cc M src/kudu/cfile/encoding-test.cc M src/kudu/cfile/gvint_block.h A src/kudu/common/column_materialization_context.h M src/kudu/common/column_predicate.cc M src/kudu/common/column_predicate.h M src/kudu/common/generic_iterators-test.cc M src/kudu/common/generic_iterators.cc M src/kudu/common/generic_iterators.h M src/kudu/common/iterator.h M src/kudu/common/rowblock.h M src/kudu/common/schema.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set-test.cc M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h M src/kudu/tablet/delta_applier.cc M src/kudu/tablet/delta_applier.h M src/kudu/tablet/delta_iterator_merger.cc M src/kudu/tablet/delta_iterator_merger.h M src/kudu/tablet/delta_store.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/deltamemstore.cc M src/kudu/tablet/deltamemstore.h A src/kudu/tablet/tablet-decoder-eval-test.cc M src/kudu/tablet/tablet-test-util.h 37 files changed, 863 insertions(+), 134 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/3990/12 -- To view, visit http://gerrit.cloudera.org:8080/3990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I31e4cce21e99f63b089d7c84410af8ed914cb576 Gerrit-PatchSet: 12 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] c++ client: expose private GetTablet API
Todd Lipcon has posted comments on this change. Change subject: c++ client: expose private GetTablet API .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4179 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: If798820e5d790d07f554aaa6f89b31aaf360a3a5 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] tool: port kudu-fs dump, remove kudu-fs list (WIP)
Dinesh Bhat has posted comments on this change. Change subject: tool: port kudu-fs_dump, remove kudu-fs_list (WIP) .. Patch Set 1: (7 comments) TFTR Adar, updated the patch, please re-review, also added some more tests. I think I can keep adding some more test exercises as you review them. http://gerrit.cloudera.org:8080/#/c/4305/1/docs/release_notes.adoc File docs/release_notes.adoc: Line 58: - The `kudu-fs_list` tool has been removed as it was replicating most of the > As I mentioned in HipChat the other day, I think the LIST_BLOCKS functional Added that back, although I have kept some of the sub commands like tree/list_tablets as well now since it's useful to pretty print them instead of relying on 'find'. Also please find some routines like ListLogSegments etc which I think could be removed. I will do so after taking some confirmation from your end. PS1, Line 59: options > Nit: change to 'functionality' Done http://gerrit.cloudera.org:8080/#/c/4305/1/src/kudu/tools/CMakeLists.txt File src/kudu/tools/CMakeLists.txt: PS1, Line 55: add_library(fs_tool fs_tool.cc) : target_link_libraries(fs_tool : gutil : kudu_common : server_common : consensus : tablet) > Can you remove fs_tool altogether and move the needed functionality into th I have done this, but not exactly as you described. I moved pretty much everything under tool_action_common(as opposed to moving just common routines), reason being I thought these subcommands may go through another round of re-shuffle after feedbacks from dev/users, we could start reorganizing them at that point. For now, FsTool simply moved from fs_tool library to tool_action_common.* and got rid of fs_tool library as such. http://gerrit.cloudera.org:8080/#/c/4305/1/src/kudu/tools/tool_action_fs.cc File src/kudu/tools/tool_action_fs.cc: PS1, Line 187: unique_ptr dump_fs_blocks = : ActionBuilder("tablet_blocks", ) : .Description("Dump the data of tablet") : .AddRequiredParameter({ "tablet_id", "tablet identifier" }) : .AddOptionalParameter("fs_wal_dir") : .AddOptionalParameter("fs_data_dirs") : .Build(); : : unique_ptr dump_fs_data = : ActionBuilder("tablet_data", ) : .Description("Dump the data of tablet") : .AddRequiredParameter({ "tablet_id", "tablet identifier" }) : .AddOptionalParameter("fs_wal_dir") : .AddOptionalParameter("fs_data_dirs") : .Build(); : : unique_ptr dump_fs_meta = : ActionBuilder("tablet_meta", ) : .Description("Dump the metadata of tablet") : .AddRequiredParameter({ "tablet_id", "tablet identifier" }) : .AddOptionalParameter("fs_wal_dir") : .AddOptionalParameter("fs_data_dirs") : .Build(); : : unique_ptr dump_fs_rowset = : ActionBuilder("rowset", ) : .Description("Dump the rowset of tablet") : .AddRequiredParameter({ "tablet_id", "tablet identifier" }) : .AddRequiredParameter({ "rowset_index", "rowset index" }) : .AddOptionalParameter("fs_wal_dir") : .AddOptionalParameter("fs_data_dirs") : .Build(); > These are all scoped to a tablet, so I think they should be in the tablet m Done, kept the dump_wals as-is since it seemed to be taking tablet_id as param. PS1, Line 220: unique_ptr dump_tree = : ActionBuilder("tree", ) : .Description("Dump the tree of Kudu filesystem") : .AddOptionalParameter("fs_wal_dir") : .AddOptionalParameter("fs_data_dirs") : .Build(); > Let's drop this one altogether since basic UNIX tools like find can do the Hmmm, I kept this only for pretty printing purposes, kinda looks nice with hierarchies enforced by indentations. PS1, Line 237: .AddAction(std::move(dump_fs_blocks)) : .AddAction(std::move(dump_fs_data)) : .AddAction(std::move(dump_fs_meta)) : .AddAction(std::move(dump_fs_rowset)) > These are all scoped to the 'fs' mode, so let's drop the fs infix. Done PS1, Line 242: print_uuid > For consistency, let's change this to dump_uuid. You'll probably need to up Done -- To view, visit http://gerrit.cloudera.org:8080/4305 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1ec628b65613011d8c48b6239c13762276425966 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master
[kudu-CR] tool: port kudu-fs dump, remove kudu-fs list (WIP)
Kudu Jenkins has posted comments on this change. Change subject: tool: port kudu-fs_dump, remove kudu-fs_list (WIP) .. Patch Set 2: Build Started http://104.196.14.100/job/kudu-gerrit/3249/ -- To view, visit http://gerrit.cloudera.org:8080/4305 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1ec628b65613011d8c48b6239c13762276425966 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dinesh BhatGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dinesh Bhat Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Inlined dispatch for predicate evaluation
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4164 to look at the new patch set (#6). Change subject: Inlined dispatch for predicate evaluation .. Inlined dispatch for predicate evaluation In order to evaluate a predicate, the correct comparator given the physical type of interest must first be determined. Currently, this occurs once per batch when ColumnPredicate::Evaluate() is called. A call to this will dispatch the comparator as a lambda to be applied in a loop over all rows in the batch. When decoder-level evaluation is enabled, rather than occuring in batches, dispatch occurs for each cell, which leads to poorer performance. To remediate this, the dispatch has been templatized in hopes that the dispatching and branching are inlined. This figure shows the performance of dictionary encoding for decoder-level evaluation without this templating adjustment. The query selects a single value out of a dictionary-encoded column containing 1M unique strings. https://raw.githubusercontent.com/anjuwong/kudu/300f1e8bdf05bc16bdacacca22482e579e49de67/docs/images/SELECT_OUTSIDE_OF_RANGE_non_inlined.png Compare the above with the plot below, which is the result of the same query, but with inlined dispatch. https://raw.githubusercontent.com/anjuwong/kudu/300f1e8bdf05bc16bdacacca22482e579e49de67/docs/images/SELECT_OUTSIDE_OF_RANGE_inlined.png Change-Id: Iccfac9bc899362b442337050795b5ca8c4101268 --- M src/kudu/cfile/binary_plain_block.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/common/column_predicate.cc M src/kudu/common/column_predicate.h 4 files changed, 57 insertions(+), 41 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/4164/6 -- To view, visit http://gerrit.cloudera.org:8080/4164 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iccfac9bc899362b442337050795b5ca8c4101268 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Predicate evaluation pushdown
Kudu Jenkins has posted comments on this change. Change subject: Predicate evaluation pushdown .. Patch Set 11: Build Started http://104.196.14.100/job/kudu-gerrit/3247/ -- To view, visit http://gerrit.cloudera.org:8080/3990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I31e4cce21e99f63b089d7c84410af8ed914cb576 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Predicate evaluation pushdown
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/3990 to look at the new patch set (#11). Change subject: Predicate evaluation pushdown .. Predicate evaluation pushdown The premise of this patch is to avoid the excessive use of CPU when evaluating column predicates in specific cases. Dictionary blocks, for instance, can evaluate predicates by checking whether a string's codeword matches with the predicate, rather than by doing string comparisons. This patch uses a bitset to represent the set of codewords that match a given predicate on dictionary-encoded columns. Certain decoders now have the ability to evaluate a predicate without eagerly copying all of their underlying data into a buffer first. Rather, the decoders can first evaluate the predicate and copy only when needed. Since dictionary encoding relies on plain encoding when a dictionary gets too large, plain encoding also supports decoder-level evaluation. While lacking the benefit from reduced string comparisons, this optimization still improves scan speeds by avoiding excessive copies. See the performance doc for a look into the performance differences for dictionary encoding and plain encoding: https://github.com/anjuwong/kudu/blob/pred-pushdown/docs/decoder-eval-perf.md See the design-doc for predicate-eval-pushdown for a brief overview of the considered implementations: https://github.com/anjuwong/kudu/blob/sorted-dict-block/docs/design-docs/predicate-eval-pushdown.md More in-depth analysis and benchmarking in upcoming blog post. Change-Id: I31e4cce21e99f63b089d7c84410af8ed914cb576 --- M src/kudu/cfile/binary_dict_block.cc M src/kudu/cfile/binary_dict_block.h M src/kudu/cfile/binary_plain_block.cc M src/kudu/cfile/binary_plain_block.h M src/kudu/cfile/binary_prefix_block.h M src/kudu/cfile/block_encodings.h M src/kudu/cfile/cfile-test-base.h M src/kudu/cfile/cfile-test.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/cfile/cfile_reader.h M src/kudu/cfile/cfile_util.cc M src/kudu/cfile/encoding-test.cc M src/kudu/cfile/gvint_block.h A src/kudu/common/column_materialization_context.h M src/kudu/common/column_predicate.cc M src/kudu/common/column_predicate.h M src/kudu/common/generic_iterators-test.cc M src/kudu/common/generic_iterators.cc M src/kudu/common/generic_iterators.h M src/kudu/common/iterator.h M src/kudu/common/rowblock.h M src/kudu/common/schema.h M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/cfile_set-test.cc M src/kudu/tablet/cfile_set.cc M src/kudu/tablet/cfile_set.h M src/kudu/tablet/delta_applier.cc M src/kudu/tablet/delta_applier.h M src/kudu/tablet/delta_iterator_merger.cc M src/kudu/tablet/delta_iterator_merger.h M src/kudu/tablet/delta_store.h M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltafile.h M src/kudu/tablet/deltamemstore.cc M src/kudu/tablet/deltamemstore.h A src/kudu/tablet/tablet-decoder-eval-test.cc M src/kudu/tablet/tablet-test-util.h 37 files changed, 859 insertions(+), 134 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/90/3990/11 -- To view, visit http://gerrit.cloudera.org:8080/3990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I31e4cce21e99f63b089d7c84410af8ed914cb576 Gerrit-PatchSet: 11 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Inlined dispatch for predicate evaluation
Kudu Jenkins has posted comments on this change. Change subject: Inlined dispatch for predicate evaluation .. Patch Set 6: Build Started http://104.196.14.100/job/kudu-gerrit/3248/ -- To view, visit http://gerrit.cloudera.org:8080/4164 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iccfac9bc899362b442337050795b5ca8c4101268 Gerrit-PatchSet: 6 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1513. consensus: improve log messages for lagging or tablet-copying peers
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1513. consensus: improve log messages for lagging or tablet-copying peers .. Patch Set 2: Build Started http://104.196.14.100/job/kudu-gerrit/3246/ -- To view, visit http://gerrit.cloudera.org:8080/4184 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4dd560309841ba738b031ea87b12f8f612e6c674 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1513. consensus: improve log messages for lagging or tablet-copying peers
Todd Lipcon has posted comments on this change. Change subject: KUDU-1513. consensus: improve log messages for lagging or tablet-copying peers .. Patch Set 2: Code-Review+2 Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/4184 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4dd560309841ba738b031ea87b12f8f612e6c674 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] consensus: improve log messages for lagging or tablet-copying peers
Todd Lipcon has posted comments on this change. Change subject: consensus: improve log messages for lagging or tablet-copying peers .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4184/1/src/kudu/consensus/consensus_queue.cc File src/kudu/consensus/consensus_queue.cc: Line 316: KLOG_EVERY_N_SECS_THROTTLER(INFO, 3, peer->status_log_throttler, "tablet copy") > yea, maybe we should clarify this.. the thing is that we do actually contin mind if I defer this to a further improvement later? I think it will actually require tracking more state, etc, to know whether the tablet copy is already in progress or what. -- To view, visit http://gerrit.cloudera.org:8080/4184 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4dd560309841ba738b031ea87b12f8f612e6c674 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes
[kudu-CR] docs: fix list of support encodings
Todd Lipcon has posted comments on this change. Change subject: docs: fix list of support encodings .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4322/1/docs/schema_design.adoc File docs/schema_design.adoc: Line 259: | integer, timestamp | plain, bitshuffle, run length (except for 64-bit) > What about group varint for 32-bit? hrm, iirc group varint only works for unsigned int32, which isn't exposed to users anymore, no? I think it's kept there only for the sake of some unit tests internally -- To view, visit http://gerrit.cloudera.org:8080/4322 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I47d7b29c802b0c8fee178b59d2e26ab00bbfbc72 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Predicate evaluation pushdown
Andrew Wong has posted comments on this change. Change subject: Predicate evaluation pushdown .. Patch Set 10: (8 comments) http://gerrit.cloudera.org:8080/#/c/3990/10/src/kudu/cfile/cfile-test-base.h File src/kudu/cfile/cfile-test-base.h: Line 309: return ctx; > can just be 'return ColumnMaterializationContext(0, nullptr, cb, sel);' Done http://gerrit.cloudera.org:8080/#/c/3990/10/src/kudu/cfile/cfile_reader.cc File src/kudu/cfile/cfile_reader.cc: PS10, Line 719: : // ctx_ should only be null if PrepareMaterializationContext() is not called, in which : // case only seeks are permitted. : if (ctx_ && !ctx_->DecoderEvalNotSupported()) { : > could you get rid of the ctx_ member variable along with the PrepareMateria This should work, and it would clean up the API for when a context is needed. Just need to make sure there's a clean way to pass it to the decoder from the CFileIterator. PS10, Line 976: !ctx->DecoderEvalNotSupported > it strikes me that most of the calls are double-negatives. Worth adding a D A few of these should be !NotSupported, since they should act either when the state is Supported or NotSet (e.g. we should CopyNextAndEval if the state is NotSet or Supported). Here, the decoder may not have made a call to the one of the Copy* functions (e.g. if we're starting out with a null run), in which case, we would be NotSet rather than NotSupported. We could have a DecoderEvalMayBeSupported, which would effectively be the same thing. http://gerrit.cloudera.org:8080/#/c/3990/10/src/kudu/cfile/cfile_reader.h File src/kudu/cfile/cfile_reader.h: PS10, Line 230: used during a scan. > it's a bit confusing here, since this seems to be saying you set it so you Yep, more explicitly, this gets used when determining the matching codewords during scan preparation. Pushing codeword matching to right before a scan should alleviate this, since we would only require the ctx when we really need it (i.e. right before/during a scan). http://gerrit.cloudera.org:8080/#/c/3990/10/src/kudu/cfile/encoding-test.cc File src/kudu/cfile/encoding-test.cc: PS10, Line 540: i*skip_step+skip > nit: i * skip_step + skip (spaces between operators) Done http://gerrit.cloudera.org:8080/#/c/3990/10/src/kudu/cfile/rle_block.h File src/kudu/cfile/rle_block.h: Line 190: void SeekForward(int* n) override { > is this still necessary now that you have the superclass implementation? No, removed. http://gerrit.cloudera.org:8080/#/c/3990/10/src/kudu/tablet/tablet-decoder-eval-test.cc File src/kudu/tablet/tablet-decoder-eval-test.cc: PS10, Line 137: StringPrintf(Substitute("%0$0$1", strlen, PRId64).c_str(), > this cleverness repeats several times, worth factoring into a function like Done Line 224: int main(int argc, char *argv[]) { > shouldn't need a main -- it shoudl get there automagically from the kudu_te Done -- To view, visit http://gerrit.cloudera.org:8080/3990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I31e4cce21e99f63b089d7c84410af8ed914cb576 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] docs: fix list of support encodings
Adar Dembo has posted comments on this change. Change subject: docs: fix list of support encodings .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/4322/1/docs/schema_design.adoc File docs/schema_design.adoc: Line 259: | integer, timestamp | plain, bitshuffle, run length (except for 64-bit) What about group varint for 32-bit? Line 260: | float | plain, bitshuffle Nit: add double to this list? -- To view, visit http://gerrit.cloudera.org:8080/4322 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I47d7b29c802b0c8fee178b59d2e26ab00bbfbc72 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] docs: fix list of support encodings
Todd Lipcon has uploaded a new change for review. http://gerrit.cloudera.org:8080/4322 Change subject: docs: fix list of support encodings .. docs: fix list of support encodings Change-Id: I47d7b29c802b0c8fee178b59d2e26ab00bbfbc72 --- M docs/schema_design.adoc 1 file changed, 2 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/22/4322/1 -- To view, visit http://gerrit.cloudera.org:8080/4322 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I47d7b29c802b0c8fee178b59d2e26ab00bbfbc72 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon
[kudu-CR] docs: fix list of support encodings
Dan Burkert has posted comments on this change. Change subject: docs: fix list of support encodings .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4322 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I47d7b29c802b0c8fee178b59d2e26ab00bbfbc72 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] log util: downgrade scary "missing footer" message to INFO
Mike Percy has submitted this change and it was merged. Change subject: log_util: downgrade scary "missing footer" message to INFO .. log_util: downgrade scary "missing footer" message to INFO In most cases if a Kudu server restarts, it will leave log segments with no footer. This previously caused a scary-looking warning to be emitted on the next startup. This downgrades the message to INFO for the case of the footer not being found. I also fixed this code path to properly propagate other types of errors (eg IOError) which might occur while reading the footer. Change-Id: Ic60f57e1276e6e4cc209d37d6fdad444c24db3c6 Reviewed-on: http://gerrit.cloudera.org:8080/4319 Tested-by: Kudu Jenkins Reviewed-by: Mike Percy--- M src/kudu/consensus/log_util.cc 1 file changed, 10 insertions(+), 4 deletions(-) Approvals: Mike Percy: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/4319 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ic60f57e1276e6e4cc209d37d6fdad444c24db3c6 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy
[kudu-CR] compaction policy: avoid O(n^2) calls to EstimateOnDiskSize
Kudu Jenkins has posted comments on this change. Change subject: compaction_policy: avoid O(n^2) calls to EstimateOnDiskSize .. Patch Set 2: Build Started http://104.196.14.100/job/kudu-gerrit/3244/ -- To view, visit http://gerrit.cloudera.org:8080/4191 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic2949218d7f5fd822571a7b14d1d0b4430aeee1d Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1582. Optimize budgeted compaction policy with an approximation
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-1582. Optimize budgeted compaction policy with an approximation .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4e611f161d66ddc47e97e3b5328bc1778a4ac958 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Anonymous Coward #174 Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1582. Optimize budgeted compaction policy with an approximation
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1582. Optimize budgeted compaction policy with an approximation .. Patch Set 4: Build Started http://104.196.14.100/job/kudu-gerrit/3243/ -- To view, visit http://gerrit.cloudera.org:8080/4153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4e611f161d66ddc47e97e3b5328bc1778a4ac958 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Anonymous Coward #174 Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1582. Optimize budgeted compaction policy with an approximation
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4153 to look at the new patch set (#4). Change subject: KUDU-1582. Optimize budgeted compaction policy with an approximation .. KUDU-1582. Optimize budgeted compaction policy with an approximation On a server with ~5.5TB of data, I noticed that the maintenance manager was spending tens of seconds computing optimal compactions, and most of the time was in knapsack solving. This patch implements a "first pass" solution to the compaction policy using an approximation algorithm that is significantly faster than the actual knapsack solver. The approximation algorithm gives a lower bound solution which is at least 50% as good as the optimal solution (i.e. it is a '2-approximation'). In practice, it is often optimal or nearly optimal. To test, I dumped the bounds and sizes of rowsets from a 200+GB YCSB tablet from a real cluster that has been inserting for a few days, and wrote a test which imported that layout back in as mock rowsets in order to run the policy. The result (with the default 5% max approximation error) is between a 6x and 20x speedup (depending on the budget size). The resulting loss in solution quality is less than 1%. Before: Time spent Computing compaction with 128MB budget: real 0.965suser 0.964s sys 0.000s quality=0.118006 Time spent Computing compaction with 256MB budget: real 1.202suser 1.204s sys 0.000s quality=0.254289 Time spent Computing compaction with 512MB budget: real 2.233suser 2.232s sys 0.000s quality=0.524391 Time spent Computing compaction with 1024MB budget: real 4.202s user 4.200s sys 0.000s quality=1.06674 After: Time spent Computing compaction with 128MB budget: real 0.148suser 0.148s sys 0.000s quality=0.117985 Time spent Computing compaction with 256MB budget: real 0.162suser 0.160s sys 0.000s quality=0.252118 Time spent Computing compaction with 512MB budget: real 0.174suser 0.176s sys 0.000s quality=0.524391 Time spent Computing compaction with 1024MB budget: real 0.206s user 0.208s sys 0.000s quality=1.06674 Change-Id: I4e611f161d66ddc47e97e3b5328bc1778a4ac958 --- M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/compaction_policy-test.cc M src/kudu/tablet/compaction_policy.cc M src/kudu/tablet/compaction_policy.h A src/kudu/tablet/ycsb-test-rowsets.tsv 5 files changed, 7,161 insertions(+), 116 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/4153/4 -- To view, visit http://gerrit.cloudera.org:8080/4153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4e611f161d66ddc47e97e3b5328bc1778a4ac958 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Anonymous Coward #174 Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] compaction policy: fix bound calculation
Kudu Jenkins has posted comments on this change. Change subject: compaction_policy: fix bound calculation .. Patch Set 3: Build Started http://104.196.14.100/job/kudu-gerrit/3242/ -- To view, visit http://gerrit.cloudera.org:8080/4152 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I76364c17debd7293b36884d64c7747f8c604ae12 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward #174 Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] API and style improvements to the Kudu Flume Sink
Mike Percy has posted comments on this change. Change subject: API and style improvements to the Kudu Flume Sink .. Patch Set 1: I also added Ara to the review in case he has input on this -- To view, visit http://gerrit.cloudera.org:8080/4320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I357df8cac7daa6ce105f9568cc3af09697032eb6 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Ara Ebrahimi Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-HasComments: No
[kudu-CR] KUDU-1582. Optimize budgeted compaction policy with an approximation
Todd Lipcon has posted comments on this change. Change subject: KUDU-1582. Optimize budgeted compaction policy with an approximation .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/4153/3/src/kudu/tablet/compaction_policy.cc File src/kudu/tablet/compaction_policy.cc: Line 294: // If we know that the upper bound here is worse than our currently known > Am I wrong in my understanding that what we're in fact doing here is that w hrm, I think you explained it approximately right. I updated the comment, see if you think it's better this time around. Line 414: // do another pass using the real solver. > maybe replace, "do another pass using the real solver" with "do another pas Done -- To view, visit http://gerrit.cloudera.org:8080/4153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4e611f161d66ddc47e97e3b5328bc1778a4ac958 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Anonymous Coward #174 Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1593 - [python] Allow to set the number of replicas per tablet at table creation time
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4315 to look at the new patch set (#2). Change subject: KUDU-1593 - [python] Allow to set the number of replicas per tablet at table creation time .. KUDU-1593 - [python] Allow to set the number of replicas per tablet at table creation time Currently the Kudu Python client does not allow for the number of replicas to be set at table creation time because the num_replicas method of the KuduTableCreator is not exposed to the Python class. This patch exposes that method as well as it exposes the num_replicas method of the KuduTable class so that the number of replicas for a table can be retrieved from the cluster. This patch includes a test. Change-Id: I9dcac887029ef22e850ea8be72e71b684e6bee05 --- M python/kudu/client.pyx M python/kudu/libkudu_client.pxd M python/kudu/tests/test_client.py 3 files changed, 28 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/4315/2 -- To view, visit http://gerrit.cloudera.org:8080/4315 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I9dcac887029ef22e850ea8be72e71b684e6bee05 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Jordan BirdsellGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] compaction policy: fix bound calculation
Todd Lipcon has posted comments on this change. Change subject: compaction_policy: fix bound calculation .. Patch Set 2: The tests in the next commit actually catch this bug. If I revert this patch and run the test from "Optimize budgeted compaction policy with an approximation" it fails with: ../../src/kudu/tablet/compaction_policy-test.cc:112: Failure Expected: (total_size) <= (budget_mb), actual: 144 vs 128 -- To view, visit http://gerrit.cloudera.org:8080/4152 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I76364c17debd7293b36884d64c7747f8c604ae12 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward #174 Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] compaction policy: fix bound calculation
David Ribeiro Alves has posted comments on this change. Change subject: compaction_policy: fix bound calculation .. Patch Set 2: anyway to add a simple test? -- To view, visit http://gerrit.cloudera.org:8080/4152 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I76364c17debd7293b36884d64c7747f8c604ae12 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward #174 Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1582. Optimize budgeted compaction policy with an approximation
David Ribeiro Alves has posted comments on this change. Change subject: KUDU-1582. Optimize budgeted compaction policy with an approximation .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/4153/3/src/kudu/tablet/compaction_policy.cc File src/kudu/tablet/compaction_policy.cc: Line 294: // If we know that the upper bound here is worse than our currently known Am I wrong in my understanding that what we're in fact doing here is that we're testing whether our current solution is within the "aproximation ratio" of the upper bound, in which case we've done almost as good as we can do and we skip the knapsack solver. The verbage could be slightly improved to more closely match that (in particular, "upper bound is worse" sounds slightly fishy since it kind of implies that you could do better.) It's also slightly weird that we're iterating through the rowsetinfos above and that it corresponds to a knapsackproblem index. Could you change the for look to use an index and use the index to get the rsi and the corresponding best upper bound? Line 414: // do another pass using the real solver. maybe replace, "do another pass using the real solver" with "do another pass, using the real solver if the best solution we obtained in Pass 1 can be significantly improved" -- To view, visit http://gerrit.cloudera.org:8080/4153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4e611f161d66ddc47e97e3b5328bc1778a4ac958 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Anonymous Coward #174 Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-668. log block manager should tolerate empty metadata
Todd Lipcon has submitted this change and it was merged. Change subject: KUDU-668. log block manager should tolerate empty metadata .. KUDU-668. log block manager should tolerate empty metadata This solves a commonly seen issue after a server crashes due to "Too many open files". In many cases, this type of crash would leave an empty metadata file and a missing data file. This isn't a full fix for KUDU-668, but handles the above common case by just checking the size and existence of these files. If the metadata is too small to be valid, and the data is empty, then we just remove the container and continue with startup. Change-Id: Ie4fd9d0f510835d100d35efaf541d34829118f56 Reviewed-on: http://gerrit.cloudera.org:8080/4178 Reviewed-by: Adar DemboTested-by: Kudu Jenkins --- M src/kudu/fs/block_manager-test.cc M src/kudu/fs/log_block_manager.cc M src/kudu/util/pb_util.cc M src/kudu/util/pb_util.h 4 files changed, 69 insertions(+), 12 deletions(-) Approvals: Adar Dembo: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/4178 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: Ie4fd9d0f510835d100d35efaf541d34829118f56 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1590. Fix cache-test failure on some machines
Todd Lipcon has submitted this change and it was merged. Change subject: KUDU-1590. Fix cache-test failure on some machines .. KUDU-1590. Fix cache-test failure on some machines CacheTest.TestEviction was failing on some machines, depending on the number of cores. This is due to bfb6f2338cd57a1c7f8405c97b66c3050c8920ff which changed the cache to determine the number of cache shards based on the core count. The number of shards changed the eviction behavior because the sharded cache is actually only *approximately* LRU. Each shard itself is LRU, but entries in different shards will not evict each other. So, with more shards, the test wasn't evicting the element as expected. The fix simply increases the amount of cache churn caused by the test so that the eviction proceeds as expected. Change-Id: I77362fc67ab5ba8420c21d1e3fb5c28ff9bfab1b Reviewed-on: http://gerrit.cloudera.org:8080/4321 Reviewed-by: Adar DemboTested-by: Kudu Jenkins --- M src/kudu/util/cache-test.cc 1 file changed, 5 insertions(+), 2 deletions(-) Approvals: Adar Dembo: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/4321 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I77362fc67ab5ba8420c21d1e3fb5c28ff9bfab1b Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1590. Fix cache-test failure on some machines
Todd Lipcon has posted comments on this change. Change subject: KUDU-1590. Fix cache-test failure on some machines .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4321/1/src/kudu/util/cache-test.cc File src/kudu/util/cache-test.cc: Line 190: for (int i = 0; i < kNumElems + 1000; i++) { > Just for my own curiosity (don't need to change anything), what exactly is Didn't really dig into this... my guess is that it was right "on the edge" here, and the number of shards just shuffled the elements we're touching here around differently such that sometimes '101' didn't get evicted, and sometimes it did. -- To view, visit http://gerrit.cloudera.org:8080/4321 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I77362fc67ab5ba8420c21d1e3fb5c28ff9bfab1b Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-1582. Optimize budgeted compaction policy with an approximation
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1582. Optimize budgeted compaction policy with an approximation .. Patch Set 3: Build Started http://104.196.14.100/job/kudu-gerrit/3240/ -- To view, visit http://gerrit.cloudera.org:8080/4153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4e611f161d66ddc47e97e3b5328bc1778a4ac958 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Anonymous Coward #174 Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1582. Optimize budgeted compaction policy with an approximation
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4153 to look at the new patch set (#3). Change subject: KUDU-1582. Optimize budgeted compaction policy with an approximation .. KUDU-1582. Optimize budgeted compaction policy with an approximation On a server with ~5.5TB of data, I noticed that the maintenance manager was spending tens of seconds computing optimal compactions, and most of the time was in knapsack solving. This patch implements a "first pass" solution to the compaction policy using an approximation algorithm that is significantly faster than the actual knapsack solver. The approximation algorithm gives a lower bound solution which is at least 50% as good as the optimal solution (i.e. it is a '2-approximation'). In practice, it is often optimal or nearly optimal. To test, I dumped the bounds and sizes of rowsets from a 200+GB YCSB tablet from a real cluster that has been inserting for a few days, and wrote a test which imported that layout back in as mock rowsets in order to run the policy. The result (with the default 5% max approximation error) is between a 6x and 20x speedup (depending on the budget size). The resulting loss in solution quality is less than 1%. Before: Time spent Computing compaction with 128MB budget: real 0.965suser 0.964s sys 0.000s quality=0.118006 Time spent Computing compaction with 256MB budget: real 1.202suser 1.204s sys 0.000s quality=0.254289 Time spent Computing compaction with 512MB budget: real 2.233suser 2.232s sys 0.000s quality=0.524391 Time spent Computing compaction with 1024MB budget: real 4.202s user 4.200s sys 0.000s quality=1.06674 After: Time spent Computing compaction with 128MB budget: real 0.148suser 0.148s sys 0.000s quality=0.117985 Time spent Computing compaction with 256MB budget: real 0.162suser 0.160s sys 0.000s quality=0.252118 Time spent Computing compaction with 512MB budget: real 0.174suser 0.176s sys 0.000s quality=0.524391 Time spent Computing compaction with 1024MB budget: real 0.206s user 0.208s sys 0.000s quality=1.06674 Change-Id: I4e611f161d66ddc47e97e3b5328bc1778a4ac958 --- M src/kudu/tablet/CMakeLists.txt M src/kudu/tablet/compaction_policy-test.cc M src/kudu/tablet/compaction_policy.cc M src/kudu/tablet/compaction_policy.h A src/kudu/tablet/ycsb-test-rowsets.tsv 5 files changed, 7,154 insertions(+), 116 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/53/4153/3 -- To view, visit http://gerrit.cloudera.org:8080/4153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4e611f161d66ddc47e97e3b5328bc1778a4ac958 Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Anonymous Coward #174 Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1582. Optimize budgeted compaction policy with an approximation
Todd Lipcon has posted comments on this change. Change subject: KUDU-1582. Optimize budgeted compaction policy with an approximation .. Patch Set 2: (9 comments) http://gerrit.cloudera.org:8080/#/c/4153/2/src/kudu/tablet/CMakeLists.txt File src/kudu/tablet/CMakeLists.txt: Line 93: execute_process(COMMAND ln -sf ${CMAKE_CURRENT_SOURCE_DIR}/ycsb-test-rowsets.tsv > nit can you move this before the test (even if not needed makes more sense Done http://gerrit.cloudera.org:8080/#/c/4153/2/src/kudu/tablet/compaction_policy-test.cc File src/kudu/tablet/compaction_policy-test.cc: Line 74: CHECK_OK(ReadFileToString(Env::Default(), path, )); > add error message Done Line 79: CHECK_EQ(3, fields.size()) << line; > improve error message Done Line 82: ParseLeadingInt32Value(fields[0], -1) * 1024 * 1024)); > comment in this magic calculation? Done Line 88: TEST(TestCompactionPolicy, TestYcsbCompaction) { > what is this really testing? just that we get better qualities with higher yea, both a benchmark and a kind of "realistic test", will add a comment to that effect http://gerrit.cloudera.org:8080/#/c/4153/2/src/kudu/tablet/compaction_policy.cc File src/kudu/tablet/compaction_policy.cc: Line 46: DEFINE_double(compaction_approximation_ratio, 1.05f, > would it be much harder to have this be 0 based? (i.e. 0.05 instead of 1.05 I think I'd rather stick with this since it's the definition usually used for approximation algorithms (see https://en.wikipedia.org/wiki/Approximation_algorithm#Performance_guarantees ). Would it be clearer if I called it "approximation factor"? PS2, Line 199: top > nit: move this to the line above Done Line 247: // policy implementedin this function. > missing space Done Line 311: best_upper_bounds.reserve(asc_min_key.size()); > the reasoning seems good, but this is very dense, I wish there was a simple I split into two methods, but not sure what an effective unit test would be. Any ideas? I could come up with some toy problem I suppose but verifying the actual test is pretty tough since you can't really run the dynamic program in your head. -- To view, visit http://gerrit.cloudera.org:8080/4153 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4e611f161d66ddc47e97e3b5328bc1778a4ac958 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Anonymous Coward #174 Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] KUDU-668. log block manager should tolerate empty metadata
Adar Dembo has posted comments on this change. Change subject: KUDU-668. log block manager should tolerate empty metadata .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4178 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ie4fd9d0f510835d100d35efaf541d34829118f56 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] KUDU-1590. Fix cache-test failure on some machines
Adar Dembo has posted comments on this change. Change subject: KUDU-1590. Fix cache-test failure on some machines .. Patch Set 1: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/4321/1/src/kudu/util/cache-test.cc File src/kudu/util/cache-test.cc: Line 190: for (int i = 0; i < kNumElems + 1000; i++) { Just for my own curiosity (don't need to change anything), what exactly is the relationship between the number of insertions needed here and the number of shards such that 200 is guaranteed to be evicted? It's not intuitive to me, partly because when I ran the test on a 16 core box it passed, but on an 8 core box (my laptop) it failed. -- To view, visit http://gerrit.cloudera.org:8080/4321 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I77362fc67ab5ba8420c21d1e3fb5c28ff9bfab1b Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] KUDU-1590. Fix cache-test failure on some machines
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1590. Fix cache-test failure on some machines .. Patch Set 1: Build Started http://104.196.14.100/job/kudu-gerrit/3238/ -- To view, visit http://gerrit.cloudera.org:8080/4321 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I77362fc67ab5ba8420c21d1e3fb5c28ff9bfab1b Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] KUDU-1590. Fix cache-test failure on some machines
Hello Adar Dembo, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/4321 to review the following change. Change subject: KUDU-1590. Fix cache-test failure on some machines .. KUDU-1590. Fix cache-test failure on some machines CacheTest.TestEviction was failing on some machines, depending on the number of cores. This is due to bfb6f2338cd57a1c7f8405c97b66c3050c8920ff which changed the cache to determine the number of cache shards based on the core count. The number of shards changed the eviction behavior because the sharded cache is actually only *approximately* LRU. Each shard itself is LRU, but entries in different shards will not evict each other. So, with more shards, the test wasn't evicting the element as expected. The fix simply increases the amount of cache churn caused by the test so that the eviction proceeds as expected. Change-Id: I77362fc67ab5ba8420c21d1e3fb5c28ff9bfab1b --- M src/kudu/util/cache-test.cc 1 file changed, 5 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/4321/1 -- To view, visit http://gerrit.cloudera.org:8080/4321 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I77362fc67ab5ba8420c21d1e3fb5c28ff9bfab1b Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Todd LipconGerrit-Reviewer: Adar Dembo
[kudu-CR] docs: workflow for master migration
David Ribeiro Alves has posted comments on this change. Change subject: docs: workflow for master migration .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/4300/1/docs/administration.adoc File docs/administration.adoc: Line 236: recovering from permanent master failures greatly, and is highly recommended. The alias should be > To clarify, is your question "how does it simplify recovering from permanen My "how" referred to "How is the user supposed to do this. What is the goal and steps". Someone with no context won't know how or why this "simplifies recovering from permanent master failures greatly". It just seems like, with the removal of the "complex and distracting" explanation you settled on something that is worse than not having anything at all. I don't feel strongly about a particular route (between: full explanation, pointing to the design doc or removing it altogether) just find that leaving just this is confusing. Line 241: colocated with other services, though not with another master from the same configuration. what other services? Are we advising that people co-locate the master with a tablet server? make that clear. Line 244: * Identify and record the directory where the master's data will live. > I wanted "record" in there to make it clear that the choice being made need IMO identify leans more towards "finding the identity" of something vs "choosing the identity" of something. Breaking that symmetry is exactly my point since above it's the former case and here it's the latter Line 314: are working properly, consider performing the following sanity checks: > Is there a way to list existing masters in the system and status of each? Yeah, that would me my suggestion too. First the user should make sure that the system state is the expected one and then yes, try it. Maybe point to a tool (like ksck) that also does scans as scanning usually requires writing code and we wouldn't want an admin to have to write custom code to make sure the migration worked. -- To view, visit http://gerrit.cloudera.org:8080/4300 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9b9c66505e0efd1f4aef80884346507d4fe08d9c Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Adar DemboGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Anonymous Coward #149 Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] API and style improvements to the Kudu Flume Sink
Kudu Jenkins has posted comments on this change. Change subject: API and style improvements to the Kudu Flume Sink .. Patch Set 1: Build Started http://104.196.14.100/job/kudu-gerrit/3237/ -- To view, visit http://gerrit.cloudera.org:8080/4320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I357df8cac7daa6ce105f9568cc3af09697032eb6 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] API and style improvements to the Kudu Flume Sink
Will Berkeley has uploaded a new change for review. http://gerrit.cloudera.org:8080/4320 Change subject: API and style improvements to the Kudu Flume Sink .. API and style improvements to the Kudu Flume Sink This patch cleans up the Flume integration code a bit. Specifically: 1. s/EventProducer/OperationsProducer/ since the "KuduEventProducer" consumed Flume Events and produced Kudu Operations. 2. The KuduOperationsProducer API was changed. Now the initialize method takes just a KuduTable, and should be called once to initialize the KuduOperationsProducer after it is configured. The getOperations method now takes the Event instead. 3. The close method for a KuduOperationsProducer is now called when the KuduSink is called. Previously this was implied so in a comment but was not true. 4. General clean-up and style improvements. Change-Id: I357df8cac7daa6ce105f9568cc3af09697032eb6 --- R java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/AvroKuduOperationsProducer.java D java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/KuduEventProducer.java A java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/KuduOperationsProducer.java M java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/KuduSink.java M java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/KuduSinkConfigurationConstants.java D java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/SimpleKeyedKuduEventProducer.java A java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/SimpleKeyedKuduOperationsProducer.java R java/kudu-flume-sink/src/main/java/org/apache/kudu/flume/sink/SimpleKuduOperationsProducer.java R java/kudu-flume-sink/src/test/avro/testAvroKuduOperationsProducer.avsc R java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/AvroKuduOperationsProducerTest.java R java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/KeyedKuduOperationsProducerTest.java 11 files changed, 318 insertions(+), 288 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/20/4320/1 -- To view, visit http://gerrit.cloudera.org:8080/4320 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I357df8cac7daa6ce105f9568cc3af09697032eb6 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will Berkeley
[kudu-CR] Predicate evaluation pushdown
Kudu Jenkins has posted comments on this change. Change subject: Predicate evaluation pushdown .. Patch Set 10: Build Started http://104.196.14.100/job/kudu-gerrit/3235/ -- To view, visit http://gerrit.cloudera.org:8080/3990 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I31e4cce21e99f63b089d7c84410af8ed914cb576 Gerrit-PatchSet: 10 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Inlined dispatch for predicate evaluation
Kudu Jenkins has posted comments on this change. Change subject: Inlined dispatch for predicate evaluation .. Patch Set 5: Build Started http://104.196.14.100/job/kudu-gerrit/3236/ -- To view, visit http://gerrit.cloudera.org:8080/4164 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iccfac9bc899362b442337050795b5ca8c4101268 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: No
[kudu-CR] Inlined dispatch for predicate evaluation
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4164 to look at the new patch set (#5). Change subject: Inlined dispatch for predicate evaluation .. Inlined dispatch for predicate evaluation In order to evaluate a predicate, the correct comparator given the physical type of interest must first be determined. Currently, this occurs once per batch when ColumnPredicate::Evaluate() is called. A call to this will dispatch the comparator as a lambda to be applied in a loop over all rows in the batch. When decoder-level evaluation is enabled, rather than occuring in batches, dispatch occurs for each cell, which leads to poorer performance. To remediate this, the dispatch has been templatized in hopes that the dispatching and branching are inlined. This figure shows the performance of dictionary encoding for decoder-level evaluation without this templating adjustment. The query selects a single value out of a dictionary-encoded column containing 1M unique strings. https://raw.githubusercontent.com/anjuwong/kudu/300f1e8bdf05bc16bdacacca22482e579e49de67/docs/images/SELECT_OUTSIDE_OF_RANGE_non_inlined.png Compare the above with the plot below, which is the result of the same query, but with inlined dispatch. https://raw.githubusercontent.com/anjuwong/kudu/300f1e8bdf05bc16bdacacca22482e579e49de67/docs/images/SELECT_OUTSIDE_OF_RANGE_inlined.png Change-Id: Iccfac9bc899362b442337050795b5ca8c4101268 --- M src/kudu/cfile/binary_plain_block.cc M src/kudu/cfile/cfile_reader.cc M src/kudu/common/column_predicate.cc M src/kudu/common/column_predicate.h 4 files changed, 59 insertions(+), 43 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/4164/5 -- To view, visit http://gerrit.cloudera.org:8080/4164 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iccfac9bc899362b442337050795b5ca8c4101268 Gerrit-PatchSet: 5 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] [c++ client] AUTO FLUSH BACKGROUND optimizations
Adar Dembo has posted comments on this change. Change subject: [c++ client] AUTO_FLUSH_BACKGROUND optimizations .. Patch Set 1: (2 comments) The Java client uses 50% too so even if the performance didn't change having that consistency is nice. http://gerrit.cloudera.org:8080/#/c/4308/1//COMMIT_MSG Commit Message: PS1, Line 13: Chaning Nit: Changing PS1, Line 53: ve0518.halxg.cloudera.com Presumably the machine was relatively idle at the time? -- To view, visit http://gerrit.cloudera.org:8080/4308 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I1f0aa6d02c51bb063498709e8570e8c7214a31a0 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Alexey SerbinGerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] Inlined dispatch for predicate evaluation
Andrew Wong has posted comments on this change. Change subject: Inlined dispatch for predicate evaluation .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/4164/4/src/kudu/common/column_predicate.cc File src/kudu/common/column_predicate.cc: PS4, Line 331: headers > I think this would be more accurately worded as "Instantiate template for a Done http://gerrit.cloudera.org:8080/#/c/4164/4/src/kudu/common/column_predicate.h File src/kudu/common/column_predicate.h: Line 142: void EvaluateForPhysicalType(const ColumnBlock& block, > Can this be private? Done Line 221: // // Templated evaluation will inline the dispatch. > left over? Done -- To view, visit http://gerrit.cloudera.org:8080/4164 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iccfac9bc899362b442337050795b5ca8c4101268 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] [WIP] KUDU-861 Support changing default, storage attributes
Todd Lipcon has posted comments on this change. Change subject: [WIP] KUDU-861 Support changing default, storage attributes .. Patch Set 3: No problem putting a patch up for it. It was assigned to me for more than a year with no action ;-) Feel free to re-assign to yourself -- To view, visit http://gerrit.cloudera.org:8080/4310 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I457d99ba2188ef6e439df47c0d94f2dc1a62ea6c Gerrit-PatchSet: 3 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1113 pushed predicates not displaying on scans web page
Todd Lipcon has posted comments on this change. Change subject: KUDU-1113 pushed predicates not displaying on scans web page .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/4206 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21821a1f7b5e4d5e5e1bdec17353ae5f3ef8e408 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1113 pushed predicates not displaying on scans web page
Dan Burkert has posted comments on this change. Change subject: KUDU-1113 pushed predicates not displaying on scans web page .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/4206 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21821a1f7b5e4d5e5e1bdec17353ae5f3ef8e408 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] Inlined dispatch for predicate evaluation
Dan Burkert has posted comments on this change. Change subject: Inlined dispatch for predicate evaluation .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/4164/4/src/kudu/common/column_predicate.cc File src/kudu/common/column_predicate.cc: PS4, Line 331: headers I think this would be more accurately worded as "Instantiate template for all physical types." http://gerrit.cloudera.org:8080/#/c/4164/4/src/kudu/common/column_predicate.h File src/kudu/common/column_predicate.h: Line 142: void EvaluateForPhysicalType(const ColumnBlock& block, Can this be private? Line 147: bool EvaluateCell(const void* cell) const { This body can be defined in the cc file. It's only necessary to put the definition in the header if you are not or can not pre-declare all the template specializations. Line 221: // // Templated evaluation will inline the dispatch. left over? -- To view, visit http://gerrit.cloudera.org:8080/4164 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Iccfac9bc899362b442337050795b5ca8c4101268 Gerrit-PatchSet: 4 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Andrew WongGerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Patch resolves KUDU-1593. Modified Client.create table to expose the num replicas method of the KuduTableCreator class.
David Ribeiro Alves has posted comments on this change. Change subject: Patch resolves KUDU-1593. Modified Client.create_table to expose the num_replicas method of the KuduTableCreator class. .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/4315/1//COMMIT_MSG Commit Message: Line 7: Patch resolves KUDU-1593. mention what the patch does. look at other patches, it's usually something like: KUDU-1593 - [python] Allow to set the number of replicas per tablet at table creation time Line 8: Modified Client.create_table to expose the num_replicas method of empty line between the title and the commit message, mention the test and possibly the motivation for the patch (short message, something like: "currently Kudu can't do X because of such and such. This patch does Y to solve it and includes a test.) http://gerrit.cloudera.org:8080/#/c/4315/1/python/kudu/client.pyx File python/kudu/client.pyx: Line 239: n_replicas : int mention what the new variable refers to http://gerrit.cloudera.org:8080/#/c/4315/1/python/kudu/tests/test_client.py File python/kudu/tests/test_client.py: Line 97: def test_create_non_replicated_table(self): would there be a way to verify the number of replicas of the created table? right now if you just dropped the new argument this test would still pass. Line 114: n_replicas = 2 ** 15 - 1 I don't think we need to test this, though I'm curious: where did you get this number? Line 115: remove whitespace -- To view, visit http://gerrit.cloudera.org:8080/4315 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9dcac887029ef22e850ea8be72e71b684e6bee05 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Jordan BirdsellGerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: Yes
[kudu-CR] Patch resolves KUDU-1593. Modified Client.create table to expose the num replicas method of the KuduTableCreator class.
Kudu Jenkins has posted comments on this change. Change subject: Patch resolves KUDU-1593. Modified Client.create_table to expose the num_replicas method of the KuduTableCreator class. .. Patch Set 1: Build Started http://104.196.14.100/job/kudu-gerrit/3233/ -- To view, visit http://gerrit.cloudera.org:8080/4315 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I9dcac887029ef22e850ea8be72e71b684e6bee05 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: jordantbirds...@gmail.com Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] Patch resolves KUDU-1593. Modified Client.create table to expose the num replicas method of the KuduTableCreator class.
jordantbirds...@gmail.com has uploaded a new change for review. http://gerrit.cloudera.org:8080/4315 Change subject: Patch resolves KUDU-1593. Modified Client.create_table to expose the num_replicas method of the KuduTableCreator class. .. Patch resolves KUDU-1593. Modified Client.create_table to expose the num_replicas method of the KuduTableCreator class. Change-Id: I9dcac887029ef22e850ea8be72e71b684e6bee05 --- M python/kudu/client.pyx M python/kudu/tests/test_client.py 2 files changed, 29 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/4315/1 -- To view, visit http://gerrit.cloudera.org:8080/4315 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I9dcac887029ef22e850ea8be72e71b684e6bee05 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: jordantbirds...@gmail.com
[kudu-CR] KUDU-1113 pushed predicates not displaying on scans web page
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/4206 to look at the new patch set (#2). Change subject: KUDU-1113 pushed predicates not displaying on scans web page .. KUDU-1113 pushed predicates not displaying on scans web page This fixes Scanner so it will display predicates in /scanz. A Scanner is constructed with a reference to its ScanSpec, but the predicates in the ScanSpec are removed by the Scanner's iterator as they are pushed down to lower iterators. The Scanner now holds a copy of the ScanSpec, from just before the ScanSpec is pass to the iterator and (potentially) modified. Change-Id: I21821a1f7b5e4d5e5e1bdec17353ae5f3ef8e408 --- M src/kudu/tserver/scanners.cc M src/kudu/tserver/tablet_service.cc 2 files changed, 8 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/06/4206/2 -- To view, visit http://gerrit.cloudera.org:8080/4206 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I21821a1f7b5e4d5e5e1bdec17353ae5f3ef8e408 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-1113 pushed predicates not displaying on scans web page
Kudu Jenkins has posted comments on this change. Change subject: KUDU-1113 pushed predicates not displaying on scans web page .. Patch Set 2: Build Started http://104.196.14.100/job/kudu-gerrit/3232/ -- To view, visit http://gerrit.cloudera.org:8080/4206 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21821a1f7b5e4d5e5e1bdec17353ae5f3ef8e408 Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: No
[kudu-CR] KUDU-1113 pushed predicates not displaying on scans web page
Will Berkeley has posted comments on this change. Change subject: KUDU-1113 pushed predicates not displaying on scans web page .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/4206/1/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: PS1, Line 1488: can report on predicates. > think it would be better to say something like 'report its predicates on th Done -- To view, visit http://gerrit.cloudera.org:8080/4206 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I21821a1f7b5e4d5e5e1bdec17353ae5f3ef8e408 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Will BerkeleyGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-HasComments: Yes