[kudu-CR] [python] : Add Pandas Support to Scanner
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/10809 ) Change subject: [python] : Add Pandas Support to Scanner .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/10809/4/build-support/jenkins/build-and-test.sh File build-support/jenkins/build-and-test.sh: http://gerrit.cloudera.org:8080/#/c/10809/4/build-support/jenkins/build-and-test.sh@454 PS4, Line 454: pip install pandas > Since pandas will be an optional package (even in tests) we cant really add Why is it optional for tests? Even if it's optional for users I don't understand a scenario in which we wouldn't want to test the functionality. -- To view, visit http://gerrit.cloudera.org:8080/10809 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I90fab5c0c42448bcc17ea22be37420c6ef2f4915 Gerrit-Change-Number: 10809 Gerrit-PatchSet: 4 Gerrit-Owner: Jordan Birdsell Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Jordan Birdsell Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 26 Jun 2018 04:08:11 + Gerrit-HasComments: Yes
[kudu-CR] [python] : Add Pandas Support to Scanner
Jordan Birdsell has posted comments on this change. ( http://gerrit.cloudera.org:8080/10809 ) Change subject: [python] : Add Pandas Support to Scanner .. Patch Set 5: (7 comments) http://gerrit.cloudera.org:8080/#/c/10809/4/build-support/jenkins/build-and-test.sh File build-support/jenkins/build-and-test.sh: http://gerrit.cloudera.org:8080/#/c/10809/4/build-support/jenkins/build-and-test.sh@454 PS4, Line 454: pip install pandas > should we move our dependencies to the setup.py and have pandas as a test o see response to grant ^^ http://gerrit.cloudera.org:8080/#/c/10809/4/build-support/jenkins/build-and-test.sh@454 PS4, Line 454: pip install pandas > I think it would be more intuitive/maintainable to build this into the pyth Since pandas will be an optional package (even in tests) we cant really add it to any requirements (in a file or in setup.py). This is the pattern that i've seen done for other projects, this line here is merely just to enforce our testing of the pandas tests. http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx File python/kudu/client.pyx: http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@121 PS4, Line 121: are converted incorrectly by Pandas. > converted incorrectly by whom? Done http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@1976 PS4, Line 1976: This method call is only needed if you want to explicitly release the > nit: add comma before "as" Done http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@1991 PS4, Line 1991: -- > this will require _twice_ the total dataset size inn memory because the "al gp, handled this by creating a generator method so we can create the final dataframe by iterating through the batches but only storing them into memory once. http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@2001 PS4, Line 2001: > seems strange to set the type dynamically based on the dataset contents. Is yea, so pandas is weird about nulls and not all data types can have null values stored i believe. For what its worth, i took this approach from pyspark. http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@2004 PS4, Line 2004: : # Here we are using list comprehension with the > is this just a metadata change or does it actually change the data? This is actually casting the data according to the docs. -- To view, visit http://gerrit.cloudera.org:8080/10809 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I90fab5c0c42448bcc17ea22be37420c6ef2f4915 Gerrit-Change-Number: 10809 Gerrit-PatchSet: 5 Gerrit-Owner: Jordan Birdsell Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Jordan Birdsell Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 26 Jun 2018 03:53:25 + Gerrit-HasComments: Yes
[kudu-CR] [python] : Add Pandas Support to Scanner
Hello Will Berkeley, David Ribeiro Alves, Kudu Jenkins, Grant Henke, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10809 to look at the new patch set (#5). Change subject: [python] : Add Pandas Support to Scanner .. [python] : Add Pandas Support to Scanner This patch adds support for converting scanner results to a Pandas DataFrame. There are a few basic unit tests included in this patch as well. Change-Id: I90fab5c0c42448bcc17ea22be37420c6ef2f4915 --- M build-support/jenkins/build-and-test.sh M python/kudu/__init__.py M python/kudu/client.pyx M python/kudu/tests/test_scanner.py 4 files changed, 169 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/09/10809/5 -- To view, visit http://gerrit.cloudera.org:8080/10809 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I90fab5c0c42448bcc17ea22be37420c6ef2f4915 Gerrit-Change-Number: 10809 Gerrit-PatchSet: 5 Gerrit-Owner: Jordan Birdsell Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] consensus queue: Clarify return value of ResponseFromPeer()
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10821 ) Change subject: consensus queue: Clarify return value of ResponseFromPeer() .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/10821/1/src/kudu/consensus/consensus_queue-test.cc File src/kudu/consensus/consensus_queue-test.cc: http://gerrit.cloudera.org:8080/#/c/10821/1/src/kudu/consensus/consensus_queue-test.cc@713 PS1, Line 713: SetLastReceivedAndLas > Indeed: is there any assertion needed for send_more_immediately here? Ah, it seems it has been addressed in PS2 already. -- To view, visit http://gerrit.cloudera.org:8080/10821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibe73b5c69630898327ac7787e3dfcb0b761f323e Gerrit-Change-Number: 10821 Gerrit-PatchSet: 2 Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Tue, 26 Jun 2018 03:41:16 + Gerrit-HasComments: Yes
[kudu-CR] consensus queue: Clarify return value of ResponseFromPeer()
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10821 ) Change subject: consensus queue: Clarify return value of ResponseFromPeer() .. Patch Set 1: (6 comments) looks good, just few nits http://gerrit.cloudera.org:8080/#/c/10821/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10821/1//COMMIT_MSG@9 PS1, Line 9: improvmenets improvements http://gerrit.cloudera.org:8080/#/c/10821/1//COMMIT_MSG@11 PS1, Line 11: anothe another http://gerrit.cloudera.org:8080/#/c/10821/1/src/kudu/consensus/consensus_queue-test.cc File src/kudu/consensus/consensus_queue-test.cc: http://gerrit.cloudera.org:8080/#/c/10821/1/src/kudu/consensus/consensus_queue-test.cc@528 PS1, Line 528: bool send_more_immediately; : send_more_immediately = queue_->ResponseFromPeer(response.responder_uuid(), response); nit: combine definition and assignment http://gerrit.cloudera.org:8080/#/c/10821/1/src/kudu/consensus/consensus_queue-test.cc@713 PS1, Line 713: send_more_immediately Indeed: is there any assertion needed for send_more_immediately here? http://gerrit.cloudera.org:8080/#/c/10821/1/src/kudu/consensus/consensus_queue.cc File src/kudu/consensus/consensus_queue.cc: http://gerrit.cloudera.org:8080/#/c/10821/1/src/kudu/consensus/consensus_queue.cc@1033 PS1, Line 1033: bool send_more_immediately; nit: set to 'false' to use in the closest return: bool send_more_immediately = false; http://gerrit.cloudera.org:8080/#/c/10821/1/src/kudu/consensus/consensus_queue.cc@1043 PS1, Line 1043: return false nit: return send_mode_immediately; -- To view, visit http://gerrit.cloudera.org:8080/10821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibe73b5c69630898327ac7787e3dfcb0b761f323e Gerrit-Change-Number: 10821 Gerrit-PatchSet: 1 Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Tue, 26 Jun 2018 03:38:54 + Gerrit-HasComments: Yes
[kudu-CR] [tools] small fix on RebalancingAlgo::GetNextMoves()
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10819 ) Change subject: [tools] small fix on RebalancingAlgo::GetNextMoves() .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iad7a7bc5bee939ce92d18845d6e43bcadccac5a2 Gerrit-Change-Number: 10819 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Tue, 26 Jun 2018 03:25:14 + Gerrit-HasComments: No
[kudu-CR] consensus queue: Clarify return value of ResponseFromPeer()
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10821 to look at the new patch set (#2). Change subject: consensus queue: Clarify return value of ResponseFromPeer() .. consensus queue: Clarify return value of ResponseFromPeer() This patch simply makes some minor semantic improvmenets to the ConsensusQueue::ResponseFromPeer() API by returning the value of whether to immediately send anothe request and improving the header file comment documentation about those semantics. Change-Id: Ibe73b5c69630898327ac7787e3dfcb0b761f323e --- M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h 4 files changed, 88 insertions(+), 79 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/10821/2 -- To view, visit http://gerrit.cloudera.org:8080/10821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibe73b5c69630898327ac7787e3dfcb0b761f323e Gerrit-Change-Number: 10821 Gerrit-PatchSet: 2 Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] [tools] small fix on RebalancingAlgo::GetNextMoves()
Hello Will Berkeley, Mike Percy, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10819 to look at the new patch set (#2). Change subject: [tools] small fix on RebalancingAlgo::GetNextMoves() .. [tools] small fix on RebalancingAlgo::GetNextMoves() Fix on short-circuiting in case of empty input for RebalancingAlgo::GetNextMoves(). Added consistency check for the 'cluster_info' input parameter and corresponding unit test. Change-Id: Iad7a7bc5bee939ce92d18845d6e43bcadccac5a2 --- M src/kudu/tools/rebalance_algo-test.cc M src/kudu/tools/rebalance_algo.cc 2 files changed, 36 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/19/10819/2 -- To view, visit http://gerrit.cloudera.org:8080/10819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Iad7a7bc5bee939ce92d18845d6e43bcadccac5a2 Gerrit-Change-Number: 10819 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Will Berkeley
[kudu-CR] tools: Add debug mode to pb dump tool
Hello Kudu Jenkins, Adar Dembo, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/9024 to look at the new patch set (#2). Change subject: tools: Add debug mode to pb dump tool .. tools: Add debug mode to pb dump tool This is useful when dumping a partially-flushed file to determine the start of the damaged entry. To remove the damaged entry, the file can be truncated at the beginning of the offending entry. See KUDU-2260. Change-Id: Ica2a74a2e252d140cf7704ff68037a64db19cd80 --- M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_pbc.cc M src/kudu/util/pb_util.cc M src/kudu/util/pb_util.h 4 files changed, 58 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/24/9024/2 -- To view, visit http://gerrit.cloudera.org:8080/9024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ica2a74a2e252d140cf7704ff68037a64db19cd80 Gerrit-Change-Number: 9024 Gerrit-PatchSet: 2 Gerrit-Owner: Mike Percy Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] tools: Add debug mode to pb dump tool
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9024 ) Change subject: tools: Add debug mode to pb dump tool .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/9024/1//COMMIT_MSG Commit Message: PS1: > Could you add a short test to kudu-tool-test? A run with --debug and a chec Done http://gerrit.cloudera.org:8080/#/c/9024/1//COMMIT_MSG@13 PS1, Line 13: TODO: Should this be called "verbose" mode instead of debug mode? > I don't think it really matters, since the additional information is just f Done http://gerrit.cloudera.org:8080/#/c/9024/1/src/kudu/tools/tool_action_pbc.cc File src/kudu/tools/tool_action_pbc.cc: http://gerrit.cloudera.org:8080/#/c/9024/1/src/kudu/tools/tool_action_pbc.cc@92 PS1, Line 92: } > Nit: could you add an else that does LOG(FATAL) or something like that? That's not necessary since the flags are all optional. http://gerrit.cloudera.org:8080/#/c/9024/1/src/kudu/tools/tool_action_pbc.cc@232 PS1, Line 232: .AddOptionalParameter("oneline") > Should add debug here. Done http://gerrit.cloudera.org:8080/#/c/9024/1/src/kudu/util/pb_util.cc File src/kudu/util/pb_util.cc: http://gerrit.cloudera.org:8080/#/c/9024/1/src/kudu/util/pb_util.cc@930 PS1, Line 930: // Fallthrough. > use FALLTHROUGH_INTENDED from macros.h which has some magic effect on clang removed the fallthrough http://gerrit.cloudera.org:8080/#/c/9024/1/src/kudu/util/pb_util.cc@937 PS1, Line 937: "---" > Nit: if this is now being used in three places, perhaps make it a constant Done -- To view, visit http://gerrit.cloudera.org:8080/9024 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ica2a74a2e252d140cf7704ff68037a64db19cd80 Gerrit-Change-Number: 9024 Gerrit-PatchSet: 1 Gerrit-Owner: Mike Percy Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 26 Jun 2018 03:10:44 + Gerrit-HasComments: Yes
[kudu-CR] consensus queue: Clarify return value of ResponseFromPeer()
Hello Alexey Serbin, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/10821 to review the following change. Change subject: consensus queue: Clarify return value of ResponseFromPeer() .. consensus queue: Clarify return value of ResponseFromPeer() This patch simply makes some minor semantic improvmenets to the ConsensusQueue::ResponseFromPeer() API by returning the value of whether to immediately send anothe request and improving the header file comment documentation about those semantics. Change-Id: Ibe73b5c69630898327ac7787e3dfcb0b761f323e --- M src/kudu/consensus/consensus_peers.cc M src/kudu/consensus/consensus_queue-test.cc M src/kudu/consensus/consensus_queue.cc M src/kudu/consensus/consensus_queue.h 4 files changed, 85 insertions(+), 79 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/21/10821/1 -- To view, visit http://gerrit.cloudera.org:8080/10821 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ibe73b5c69630898327ac7787e3dfcb0b761f323e Gerrit-Change-Number: 10821 Gerrit-PatchSet: 1 Gerrit-Owner: Mike Percy Gerrit-Reviewer: Alexey Serbin
[kudu-CR] tablet metadata: remove unused method declaration
Hello Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10820 to look at the new patch set (#2). Change subject: tablet_metadata: remove unused method declaration .. tablet_metadata: remove unused method declaration Change-Id: Ifc53af93f5fa79d39f8a13e2791f7830a998991c --- M src/kudu/tablet/tablet_metadata.h 1 file changed, 0 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/20/10820/2 -- To view, visit http://gerrit.cloudera.org:8080/10820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ifc53af93f5fa79d39f8a13e2791f7830a998991c Gerrit-Change-Number: 10820 Gerrit-PatchSet: 2 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] tablet metadata: remove unused function definition
Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10820 Change subject: tablet_metadata: remove unused function definition .. tablet_metadata: remove unused function definition Change-Id: Ifc53af93f5fa79d39f8a13e2791f7830a998991c --- M src/kudu/tablet/tablet_metadata.h 1 file changed, 0 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/20/10820/1 -- To view, visit http://gerrit.cloudera.org:8080/10820 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ifc53af93f5fa79d39f8a13e2791f7830a998991c Gerrit-Change-Number: 10820 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Wong
[kudu-CR] [tools] small fix on RebalancingAlgo::GetNextMoves()
Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10819 Change subject: [tools] small fix on RebalancingAlgo::GetNextMoves() .. [tools] small fix on RebalancingAlgo::GetNextMoves() Fix on short-circuiting in case of empty input for RebalancingAlgo::GetNextMoves(). Added consistency check for the 'cluster_info' input parameter and corresponding unit test. Change-Id: Iad7a7bc5bee939ce92d18845d6e43bcadccac5a2 --- M src/kudu/tools/rebalance_algo-test.cc M src/kudu/tools/rebalance_algo.cc 2 files changed, 35 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/19/10819/1 -- To view, visit http://gerrit.cloudera.org:8080/10819 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Iad7a7bc5bee939ce92d18845d6e43bcadccac5a2 Gerrit-Change-Number: 10819 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin
[kudu-CR] KUDU-2191: HMS Metadata Consistency Check Tool
Hao Hao has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10217 ) Change subject: KUDU-2191: HMS Metadata Consistency Check Tool .. KUDU-2191: HMS Metadata Consistency Check Tool This commit introduces a metadata consistency check CLI tool for Hive Metastore integration. It checks metadata (table ID, table name, Kudu master addresses, and column name and type) between Kudu and HMS, and reports inconsistent metadata to the user if any. It adds test case using external mini cluster to ensure the tool works as expected. Change-Id: I3495e9c755571b85beaecca849f83457b592ee90 Reviewed-on: http://gerrit.cloudera.org:8080/10217 Reviewed-by: Dan Burkert Tested-by: Kudu Jenkins --- M src/kudu/client/client.h M src/kudu/hms/hms_catalog.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_hms.cc 4 files changed, 406 insertions(+), 46 deletions(-) Approvals: Dan Burkert: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/10217 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I3495e9c755571b85beaecca849f83457b592ee90 Gerrit-Change-Number: 10217 Gerrit-PatchSet: 15 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] parse metrics log: update to the new format
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10693 ) Change subject: parse_metrics_log: update to the new format .. parse_metrics_log: update to the new format The new diagnostics logs report more than just metrics, and thus, output a bit differently than they did in simpler times. This patch updates the parsing to be compatible with the new logs. The changes this patch addresses: - each metrics log line had a different number of fields - the metrics log will only emit metrics from entities that have changed; as such, the processing of metrics has been changed; where we previously filled in NaNs for missing data, we now pull from the previous snapshot, or fill in 0 if one doesn't exist I tested this manually by running against some metrics logs I collected across a couple of workloads. The results seem to match what was expected. Change-Id: If11b7ecc93a3f64db3b7c1994f47308b3ec44029 Reviewed-on: http://gerrit.cloudera.org:8080/10693 Reviewed-by: Todd Lipcon Tested-by: Kudu Jenkins --- M src/kudu/scripts/parse_metrics_log.py 1 file changed, 124 insertions(+), 67 deletions(-) Approvals: Todd Lipcon: Looks good to me, approved Kudu Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/10693 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: If11b7ecc93a3f64db3b7c1994f47308b3ec44029 Gerrit-Change-Number: 10693 Gerrit-PatchSet: 5 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2191: HMS inconsistent metadata fix tool
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10303 to look at the new patch set (#7). Change subject: KUDU-2191: HMS inconsistent metadata fix tool .. KUDU-2191: HMS inconsistent metadata fix tool This commit introduces a CLI tool that fixes inconsistent metadata for Hive Metastore integration. It corrects the table name based on the users input. And it fixes the master addresses property, column name and type based on the values in Kudu. Test case is added using external mini cluster to ensure the tool works as expected. Change-Id: I63c694b5d9877cfbd218139f2e9ecf05fd69c997 --- M src/kudu/client/client.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_hms.cc 3 files changed, 251 insertions(+), 26 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/03/10303/7 -- To view, visit http://gerrit.cloudera.org:8080/10303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I63c694b5d9877cfbd218139f2e9ecf05fd69c997 Gerrit-Change-Number: 10303 Gerrit-PatchSet: 7 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] WIP: KUDU-2191: support table-name identifiers with upper case chars
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/10817 ) Change subject: WIP: KUDU-2191: support table-name identifiers with upper case chars .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/10817/1/src/kudu/hms/hms_catalog-test.cc File src/kudu/hms/hms_catalog-test.cc: http://gerrit.cloudera.org:8080/#/c/10817/1/src/kudu/hms/hms_catalog-test.cc@108 PS1, Line 108: UMPED jumps -- To view, visit http://gerrit.cloudera.org:8080/10817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd Gerrit-Change-Number: 10817 Gerrit-PatchSet: 1 Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 26 Jun 2018 00:22:18 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2191: HMS Metadata Consistency Check Tool
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/10217 ) Change subject: KUDU-2191: HMS Metadata Consistency Check Tool .. Patch Set 14: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10217 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3495e9c755571b85beaecca849f83457b592ee90 Gerrit-Change-Number: 10217 Gerrit-PatchSet: 14 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Tue, 26 Jun 2018 00:12:35 + Gerrit-HasComments: No
[kudu-CR] KUDU-2191: HMS Metadata Consistency Check Tool
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/10217 ) Change subject: KUDU-2191: HMS Metadata Consistency Check Tool .. Patch Set 14: (3 comments) http://gerrit.cloudera.org:8080/#/c/10217/13/src/kudu/tools/tool_action_hms.cc File src/kudu/tools/tool_action_hms.cc: http://gerrit.cloudera.org:8080/#/c/10217/13/src/kudu/tools/tool_action_hms.cc@343 PS13, Line 343: "KuduMasterAddresses"}); > These don't appear to be printed to the console. Done http://gerrit.cloudera.org:8080/#/c/10217/13/src/kudu/tools/tool_action_hms.cc@372 PS13, Line 372: shared_ptr kudu_table; > Perhaps the tool could output a table for legacy tables, and if there are a Done http://gerrit.cloudera.org:8080/#/c/10217/13/src/kudu/tools/tool_action_hms.cc@375 PS13, Line 375: unsynced_tables_map[kudu_table->id()].first = kudu_table; > Is it meant that this exact same status will get added multiple times? Done -- To view, visit http://gerrit.cloudera.org:8080/10217 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3495e9c755571b85beaecca849f83457b592ee90 Gerrit-Change-Number: 10217 Gerrit-PatchSet: 14 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Tue, 26 Jun 2018 00:04:59 + Gerrit-HasComments: Yes
[kudu-CR] WIP: KUDU-2191: support table-name identifiers with upper case chars
Hello Adar Dembo, Hao Hao, Todd Lipcon, I'd like you to do a code review. Please visit http://gerrit.cloudera.org:8080/10817 to review the following change. Change subject: WIP: KUDU-2191: support table-name identifiers with upper case chars .. WIP: KUDU-2191: support table-name identifiers with upper case chars The HMS lowercases all database (table) identifiers during database (table) creation, only storing the lowercased version. On database and table lookup the HMS automatically does a case-insensitive compare. During table creation Kudu checks that table names are valid UTF-8, and does no transformations on identiers. During table lookups Kudu requires that the table name match exactly, including case. As a result of these behavior differences and the design of the notification log listener, tables with upper-case characters previously could not be altered or deleted. This commit introduces changes to how the Catalog Manager handles identifiers when the HMS integration is enabled: * During table creation, the Catalog Manager preserves the case of table names. * On table lookup, the Catalog Manager does a case-insensitive comparison to find the table. This is implemented by storing the preserved case in the table's metadata entries in the sys-catalog, and storing a 'normalized' (down-cased) identifier in the by-name table map. The various parts of the catalog manager which deal with the by-name map are converted to use the normalized version of the name. There is one edge case that complicates turning on the HMS integration in rare circumstances: if there are existing (legacy) tables with names which map to the same normalized form (e.g. differ only in case), the catalog manager will fail to startup and instruct the operator to rename the offending tables before trying again. Additionally, this check only applies to tables that otherwise follow the Hive table naming rules (matching regex '(\w_/)+\.(\w_/)+'). WIP: I'm unsure about these details of the patch: * Database names and table names are treated the same, including case preservation. This may lead to confusion since Kudu tables 'foo.bar' and 'FOO.baz' will be in the same Hive database. * When a client opens a table the name stored in the table object is the same as the client specifies. So if table 'default.FOO' is opened with name 'Default.Foo', the table object will internally hold the name 'Default.Foo'. Both of these seem like the right thing to do to me, but may be a source of confusion. Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd --- M src/kudu/hms/hms_catalog-test.cc M src/kudu/hms/hms_catalog.cc M src/kudu/hms/hms_catalog.h M src/kudu/hms/hms_client-test.cc M src/kudu/integration-tests/master_hms-itest.cc M src/kudu/master/catalog_manager.cc M src/kudu/master/catalog_manager.h M src/kudu/mini-cluster/external_mini_cluster.cc M src/kudu/mini-cluster/external_mini_cluster.h 9 files changed, 356 insertions(+), 95 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/10817/1 -- To view, visit http://gerrit.cloudera.org:8080/10817 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I18977d6fe7b2999a36681a728ac0d1e54b7f38cd Gerrit-Change-Number: 10817 Gerrit-PatchSet: 1 Gerrit-Owner: Dan Burkert Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2191: HMS Metadata Consistency Check Tool
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10217 to look at the new patch set (#14). Change subject: KUDU-2191: HMS Metadata Consistency Check Tool .. KUDU-2191: HMS Metadata Consistency Check Tool This commit introduces a metadata consistency check CLI tool for Hive Metastore integration. It checks metadata (table ID, table name, Kudu master addresses, and column name and type) between Kudu and HMS, and reports inconsistent metadata to the user if any. It adds test case using external mini cluster to ensure the tool works as expected. Change-Id: I3495e9c755571b85beaecca849f83457b592ee90 --- M src/kudu/client/client.h M src/kudu/hms/hms_catalog.h M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_hms.cc 4 files changed, 406 insertions(+), 46 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/17/10217/14 -- To view, visit http://gerrit.cloudera.org:8080/10217 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I3495e9c755571b85beaecca849f83457b592ee90 Gerrit-Change-Number: 10217 Gerrit-PatchSet: 14 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] parse metrics log: update to the new format
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10693 ) Change subject: parse_metrics_log: update to the new format .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10693 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If11b7ecc93a3f64db3b7c1994f47308b3ec44029 Gerrit-Change-Number: 10693 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 25 Jun 2018 22:40:40 + Gerrit-HasComments: No
[kudu-CR] dmsiterator: remove unused field
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10801 ) Change subject: dmsiterator: remove unused field .. dmsiterator: remove unused field Change-Id: I877685b59a9353dcc5b9a4b385e5c1c7749154f8 Reviewed-on: http://gerrit.cloudera.org:8080/10801 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin Reviewed-by: Mike Percy Reviewed-by: Grant Henke --- M src/kudu/tablet/deltafile.cc M src/kudu/tablet/deltamemstore.h 2 files changed, 2 insertions(+), 6 deletions(-) Approvals: Kudu Jenkins: Verified Alexey Serbin: Looks good to me, approved Mike Percy: Looks good to me, approved Grant Henke: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/10801 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I877685b59a9353dcc5b9a4b385e5c1c7749154f8 Gerrit-Change-Number: 10801 Gerrit-PatchSet: 2 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] parse metrics log: update to the new format
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10693 ) Change subject: parse_metrics_log: update to the new format .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/10693/3/src/kudu/scripts/parse_metrics_log.py File src/kudu/scripts/parse_metrics_log.py: http://gerrit.cloudera.org:8080/#/c/10693/3/src/kudu/scripts/parse_metrics_log.py@123 PS3, Line 123: > you could use: Neat http://gerrit.cloudera.org:8080/#/c/10693/3/src/kudu/scripts/parse_metrics_log.py@276 PS3, Line 276: raise Exception("timestamps must be in ascending order (%f <= %f at %s:%d)" > nit: log_type != "metrics" Done -- To view, visit http://gerrit.cloudera.org:8080/10693 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If11b7ecc93a3f64db3b7c1994f47308b3ec44029 Gerrit-Change-Number: 10693 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 25 Jun 2018 22:32:37 + Gerrit-HasComments: Yes
[kudu-CR] parse metrics log: update to the new format
Hello David Ribeiro Alves, Kudu Jenkins, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10693 to look at the new patch set (#4). Change subject: parse_metrics_log: update to the new format .. parse_metrics_log: update to the new format The new diagnostics logs report more than just metrics, and thus, output a bit differently than they did in simpler times. This patch updates the parsing to be compatible with the new logs. The changes this patch addresses: - each metrics log line had a different number of fields - the metrics log will only emit metrics from entities that have changed; as such, the processing of metrics has been changed; where we previously filled in NaNs for missing data, we now pull from the previous snapshot, or fill in 0 if one doesn't exist I tested this manually by running against some metrics logs I collected across a couple of workloads. The results seem to match what was expected. Change-Id: If11b7ecc93a3f64db3b7c1994f47308b3ec44029 --- M src/kudu/scripts/parse_metrics_log.py 1 file changed, 124 insertions(+), 67 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/93/10693/4 -- To view, visit http://gerrit.cloudera.org:8080/10693 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If11b7ecc93a3f64db3b7c1994f47308b3ec44029 Gerrit-Change-Number: 10693 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon
[kudu-CR] parse metrics log: update to the new format
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10693 ) Change subject: parse_metrics_log: update to the new format .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/10693/3/src/kudu/scripts/parse_metrics_log.py File src/kudu/scripts/parse_metrics_log.py: http://gerrit.cloudera.org:8080/#/c/10693/3/src/kudu/scripts/parse_metrics_log.py@123 PS3, Line 123: # Add the metric_id to the metrics map. you could use: ret = collections.defaultdict(dict) and it'll magically create empty dictionaries upon first reference, avoiding this if statement http://gerrit.cloudera.org:8080/#/c/10693/3/src/kudu/scripts/parse_metrics_log.py@276 PS3, Line 276: if not log_type == "metrics": nit: log_type != "metrics" -- To view, visit http://gerrit.cloudera.org:8080/10693 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If11b7ecc93a3f64db3b7c1994f47308b3ec44029 Gerrit-Change-Number: 10693 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 25 Jun 2018 22:16:40 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2191: HMS Metadata Consistency Check Tool
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/10217 ) Change subject: KUDU-2191: HMS Metadata Consistency Check Tool .. Patch Set 13: (3 comments) http://gerrit.cloudera.org:8080/#/c/10217/13/src/kudu/tools/tool_action_hms.cc File src/kudu/tools/tool_action_hms.cc: http://gerrit.cloudera.org:8080/#/c/10217/13/src/kudu/tools/tool_action_hms.cc@343 PS13, Line 343: std::vector error_messages; These don't appear to be printed to the console. http://gerrit.cloudera.org:8080/#/c/10217/13/src/kudu/tools/tool_action_hms.cc@372 PS13, Line 372: LOG(WARNING) << Substitute("Found legacy table $0.$1 in the Hive Metastore, " Perhaps the tool could output a table for legacy tables, and if there are any then you could additionally print this hint to use the upgrade tool once. http://gerrit.cloudera.org:8080/#/c/10217/13/src/kudu/tools/tool_action_hms.cc@375 PS13, Line 375: error_messages.push_back(Status::IllegalState("Legacy table found")); Is it meant that this exact same status will get added multiple times? -- To view, visit http://gerrit.cloudera.org:8080/10217 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3495e9c755571b85beaecca849f83457b592ee90 Gerrit-Change-Number: 10217 Gerrit-PatchSet: 13 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Mon, 25 Jun 2018 21:43:36 + Gerrit-HasComments: Yes
[kudu-CR] [rebalance-test] fix running the test on Ubuntu 18.04
Alexey Serbin has removed Kudu Jenkins from this change. ( http://gerrit.cloudera.org:8080/10812 ) Change subject: [rebalance-test] fix running the test on Ubuntu 18.04 .. Removed reviewer Kudu Jenkins with the following votes: * Verified-1 by Kudu Jenkins (120) -- To view, visit http://gerrit.cloudera.org:8080/10812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: deleteReviewer Gerrit-Change-Id: Ie4dc5d22bab86f6ab61f18f05e45a063bbf38eba Gerrit-Change-Number: 10812 Gerrit-PatchSet: 3 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley
[kudu-CR] [rebalance-test] fix running the test on Ubuntu 18.04
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10812 ) Change subject: [rebalance-test] fix running the test on Ubuntu 18.04 .. Patch Set 3: Verified+1 Unrelated flakes. -- To view, visit http://gerrit.cloudera.org:8080/10812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie4dc5d22bab86f6ab61f18f05e45a063bbf38eba Gerrit-Change-Number: 10812 Gerrit-PatchSet: 3 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 25 Jun 2018 20:54:59 + Gerrit-HasComments: No
[kudu-CR] [rebalance-test] fix running the test on Ubuntu 18.04
Hello Will Berkeley, Tidy Bot, Kudu Jenkins, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10812 to look at the new patch set (#3). Change subject: [rebalance-test] fix running the test on Ubuntu 18.04 .. [rebalance-test] fix running the test on Ubuntu 18.04 Updated custom equality operators for TableBalanceInfo and ClusterBalanceInfo. With this update, the operator compares just the contents of corresponding sub-fields of the ServersByCountMap type (which is map), disregarding the order of elements for the same key. That's the exact behavior desired: the ordering of elements in the ServersByCountMap is not important in all scenarios of the rebalance-test. The motivation for this change is the fact that the test fails on Ubuntu 18.04 without this patch. Implicitly, the reference results relied on the order of elements in hash dictionary container used in the implementation of the Rebalancer::KsckResultsToClusterBalanceInfo() method. It seems the implementation of std::unordered_map in libstdc++ shipped with Ubuntu 18.04 has changed compared with older Linux distro versions. Change-Id: Ie4dc5d22bab86f6ab61f18f05e45a063bbf38eba --- M src/kudu/tools/rebalance-test.cc 1 file changed, 64 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/10812/3 -- To view, visit http://gerrit.cloudera.org:8080/10812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie4dc5d22bab86f6ab61f18f05e45a063bbf38eba Gerrit-Change-Number: 10812 Gerrit-PatchSet: 3 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley
[kudu-CR] [rebalance-test] fix running the test on Ubuntu 18.04
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10812 ) Change subject: [rebalance-test] fix running the test on Ubuntu 18.04 .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/10812/1/src/kudu/tools/rebalance-test.cc File src/kudu/tools/rebalance-test.cc: http://gerrit.cloudera.org:8080/#/c/10812/1/src/kudu/tools/rebalance-test.cc@22 PS1, Line 22: #include > warning: #includes are not sorted properly [llvm-include-order] Done http://gerrit.cloudera.org:8080/#/c/10812/1/src/kudu/tools/rebalance-test.cc@40 PS1, Line 40: using std::numeric_limits; > warning: using decl 'numeric_limits' is unused [misc-unused-using-decls] Done http://gerrit.cloudera.org:8080/#/c/10812/1/src/kudu/tools/rebalance-test.cc@42 PS1, Line 42: using std::set; > warning: using decl 'set' is unused [misc-unused-using-decls] Done -- To view, visit http://gerrit.cloudera.org:8080/10812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie4dc5d22bab86f6ab61f18f05e45a063bbf38eba Gerrit-Change-Number: 10812 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 25 Jun 2018 20:11:29 + Gerrit-HasComments: Yes
[kudu-CR] [rebalance-test] fix running the test on Ubuntu 18.04
Hello Will Berkeley, Tidy Bot, Kudu Jenkins, Adar Dembo, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10812 to look at the new patch set (#2). Change subject: [rebalance-test] fix running the test on Ubuntu 18.04 .. [rebalance-test] fix running the test on Ubuntu 18.04 Updated custom equality operators for TableBalanceInfo and ClusterBalanceInfo. With this update, the operator compares just the contents of corresponding sub-fields of the ServersByCountMap type (which is map), disregarding the order of elements for the same key. That's the exact behavior desired: the ordering of elements in the ServersByCountMap is not important in all scenarios of the rebalance-test. The motivation for this change is the fact that the test fails on Ubuntu 18.04 without this patch. Implicitly, the reference results relied on the order of elements in hash dictionary container used in the implementation of the Rebalancer::KsckResultsToClusterBalanceInfo() method. It seems the implementation of std::unordered_map in libstdc++ shipped with Ubuntu 18.04 has changed compared with older Linux distro versions. Change-Id: Ie4dc5d22bab86f6ab61f18f05e45a063bbf38eba --- M src/kudu/tools/rebalance-test.cc 1 file changed, 64 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/10812/2 -- To view, visit http://gerrit.cloudera.org:8080/10812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ie4dc5d22bab86f6ab61f18f05e45a063bbf38eba Gerrit-Change-Number: 10812 Gerrit-PatchSet: 2 Gerrit-Owner: Alexey Serbin Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Will Berkeley
[kudu-CR] Kudu Backup/Restore Spark Jobs
Grant Henke has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10375 ) Change subject: Kudu Backup/Restore Spark Jobs .. Kudu Backup/Restore Spark Jobs Adds a rough base implementation of Kudu backup and restore Spark jobs. There are many todos indicating gaps and more testing and details to be be finished. However, these base jobs work and are in a functional state that can be committed and iterated on as we build up and improve our backup functionality. These jobs, as annotated, should be considered private, unstable, and experimental. The backup job can output one to many tables data to any spark compatible path in any spark compatible format, the defaults being HDFS and Parquet. Each table’s data is written in a subdirectory of the provided path. The subdirectory’s name is the url encoded table name. Additionally in each table’s directory a json metadata file is output with the metadata needed to recreate the table that was exported when restoring. The restore job can read the data and metadata generated and create “restore” tables with a matching schema and reload the data. The job arguments are a work in progress and will likely be enhanced and simplified as we find what is useful and what isn’t through performance and functional testing. More documentation will be generated when the jobs are ready for general use. Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7 Reviewed-on: http://gerrit.cloudera.org:8080/10375 Tested-by: Kudu Jenkins Reviewed-by: Mike Percy --- M java/gradle/dependencies.gradle A java/kudu-backup/build.gradle A java/kudu-backup/pom.xml A java/kudu-backup/src/main/protobuf/backup.proto A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala A java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala A java/kudu-backup/src/test/resources/log4j.properties A java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala M java/kudu-client/src/main/java/org/apache/kudu/Type.java M java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestUtils.java M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala M java/pom.xml M java/settings.gradle 18 files changed, 1,594 insertions(+), 7 deletions(-) Approvals: Kudu Jenkins: Verified Mike Percy: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/10375 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7 Gerrit-Change-Number: 10375 Gerrit-PatchSet: 21 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] Kudu Backup/Restore Spark Jobs
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/10375 ) Change subject: Kudu Backup/Restore Spark Jobs .. Patch Set 20: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10375 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7 Gerrit-Change-Number: 10375 Gerrit-PatchSet: 20 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 25 Jun 2018 19:31:48 + Gerrit-HasComments: No
[kudu-CR] [rebalance-test] fix running the test on Ubuntu 18.04
Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10812 Change subject: [rebalance-test] fix running the test on Ubuntu 18.04 .. [rebalance-test] fix running the test on Ubuntu 18.04 Updated custom equality operators for TableBalanceInfo and ClusterBalanceInfo. With this update, the operator compares just the contents of corresponding sub-fields of the ServersByCountMap type (which is map), disregarding the order of elements for the same key. That's the exact behavior desired: the ordering of elements in the ServersByCountMap is not important in all scenarios of the rebalance-test. The motivation for this change is the fact that the test fails on Ubuntu 18.04 without this patch. Implicitly, the reference results relied on the order of elements in hash dictionary container used in the implementation of the Rebalancer::KsckResultsToClusterBalanceInfo() method. It seems the implementation of std::unordered_map in libstdc++ shipped with Ubuntu 18.04 has changed compared with older Linux distro versions. Change-Id: Ie4dc5d22bab86f6ab61f18f05e45a063bbf38eba --- M src/kudu/tools/rebalance-test.cc 1 file changed, 64 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/12/10812/1 -- To view, visit http://gerrit.cloudera.org:8080/10812 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ie4dc5d22bab86f6ab61f18f05e45a063bbf38eba Gerrit-Change-Number: 10812 Gerrit-PatchSet: 1 Gerrit-Owner: Alexey Serbin
[kudu-CR] KUDU-2191: HMS inconsistent metadata fix tool
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/10303 ) Change subject: KUDU-2191: HMS inconsistent metadata fix tool .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/10303/4/src/kudu/tools/tool_action_hms.cc File src/kudu/tools/tool_action_hms.cc: http://gerrit.cloudera.org:8080/#/c/10303/4/src/kudu/tools/tool_action_hms.cc@502 PS4, Line 502: // Fail the fix process as encountered unfixable errors. > Hmm, I did a quick test in master-hms-itest to rename a table with the same Yep, I think it might. I tested the same thing, but without the HMS integration turned on. I'll fix it since it's almost certainly an issue with the notification log listener. -- To view, visit http://gerrit.cloudera.org:8080/10303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I63c694b5d9877cfbd218139f2e9ecf05fd69c997 Gerrit-Change-Number: 10303 Gerrit-PatchSet: 6 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Mon, 25 Jun 2018 18:53:42 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2191: HMS inconsistent metadata fix tool
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/10303 ) Change subject: KUDU-2191: HMS inconsistent metadata fix tool .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/10303/4/src/kudu/tools/tool_action_hms.cc File src/kudu/tools/tool_action_hms.cc: http://gerrit.cloudera.org:8080/#/c/10303/4/src/kudu/tools/tool_action_hms.cc@502 PS4, Line 502: // Fail the fix process as encountered unfixable errors. > Renaming a table to the same name returns an error, so does this indicate a Hmm, I did a quick test in master-hms-itest to rename a table with the same name. And it works fine without returning an error. You think this is unexpected? -- To view, visit http://gerrit.cloudera.org:8080/10303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I63c694b5d9877cfbd218139f2e9ecf05fd69c997 Gerrit-Change-Number: 10303 Gerrit-PatchSet: 6 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Mon, 25 Jun 2018 18:49:19 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2191: HMS inconsistent metadata fix tool
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/10303 ) Change subject: KUDU-2191: HMS inconsistent metadata fix tool .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/10303/4/src/kudu/tools/tool_action_hms.cc File src/kudu/tools/tool_action_hms.cc: http://gerrit.cloudera.org:8080/#/c/10303/4/src/kudu/tools/tool_action_hms.cc@502 PS4, Line 502: // Fail the fix process as encountered unfixable errors. > Done Renaming a table to the same name returns an error, so does this indicate a missing test? -- To view, visit http://gerrit.cloudera.org:8080/10303 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I63c694b5d9877cfbd218139f2e9ecf05fd69c997 Gerrit-Change-Number: 10303 Gerrit-PatchSet: 6 Gerrit-Owner: Hao Hao Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Mon, 25 Jun 2018 18:02:41 + Gerrit-HasComments: Yes
[kudu-CR] parse metrics log: update to the new format
David Ribeiro Alves has posted comments on this change. ( http://gerrit.cloudera.org:8080/10693 ) Change subject: parse_metrics_log: update to the new format .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10693 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If11b7ecc93a3f64db3b7c1994f47308b3ec44029 Gerrit-Change-Number: 10693 Gerrit-PatchSet: 3 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 25 Jun 2018 18:00:03 + Gerrit-HasComments: No
[kudu-CR] consensus: separate storage from ConsensusMetadata
David Ribeiro Alves has posted comments on this change. ( http://gerrit.cloudera.org:8080/10649 ) Change subject: consensus: separate storage from ConsensusMetadata .. Patch Set 4: (5 comments) I guess andrew's point is that if we want to change where/how we store the consensus meta (.e.g if we want to move it to rocksdb or something), with this change we only need to do it in one place, versus having to change consensusmetadata too. http://gerrit.cloudera.org:8080/#/c/10649/4/src/kudu/consensus/consensus_meta-test.cc File src/kudu/consensus/consensus_meta-test.cc: http://gerrit.cloudera.org:8080/#/c/10649/4/src/kudu/consensus/consensus_meta-test.cc@322 PS4, Line 322: const string& peer_uuid = fs_manager_.uuid(); same as mike's comment above. this is a test, I think copying this string (and defending against fs_manager lifecycle bugs) is probably best http://gerrit.cloudera.org:8080/#/c/10649/4/src/kudu/consensus/consensus_meta_manager.h File src/kudu/consensus/consensus_meta_manager.h: http://gerrit.cloudera.org:8080/#/c/10649/4/src/kudu/consensus/consensus_meta_manager.h@56 PS4, Line 56: Status Init() const; hum, Init() is const? that's a bit weird... http://gerrit.cloudera.org:8080/#/c/10649/4/src/kudu/consensus/consensus_meta_manager.h@112 PS4, Line 112: LoadFromDisk do we need to have the "fromdisk" suffix? I thought the point of this change was that we could store the metadata elsewhere, e.g. rocksdb http://gerrit.cloudera.org:8080/#/c/10649/4/src/kudu/consensus/consensus_meta_manager.h@117 PS4, Line 117: FlushToDisk same http://gerrit.cloudera.org:8080/#/c/10649/4/src/kudu/consensus/consensus_meta_manager.cc File src/kudu/consensus/consensus_meta_manager.cc: http://gerrit.cloudera.org:8080/#/c/10649/4/src/kudu/consensus/consensus_meta_manager.cc@153 PS4, Line 153: const string& cmeta_path = fs_manager_->GetConsensusMetadataPath(tablet_id); : RETURN_NOT_OK_PREPEND(fs_manager_->env()->DeleteFile(cmeta_path), : Substitute("Unable to delete consensus metadata for tablet $0", tablet_id)); : lock_guard l(lock_); : cmeta_cache_.erase(tablet_id); // OK to delete uncached cmeta; ignore the return value. : return Status::OK(); hum, are you sure the change of ordering here doesn't matter? -- To view, visit http://gerrit.cloudera.org:8080/10649 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2f17123e281789976df1c436fe8116a4d4b48923 Gerrit-Change-Number: 10649 Gerrit-PatchSet: 4 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Mon, 25 Jun 2018 17:59:04 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2441 - [python] Enable configuration of mutation buffer
David Ribeiro Alves has posted comments on this change. ( http://gerrit.cloudera.org:8080/10674 ) Change subject: KUDU-2441 - [python] Enable configuration of mutation buffer .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/10674/4/python/kudu/client.pyx File python/kudu/client.pyx: http://gerrit.cloudera.org:8080/#/c/10674/4/python/kudu/client.pyx@541 PS4, Line 541: Watermark level as percentage of the mutation buffer size. what does this watermark do, i.e. what happens when we go ober the waterman? http://gerrit.cloudera.org:8080/#/c/10674/4/python/kudu/client.pyx@544 PS4, Line 544: spurious tab http://gerrit.cloudera.org:8080/#/c/10674/4/python/kudu/tests/test_client.py File python/kudu/tests/test_client.py: http://gerrit.cloudera.org:8080/#/c/10674/4/python/kudu/tests/test_client.py@259 PS4, Line 259: def test_session_mutation_buffer_settings(self): : self.client.new_session(flush_mode=kudu.FLUSH_AUTO_BACKGROUND, : mutation_buffer_sz= 10*1024*1024, : mutation_buffer_watermark=0.5, : mutation_buffer_flush_interval=2000, : mutation_buffer_max_num=3) : : session = self.client.new_session(flush_mode=kudu.FLUSH_AUTO_BACKGROUND) : session.set_mutation_buffer_space(10*1024*1024) : session.set_mutation_buffer_flush_watermark(0.5) : session.set_mutation_buffer_flush_interval(2000) : session.set_mutation_buffer_max_num(3) hum thsi is only testing the happy cases maybe throw in a error case or two? -- To view, visit http://gerrit.cloudera.org:8080/10674 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I52ac48e7dddc31e666a95ace4c7672da51d80b11 Gerrit-Change-Number: 10674 Gerrit-PatchSet: 4 Gerrit-Owner: Jordan Birdsell Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 25 Jun 2018 17:46:32 + Gerrit-HasComments: Yes
[kudu-CR] [examples] KUDU-2382: Add Kudu Basic Python Example
David Ribeiro Alves has posted comments on this change. ( http://gerrit.cloudera.org:8080/10794 ) Change subject: [examples] KUDU-2382: Add Kudu Basic Python Example .. Patch Set 4: (3 comments) http://gerrit.cloudera.org:8080/#/c/10794/4/examples/python/basic-python-example/README.md File examples/python/basic-python-example/README.md: http://gerrit.cloudera.org:8080/#/c/10794/4/examples/python/basic-python-example/README.md@24 PS4, Line 24: latest maybe mention somehow that this points to the lastest build which is normally debug? people wanting to test kudu out should probably use a release build http://gerrit.cloudera.org:8080/#/c/10794/4/examples/python/basic-python-example/README.md@44 PS4, Line 44: nit: extra space http://gerrit.cloudera.org:8080/#/c/10794/4/examples/python/basic-python-example/basic_example.py File examples/python/basic-python-example/basic_example.py: http://gerrit.cloudera.org:8080/#/c/10794/4/examples/python/basic-python-example/basic_example.py@18 PS4, Line 18: import kudu : from kudu.client import Partitioning : from datetime import datetime : import argparse aside: maybe we should start enforcing some import discipline? suggestion: It not at leas order these alphabetically and separate import from froms -- To view, visit http://gerrit.cloudera.org:8080/10794 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12016a7c9b1f4a1e20556297654c7abfa7b673cb Gerrit-Change-Number: 10794 Gerrit-PatchSet: 4 Gerrit-Owner: Jordan Birdsell Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Jordan Birdsell Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 25 Jun 2018 17:41:27 + Gerrit-HasComments: Yes
[kudu-CR] [python] : Add Pandas Support to Scanner
David Ribeiro Alves has posted comments on this change. ( http://gerrit.cloudera.org:8080/10809 ) Change subject: [python] : Add Pandas Support to Scanner .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/10809/4/build-support/jenkins/build-and-test.sh File build-support/jenkins/build-and-test.sh: http://gerrit.cloudera.org:8080/#/c/10809/4/build-support/jenkins/build-and-test.sh@454 PS4, Line 454: pip install pandas should we move our dependencies to the setup.py and have pandas as a test or optional dependency? http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx File python/kudu/client.pyx: http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@2004 PS4, Line 2004: for col, dtype in types.items(): : df[col] = df[col].astype(dtype, copy=False) is this just a metadata change or does it actually change the data? -- To view, visit http://gerrit.cloudera.org:8080/10809 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I90fab5c0c42448bcc17ea22be37420c6ef2f4915 Gerrit-Change-Number: 10809 Gerrit-PatchSet: 4 Gerrit-Owner: Jordan Birdsell Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 25 Jun 2018 17:36:14 + Gerrit-HasComments: Yes
[kudu-CR] [python] : Add Pandas Support to Scanner
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/10809 ) Change subject: [python] : Add Pandas Support to Scanner .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/10809/4/build-support/jenkins/build-and-test.sh File build-support/jenkins/build-and-test.sh: http://gerrit.cloudera.org:8080/#/c/10809/4/build-support/jenkins/build-and-test.sh@454 PS4, Line 454: pip install pandas I think it would be more intuitive/maintainable to build this into the python code itself somehow. Perhaps into the setup.py like suggested here: https://stackoverflow.com/a/15422703 or perhaps a test-requirements.txt file? so we can run `pip install -r test-requirements.txt` Whatever approach is the most python standard I suppose. -- To view, visit http://gerrit.cloudera.org:8080/10809 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I90fab5c0c42448bcc17ea22be37420c6ef2f4915 Gerrit-Change-Number: 10809 Gerrit-PatchSet: 4 Gerrit-Owner: Jordan Birdsell Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 25 Jun 2018 17:33:51 + Gerrit-HasComments: Yes
[kudu-CR] dmsiterator: remove unused field
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/10801 ) Change subject: dmsiterator: remove unused field .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10801 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I877685b59a9353dcc5b9a4b385e5c1c7749154f8 Gerrit-Change-Number: 10801 Gerrit-PatchSet: 1 Gerrit-Owner: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 25 Jun 2018 17:15:22 + Gerrit-HasComments: No
[kudu-CR] KUDU-1861: add range-partitions to loadgen tables
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10633 ) Change subject: KUDU-1861: add range-partitions to loadgen tables .. KUDU-1861: add range-partitions to loadgen tables This patch adds the ability to generate a range-partitioned table with the loadgen tool. The range partitioning schema is designed such that the non-random write workload will insert sequentially on the primary key, provided the number of threads is equal to the number of tablets. This sequential workload per tablet both reduces the number of compactions and avoids bloom filter lookups. This patch also renames --table_num_buckets to --table_num_hash_partitions, which can be combined with --table_num_range_partitions if desired. I tested this out on a singler-tserver cluster and verified via the metrics logs that the number of bloom lookups for a non-random workload where the number of insert threads and the number of tablets were equal stayed at zero. When the number of threads was not a factor of the number of buckets, the number of bloom lookups was non-zero. See TestNonRandomWorkloadLoadgen in kudu-tool-test.cc for more details. Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013 Reviewed-on: http://gerrit.cloudera.org:8080/10633 Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin --- M src/kudu/tools/kudu-tool-test.cc M src/kudu/tools/tool_action_perf.cc 2 files changed, 244 insertions(+), 16 deletions(-) Approvals: Kudu Jenkins: Verified Alexey Serbin: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/10633 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013 Gerrit-Change-Number: 10633 Gerrit-PatchSet: 9 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot
[kudu-CR] java: Move static helper functions out of BaseKuduTest
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/10729 ) Change subject: java: Move static helper functions out of BaseKuduTest .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/10729/3/java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java File java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java: http://gerrit.cloudera.org:8080/#/c/10729/3/java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java@20 PS3, Line 20: import static org.apache.kudu.util.ClusterTestUtil.getSchemaWithAllTypes; Are these needed? http://gerrit.cloudera.org:8080/#/c/10729/3/java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java@125 PS3, Line 125: protected static KuduTable createFourTabletsTableWithNineRows(String tableName) throws Can this very specific method be moved too? Maybe to some util in the kudu-client-tools test directory? http://gerrit.cloudera.org:8080/#/c/10729/3/java/kudu-client/src/test/java/org/apache/kudu/util/ClusterTestUtil.java File java/kudu-client/src/test/java/org/apache/kudu/util/ClusterTestUtil.java: http://gerrit.cloudera.org:8080/#/c/10729/3/java/kudu-client/src/test/java/org/apache/kudu/util/ClusterTestUtil.java@31 PS3, Line 31: public abstract class ClusterTestUtil { Perhaps ClientTestUtil since it's not really testing a cluster but testing the client functionality (single node or not). -- To view, visit http://gerrit.cloudera.org:8080/10729 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1466dcc0fc86897b504c154053c8f7e76f33c92c Gerrit-Change-Number: 10729 Gerrit-PatchSet: 3 Gerrit-Owner: Mike Percy Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Comment-Date: Mon, 25 Jun 2018 15:36:04 + Gerrit-HasComments: Yes
[kudu-CR] [python] : Add Pandas Support to Scanner
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10809 ) Change subject: [python] : Add Pandas Support to Scanner .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx File python/kudu/client.pyx: http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@121 PS4, Line 121: are converted incorrectly. converted incorrectly by whom? http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@1976 PS4, Line 1976: to be small as Pandas will load the entire contents into memory. nit: add comma before "as" http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@1991 PS4, Line 1991: df = pd.DataFrame.from_records(self.read_all_tuples(), this will require _twice_ the total dataset size inn memory because the "all tuples" is instantiated. is it possible to read batch-by-batch and build up a pandas data frame by concatenating smaller ones, perhaps, so that you have a smaller fixed memory overhead? http://gerrit.cloudera.org:8080/#/c/10809/4/python/kudu/client.pyx@2001 PS4, Line 2001: df[column.name].isnull().any() seems strange to set the type dynamically based on the dataset contents. Is this typical for other Pandas integrations? -- To view, visit http://gerrit.cloudera.org:8080/10809 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I90fab5c0c42448bcc17ea22be37420c6ef2f4915 Gerrit-Change-Number: 10809 Gerrit-PatchSet: 4 Gerrit-Owner: Jordan Birdsell Gerrit-Reviewer: David Ribeiro Alves Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Mon, 25 Jun 2018 15:09:08 + Gerrit-HasComments: Yes
[kudu-CR] Kudu Backup/Restore Spark Jobs
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/10375 ) Change subject: Kudu Backup/Restore Spark Jobs .. Patch Set 19: (6 comments) http://gerrit.cloudera.org:8080/#/c/10375/19//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10375/19//COMMIT_MSG@22 PS19, Line 22: tables > Nit: table's Done http://gerrit.cloudera.org:8080/#/c/10375/19/java/kudu-backup/src/main/protobuf/backup.proto File java/kudu-backup/src/main/protobuf/backup.proto: http://gerrit.cloudera.org:8080/#/c/10375/19/java/kudu-backup/src/main/protobuf/backup.proto@56 PS19, Line 56: encoded human-readable > Aren't these contradictory? I'd expect an encoded value to be human unreada I mean't encoded in that it's been encoded to a human-readable string, but I can remove the word to avoid confusion. http://gerrit.cloudera.org:8080/#/c/10375/19/java/kudu-backup/src/main/protobuf/backup.proto@109 PS19, Line 109: // The number of replicas this table has. > Nit: technically, it's the number of replicas of each tablet in the table. I will update the comment. I guess num_replicas isn't really a great name for the field either then? It's taken directly from CreateTableOptions.setNumReplicas(...) and KuduTable.getNumReplicas(...). http://gerrit.cloudera.org:8080/#/c/10375/19/java/kudu-backup/src/main/protobuf/backup.proto@111 PS19, Line 111: // The metadata for the tables columns. > Nit: table's Done http://gerrit.cloudera.org:8080/#/c/10375/19/java/kudu-backup/src/main/protobuf/backup.proto@113 PS19, Line 113: tables > Nit: table's Done http://gerrit.cloudera.org:8080/#/c/10375/19/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala File java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala: http://gerrit.cloudera.org:8080/#/c/10375/19/java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala@37 PS19, Line 37: val DefaultFormat: String = "parquet" : val DefaultScanBatchSize: Int = 1024*1024*20 // 20 MiB : val DefaultScanRequestTimeout: Long = AsyncKuduClient.DEFAULT_OPERATION_TIMEOUT_MS // 30 seconds : val DefaultScanPrefetching: Boolean = false // TODO: Add a test per KUDU-1260 and enable by default? > I brought this up earlier (they were camelCase) and UpperCamelCase is corre Like mike said this matches Scala's standard. -- To view, visit http://gerrit.cloudera.org:8080/10375 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7 Gerrit-Change-Number: 10375 Gerrit-PatchSet: 19 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 25 Jun 2018 14:30:51 + Gerrit-HasComments: Yes
[kudu-CR] Kudu Backup/Restore Spark Jobs
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10375 to look at the new patch set (#20). Change subject: Kudu Backup/Restore Spark Jobs .. Kudu Backup/Restore Spark Jobs Adds a rough base implementation of Kudu backup and restore Spark jobs. There are many todos indicating gaps and more testing and details to be be finished. However, these base jobs work and are in a functional state that can be committed and iterated on as we build up and improve our backup functionality. These jobs, as annotated, should be considered private, unstable, and experimental. The backup job can output one to many tables data to any spark compatible path in any spark compatible format, the defaults being HDFS and Parquet. Each table’s data is written in a subdirectory of the provided path. The subdirectory’s name is the url encoded table name. Additionally in each table’s directory a json metadata file is output with the metadata needed to recreate the table that was exported when restoring. The restore job can read the data and metadata generated and create “restore” tables with a matching schema and reload the data. The job arguments are a work in progress and will likely be enhanced and simplified as we find what is useful and what isn’t through performance and functional testing. More documentation will be generated when the jobs are ready for general use. Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7 --- M java/gradle/dependencies.gradle A java/kudu-backup/build.gradle A java/kudu-backup/pom.xml A java/kudu-backup/src/main/protobuf/backup.proto A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackup.scala A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupOptions.scala A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduBackupRDD.scala A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestore.scala A java/kudu-backup/src/main/scala/org/apache/kudu/backup/KuduRestoreOptions.scala A java/kudu-backup/src/main/scala/org/apache/kudu/backup/TableMetadata.scala A java/kudu-backup/src/test/resources/log4j.properties A java/kudu-backup/src/test/scala/org/apache/kudu/backup/TestKuduBackup.scala M java/kudu-client/src/main/java/org/apache/kudu/Type.java M java/kudu-client/src/test/java/org/apache/kudu/client/BaseKuduTest.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestUtils.java M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala M java/pom.xml M java/settings.gradle 18 files changed, 1,594 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/10375/20 -- To view, visit http://gerrit.cloudera.org:8080/10375 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If02183a2f833ffa0225eb7b0a35fc7531109e6f7 Gerrit-Change-Number: 10375 Gerrit-PatchSet: 20 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-1861: add range-partitions to loadgen tables
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/10633 ) Change subject: KUDU-1861: add range-partitions to loadgen tables .. Patch Set 8: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10633 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If4f552a4c73dc82f3b7934082769522557ee5013 Gerrit-Change-Number: 10633 Gerrit-PatchSet: 8 Gerrit-Owner: Andrew Wong Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Comment-Date: Mon, 25 Jun 2018 13:53:04 + Gerrit-HasComments: No