[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API
Grant Henke has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12275 ) Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API .. KUDU-2674: [java] Add a Java KuduPartitioner API This patch is a Java port of the c++ KuduPartitioner API introduced in KUDU-1713 (https://gerrit.cloudera.org/#/c/5775/). The API allows a client to determine which partition a row falls into without actually writing that row. This would allow Spark and other Java integrations to repartition and pre-sort the data before writing to Kudu. This patch also fixes a bug where calls to AsyncKuduClient.locateTable could take much longer than the specified timeout. The timeout was not propogated to subsequent locateTablet call and each locateTablet used the default admin operation timeout as a result. Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478 Reviewed-on: http://gerrit.cloudera.org:8080/12275 Tested-by: Kudu Jenkins Reviewed-by: Adar Dembo --- M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java A java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java M src/kudu/client/client-test.cc M src/kudu/client/client.h 6 files changed, 430 insertions(+), 8 deletions(-) Approvals: Kudu Jenkins: Verified Adar Dembo: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/12275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478 Gerrit-Change-Number: 12275 Gerrit-PatchSet: 8 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12275 ) Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478 Gerrit-Change-Number: 12275 Gerrit-PatchSet: 7 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Wed, 13 Feb 2019 16:52:58 + Gerrit-HasComments: No
[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12275 ) Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API .. Patch Set 6: (6 comments) http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java: http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@1825 PS6, Line 1825: propogte > propagate Done http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@2163 PS6, Line 2163:* @param table the table > Update this list. Done http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@2212 PS6, Line 2212: if (lookupType == LookupType.POINT || entry.getUpperBoundPartitionKey().length == 0) { > Too long; wrap? Done http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java: http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@178 PS6, Line 178: // Ensure the table information can't be found to force a timeout. : harness.killAllMasterServers(); > Hmm, surprised the partitioner use below doesn't hit the table locations ca client.createTable doesn't populate the cache with all the partition information. I will add a test that proves the cache is being used. http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@182 PS6, Line 182: long now = System.currentTimeMillis(); > Nit: probably more useful as 'start'. Done http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@192 PS6, Line 192: assertTrue(elapsed <= timeoutMs); > Why would this hold? Seems potentially flaky. How about < timeoutMs * 2 or I haven't seen flakiness. I can add a small amount of time but I don't want to double the timeout since that isn't representative of validating the timeout. -- To view, visit http://gerrit.cloudera.org:8080/12275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478 Gerrit-Change-Number: 12275 Gerrit-PatchSet: 6 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Wed, 13 Feb 2019 16:17:52 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API
Hello Will Berkeley, Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12275 to look at the new patch set (#7). Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API .. KUDU-2674: [java] Add a Java KuduPartitioner API This patch is a Java port of the c++ KuduPartitioner API introduced in KUDU-1713 (https://gerrit.cloudera.org/#/c/5775/). The API allows a client to determine which partition a row falls into without actually writing that row. This would allow Spark and other Java integrations to repartition and pre-sort the data before writing to Kudu. This patch also fixes a bug where calls to AsyncKuduClient.locateTable could take much longer than the specified timeout. The timeout was not propogated to subsequent locateTablet call and each locateTablet used the default admin operation timeout as a result. Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478 --- M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java A java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java M src/kudu/client/client-test.cc M src/kudu/client/client.h 6 files changed, 430 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12275/7 -- To view, visit http://gerrit.cloudera.org:8080/12275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478 Gerrit-Change-Number: 12275 Gerrit-PatchSet: 7 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley
[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12275 ) Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API .. Patch Set 6: (5 comments) http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java: http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@1825 PS6, Line 1825: propogte propagate http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@2163 PS6, Line 2163:* @param table the table Update this list. http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@2212 PS6, Line 2212: if (lookupType == LookupType.POINT || entry.getUpperBoundPartitionKey().length == 0) { Too long; wrap? http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java: http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@178 PS6, Line 178: // Ensure the table information can't be found to force a timeout. : harness.killAllMasterServers(); Hmm, surprised the partitioner use below doesn't hit the table locations cache. http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@182 PS6, Line 182: long now = System.currentTimeMillis(); Nit: probably more useful as 'start'. -- To view, visit http://gerrit.cloudera.org:8080/12275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478 Gerrit-Change-Number: 12275 Gerrit-PatchSet: 6 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Wed, 13 Feb 2019 00:16:24 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12275 ) Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API .. Patch Set 6: Code-Review+1 (1 comment) looks good just a minor test comment http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java: http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@192 PS6, Line 192: assertTrue(elapsed <= timeoutMs); Why would this hold? Seems potentially flaky. How about < timeoutMs * 2 or something like that instead? -- To view, visit http://gerrit.cloudera.org:8080/12275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478 Gerrit-Change-Number: 12275 Gerrit-PatchSet: 6 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 11 Feb 2019 18:59:38 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12275 ) Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API .. Patch Set 6: (4 comments) http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java File java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java: http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@133 PS4, Line 133: * Defaults to the {@link AsyncKuduClient#getDefaultAdminOperationTimeoutMs()}. > seems to be missing the equivalent of SetBuildTimeout() in the C++ impl Done http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@137 PS4, Line 137: li > NON_COVERED_RANGE_INDEX Done http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@146 PS4, Line 146: neTracker deadlineTracker = new DeadlineTrac > This times the number of partitions could take arbitrarily long so it seems Done http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@155 PS4, Line 155: > NON_COVERED_RANGE_INDEX Done -- To view, visit http://gerrit.cloudera.org:8080/12275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478 Gerrit-Change-Number: 12275 Gerrit-PatchSet: 6 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Mon, 11 Feb 2019 18:39:44 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12275 to look at the new patch set (#6). Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API .. KUDU-2674: [java] Add a Java KuduPartitioner API This patch is a Java port of the c++ KuduPartitioner API introduced in KUDU-1713 (https://gerrit.cloudera.org/#/c/5775/). The API allows a client to determine which partition a row falls into without actually writing that row. This would allow Spark and other Java integrations to repartition and pre-sort the data before writing to Kudu. This patch also fixes a bug where calls to AsyncKuduClient.locateTable could take much longer than the specified timeout. The timeout was not propogated to subsequent locateTablet call and each locateTablet used the default admin operation timeout as a result. Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478 --- M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java A java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java M src/kudu/client/client-test.cc M src/kudu/client/client.h 6 files changed, 408 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12275/6 -- To view, visit http://gerrit.cloudera.org:8080/12275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478 Gerrit-Change-Number: 12275 Gerrit-PatchSet: 6 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12275 to look at the new patch set (#5). Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API .. KUDU-2674: [java] Add a Java KuduPartitioner API This patch is a Java port of the c++ KuduPartitioner API introduced in KUDU-1713 (https://gerrit.cloudera.org/#/c/5775/). The API allows a client to determine which partition a row falls into without actually writing that row. This would allow Spark and other Java integrations to repartition and pre-sort the data before writing to Kudu. This patch also fixes a bug where calls to AsyncKuduClient.locateTable could take much longer than the specified timeout. The timeout was not propogated to subsequent locateTablet call and each locateTablet used the default admin operation timeout as a result. Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478 --- M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java A java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java M src/kudu/client/client-test.cc M src/kudu/client/client.h 6 files changed, 408 insertions(+), 8 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12275/5 -- To view, visit http://gerrit.cloudera.org:8080/12275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478 Gerrit-Change-Number: 12275 Gerrit-PatchSet: 5 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12275 ) Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java File java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java: http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@133 PS4, Line 133: public KuduPartitioner build() throws KuduException { seems to be missing the equivalent of SetBuildTimeout() in the C++ impl http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@137 PS4, Line 137: -1 NON_COVERED_RANGE_INDEX http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@146 PS4, Line 146: AsyncKuduClient.DEFAULT_OPERATION_TIMEOUT_MS This times the number of partitions could take arbitrarily long so it seems useful to be able to set a timeout on the while series of operations like I mentioned in another comment (since this is intended to be a general-purpose API) http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@155 PS4, Line 155: -1 NON_COVERED_RANGE_INDEX This is a bit tricky and I think is worth commenting that if there are non-covered ranges, the start of the non-covered range will have the value of NON_COVERED_RANGE_INDEX while if the range is covered then the start key for the next partition will overwrite the NON_COVERED_RANGE_INDEX value written here. -- To view, visit http://gerrit.cloudera.org:8080/12275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478 Gerrit-Change-Number: 12275 Gerrit-PatchSet: 4 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 07 Feb 2019 10:07:22 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12275 to look at the new patch set (#3). Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API .. KUDU-2674: [java] Add a Java KuduPartitioner API This patch is a Java port of the c++ KuduPartitioner API introduced in KUDU-1713 (https://gerrit.cloudera.org/#/c/5775/). The API allows a client to determine which partition a row falls into without actually writing that row. This would allow Spark and other Java integrations to repartition and pre-sort the data before writing to Kudu. Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478 --- M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java A java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java M src/kudu/client/client-test.cc M src/kudu/client/client.h 6 files changed, 357 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12275/3 -- To view, visit http://gerrit.cloudera.org:8080/12275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478 Gerrit-Change-Number: 12275 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12275 to look at the new patch set (#2). Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API .. KUDU-2674: [java] Add a Java KuduPartitioner API This patch is a Java port of the c++ KuduPartitioner API introduced in KUDU-1713 (https://gerrit.cloudera.org/#/c/5775/). The API allows a client to determine which partition a row falls into without actually writing that row. This would allow Spark and other Java integrations to repartition and pre-sort the data before writing to Kudu. Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478 --- M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java A java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java M src/kudu/client/client-test.cc M src/kudu/client/client.h 6 files changed, 363 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12275/2 -- To view, visit http://gerrit.cloudera.org:8080/12275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478 Gerrit-Change-Number: 12275 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon
[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12275 ) Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API .. Patch Set 1: (9 comments) http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java: http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@2167 PS1, Line 2167: long deadline) { > Would the plumbing be a little cleaner if this were a DeadLineTracker rathe I think it makes calling these methods a bit more of a pain plumbing the long deadline from the public methods as far down as possible is easier. http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java File java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java: http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@17 PS1, Line 17: it will not reflect any metadata changes to the table that have occurred > duplicated Done http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@32 PS1, Line 32: they > Nit: "the user" or "the client" Done http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@37 PS1, Line 37: while (true) { > I don't think this should be done here; we don't typically do blocking oper I can change over to a builder pattern. http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@47 PS1, Line 47: catch (KuduException ex) { : throw new IllegalStateException(ex); : } > Let's do this in a different method where we could throw the original KuduE Done http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@76 PS1, Line 76:* @return The resulting partition index, or -1 if the row falls into a :* non-covered range. The result will be less than numPartitions(). > Wouldn't it be more Java-rific to throw an exception if the row falls into Done http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@87 PS1, Line 87: if (entry == null) { : throw new IllegalArgumentException(); : } > Given the presence of the sentinel, how is this case even possible? If it's Done http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java: http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@51 PS1, Line 51: public void testPartitioner() throws Exception { > I think it'd also be good to test the partitioner in a situation where it's I added a test to check for expected NonCoveredRangeExceptions. Beyond that a partitioner that fails cannot be built. http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@95 PS1, Line 95: : // We don't expect a completely even division of rows into partitions, but : // we should be within 10% of that. > Couldn't we just calculate the exact number of rows to fall into each parti I primarily wrote the test this way to match the c++ tests and verify the Java implementation behaves the same way. We could in hindsight with knowledge of the hash function used, but I didn't want this test to do that because "knowing" where the partition lands is basically what the partitioner code does. My test would become a re-implementation of the partitioner in a way. -- To view, visit http://gerrit.cloudera.org:8080/12275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478 Gerrit-Change-Number: 12275 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 06 Feb 2019 19:40:40 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12275 ) Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API .. Patch Set 1: (10 comments) http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java: http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@2167 PS1, Line 2167: long deadline) { Would the plumbing be a little cleaner if this were a DeadLineTracker rather than a raw deadline value? http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java File java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java: PS1: License header. http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@17 PS1, Line 17: it will not reflect any metadata changes to the table that have occurred duplicated http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@32 PS1, Line 32: they Nit: "the user" or "the client" http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@37 PS1, Line 37: while (true) { I don't think this should be done here; we don't typically do blocking operations in constructors. http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@47 PS1, Line 47: catch (KuduException ex) { : throw new IllegalStateException(ex); : } Let's do this in a different method where we could throw the original KuduException. http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@76 PS1, Line 76:* @return The resulting partition index, or -1 if the row falls into a :* non-covered range. The result will be less than numPartitions(). Wouldn't it be more Java-rific to throw an exception if the row falls into a non-covered range? http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@87 PS1, Line 87: if (entry == null) { : throw new IllegalArgumentException(); : } Given the presence of the sentinel, how is this case even possible? If it's not, I'd just combine the whole thing: return partitionByStartKey.floorEntry(partitionKey).getValue(); http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java: http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@51 PS1, Line 51: public void testPartitioner() throws Exception { I think it'd also be good to test the partitioner in a situation where it's not possible to perform the partitioning (i.e. masters down or something). http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@95 PS1, Line 95: : // We don't expect a completely even division of rows into partitions, but : // we should be within 10% of that. Couldn't we just calculate the exact number of rows to fall into each partition? There's no randomness here so the assertion fuzziness seems unwarranted. -- To view, visit http://gerrit.cloudera.org:8080/12275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478 Gerrit-Change-Number: 12275 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Mike Percy Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Sun, 27 Jan 2019 19:22:30 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API
Grant Henke has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12275 Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API .. KUDU-2674: [java] Add a Java KuduPartitioner API This patch is a Java port of the c++ KuduPartitioner API introduced in KUDU-1713 (https://gerrit.cloudera.org/#/c/5775/). The API allows a client to determine which partition a row falls into without actually writing that row. This would allow Spark and other Java integrations to repartition and pre-sort the data before writing to Kudu. Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478 --- M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java A java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java M src/kudu/client/client-test.cc M src/kudu/client/client.h 6 files changed, 279 insertions(+), 7 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12275/1 -- To view, visit http://gerrit.cloudera.org:8080/12275 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478 Gerrit-Change-Number: 12275 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke