[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

2019-02-13 Thread Grant Henke (Code Review)
Grant Henke has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/12275 )

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
..

KUDU-2674: [java] Add a Java KuduPartitioner API

This patch is a Java port of  the c++ KuduPartitioner API
introduced in KUDU-1713 (https://gerrit.cloudera.org/#/c/5775/).

The API allows a client to determine which partition a
row falls into without actually writing that row. This would
allow Spark and other Java integrations to repartition and
pre-sort the data before writing to Kudu.

This patch also fixes a bug where calls to
AsyncKuduClient.locateTable could take much
longer than the specified timeout. The timeout
was not propogated to subsequent locateTablet
call and each locateTablet used the default
admin operation timeout as a result.

Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Reviewed-on: http://gerrit.cloudera.org:8080/12275
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo 
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
A java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
6 files changed, 430 insertions(+), 8 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Adar Dembo: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 8
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

2019-02-13 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12275 )

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
..


Patch Set 7: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 7
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Wed, 13 Feb 2019 16:52:58 +
Gerrit-HasComments: No


[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

2019-02-13 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12275 )

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
..


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java:

http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@1825
PS6, Line 1825: propogte
> propagate
Done


http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@2163
PS6, Line 2163:* @param table the table
> Update this list.
Done


http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@2212
PS6, Line 2212: if (lookupType == LookupType.POINT || 
entry.getUpperBoundPartitionKey().length == 0) {
> Too long; wrap?
Done


http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
File 
java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java:

http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@178
PS6, Line 178: // Ensure the table information can't be found to force a 
timeout.
 : harness.killAllMasterServers();
> Hmm, surprised the partitioner use below doesn't hit the table locations ca
client.createTable doesn't populate the cache with all the partition 
information. I will add a test that proves the cache is being used.


http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@182
PS6, Line 182: long now = System.currentTimeMillis();
> Nit: probably more useful as 'start'.
Done


http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@192
PS6, Line 192: assertTrue(elapsed <= timeoutMs);
> Why would this hold? Seems potentially flaky. How about < timeoutMs * 2 or
I haven't seen flakiness. I can add a small amount of time but I don't want to 
double the timeout since that isn't representative of validating the timeout.



--
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 6
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Wed, 13 Feb 2019 16:17:52 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

2019-02-13 Thread Grant Henke (Code Review)
Hello Will Berkeley, Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12275

to look at the new patch set (#7).

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
..

KUDU-2674: [java] Add a Java KuduPartitioner API

This patch is a Java port of  the c++ KuduPartitioner API
introduced in KUDU-1713 (https://gerrit.cloudera.org/#/c/5775/).

The API allows a client to determine which partition a
row falls into without actually writing that row. This would
allow Spark and other Java integrations to repartition and
pre-sort the data before writing to Kudu.

This patch also fixes a bug where calls to
AsyncKuduClient.locateTable could take much
longer than the specified timeout. The timeout
was not propogated to subsequent locateTablet
call and each locateTablet used the default
admin operation timeout as a result.

Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
A java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
6 files changed, 430 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12275/7
--
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 7
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 


[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

2019-02-12 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12275 )

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
..


Patch Set 6:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java:

http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@1825
PS6, Line 1825: propogte
propagate


http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@2163
PS6, Line 2163:* @param table the table
Update this list.


http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@2212
PS6, Line 2212: if (lookupType == LookupType.POINT || 
entry.getUpperBoundPartitionKey().length == 0) {
Too long; wrap?


http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
File 
java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java:

http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@178
PS6, Line 178: // Ensure the table information can't be found to force a 
timeout.
 : harness.killAllMasterServers();
Hmm, surprised the partitioner use below doesn't hit the table locations cache.


http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@182
PS6, Line 182: long now = System.currentTimeMillis();
Nit: probably more useful as 'start'.



--
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 6
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Will Berkeley 
Gerrit-Comment-Date: Wed, 13 Feb 2019 00:16:24 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

2019-02-11 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12275 )

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
..


Patch Set 6: Code-Review+1

(1 comment)

looks good just a minor test comment

http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
File 
java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java:

http://gerrit.cloudera.org:8080/#/c/12275/6/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@192
PS6, Line 192: assertTrue(elapsed <= timeoutMs);
Why would this hold? Seems potentially flaky. How about < timeoutMs * 2 or 
something like that instead?



--
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 6
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 11 Feb 2019 18:59:38 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

2019-02-11 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12275 )

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java:

http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@133
PS4, Line 133:  * Defaults to the {@link 
AsyncKuduClient#getDefaultAdminOperationTimeoutMs()}.
> seems to be missing the equivalent of SetBuildTimeout() in the C++ impl
Done


http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@137
PS4, Line 137: li
> NON_COVERED_RANGE_INDEX
Done


http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@146
PS4, Line 146: neTracker deadlineTracker = new DeadlineTrac
> This times the number of partitions could take arbitrarily long so it seems
Done


http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@155
PS4, Line 155:
> NON_COVERED_RANGE_INDEX
Done



--
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 6
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Mon, 11 Feb 2019 18:39:44 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

2019-02-07 Thread Grant Henke (Code Review)
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12275

to look at the new patch set (#6).

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
..

KUDU-2674: [java] Add a Java KuduPartitioner API

This patch is a Java port of  the c++ KuduPartitioner API
introduced in KUDU-1713 (https://gerrit.cloudera.org/#/c/5775/).

The API allows a client to determine which partition a
row falls into without actually writing that row. This would
allow Spark and other Java integrations to repartition and
pre-sort the data before writing to Kudu.

This patch also fixes a bug where calls to
AsyncKuduClient.locateTable could take much
longer than the specified timeout. The timeout
was not propogated to subsequent locateTablet
call and each locateTablet used the default
admin operation timeout as a result.

Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
A java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
6 files changed, 408 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12275/6
--
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 6
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

2019-02-07 Thread Grant Henke (Code Review)
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12275

to look at the new patch set (#5).

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
..

KUDU-2674: [java] Add a Java KuduPartitioner API

This patch is a Java port of  the c++ KuduPartitioner API
introduced in KUDU-1713 (https://gerrit.cloudera.org/#/c/5775/).

The API allows a client to determine which partition a
row falls into without actually writing that row. This would
allow Spark and other Java integrations to repartition and
pre-sort the data before writing to Kudu.

This patch also fixes a bug where calls to
AsyncKuduClient.locateTable could take much
longer than the specified timeout. The timeout
was not propogated to subsequent locateTablet
call and each locateTablet used the default
admin operation timeout as a result.

Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
A java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
6 files changed, 408 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12275/5
--
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 5
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

2019-02-07 Thread Mike Percy (Code Review)
Mike Percy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12275 )

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
..


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java:

http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@133
PS4, Line 133: public KuduPartitioner build() throws KuduException {
seems to be missing the equivalent of SetBuildTimeout() in the C++ impl


http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@137
PS4, Line 137: -1
NON_COVERED_RANGE_INDEX


http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@146
PS4, Line 146: AsyncKuduClient.DEFAULT_OPERATION_TIMEOUT_MS
This times the number of partitions could take arbitrarily long so it seems 
useful to be able to set a timeout on the while series of operations like I 
mentioned in another comment (since this is intended to be a general-purpose 
API)


http://gerrit.cloudera.org:8080/#/c/12275/4/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@155
PS4, Line 155: -1
NON_COVERED_RANGE_INDEX

This is a bit tricky and I think is worth commenting that if there are 
non-covered ranges, the start of the non-covered range will have the value of 
NON_COVERED_RANGE_INDEX while if the range is covered then the start key for 
the next partition will overwrite the NON_COVERED_RANGE_INDEX value written 
here.



--
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 07 Feb 2019 10:07:22 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

2019-02-06 Thread Grant Henke (Code Review)
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12275

to look at the new patch set (#3).

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
..

KUDU-2674: [java] Add a Java KuduPartitioner API

This patch is a Java port of  the c++ KuduPartitioner API
introduced in KUDU-1713 (https://gerrit.cloudera.org/#/c/5775/).

The API allows a client to determine which partition a
row falls into without actually writing that row. This would
allow Spark and other Java integrations to repartition and
pre-sort the data before writing to Kudu.

Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
A java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
6 files changed, 357 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12275/3
--
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

2019-02-06 Thread Grant Henke (Code Review)
Hello Mike Percy, Kudu Jenkins, Adar Dembo, Todd Lipcon,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12275

to look at the new patch set (#2).

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
..

KUDU-2674: [java] Add a Java KuduPartitioner API

This patch is a Java port of  the c++ KuduPartitioner API
introduced in KUDU-1713 (https://gerrit.cloudera.org/#/c/5775/).

The API allows a client to determine which partition a
row falls into without actually writing that row. This would
allow Spark and other Java integrations to repartition and
pre-sort the data before writing to Kudu.

Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
A java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
6 files changed, 363 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12275/2
--
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 


[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

2019-02-06 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12275 )

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
..


Patch Set 1:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java:

http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@2167
PS1, Line 2167: long deadline) {
> Would the plumbing be a little cleaner if this were a DeadLineTracker rathe
I think it makes calling these methods a bit more of a pain plumbing the long 
deadline from the public methods as far down as possible is easier.


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java:

http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@17
PS1, Line 17: it will not reflect any metadata changes to the table that have 
occurred
> duplicated
Done


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@32
PS1, Line 32: they
> Nit: "the user" or "the client"
Done


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@37
PS1, Line 37: while (true) {
> I don't think this should be done here; we don't typically do blocking oper
I can change over to a builder pattern.


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@47
PS1, Line 47: catch (KuduException ex) {
: throw new IllegalStateException(ex);
:   }
> Let's do this in a different method where we could throw the original KuduE
Done


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@76
PS1, Line 76:* @return The resulting partition index, or -1 if the row 
falls into a
:* non-covered range. The result will be less than 
numPartitions().
> Wouldn't it be more Java-rific to throw an exception if the row falls into
Done


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@87
PS1, Line 87: if (entry == null) {
:   throw new IllegalArgumentException();
: }
> Given the presence of the sentinel, how is this case even possible? If it's
Done


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
File 
java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java:

http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@51
PS1, Line 51:   public void testPartitioner() throws Exception {
> I think it'd also be good to test the partitioner in a situation where it's
I added a test to check for expected NonCoveredRangeExceptions. Beyond that a 
partitioner that fails cannot be built.


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@95
PS1, Line 95:
: // We don't expect a completely even division of rows into 
partitions, but
: // we should be within 10% of that.
> Couldn't we just calculate the exact number of rows to fall into each parti
I primarily wrote the test this way to match the c++ tests and verify the Java 
implementation behaves the same way.

We could in hindsight with knowledge of the hash function used, but I didn't 
want this test to do that because "knowing" where the partition lands is 
basically what the partitioner code does. My test would become a 
re-implementation of the partitioner in a way.



--
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 06 Feb 2019 19:40:40 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

2019-01-27 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12275 )

Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
..


Patch Set 1:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
File java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java:

http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java@2167
PS1, Line 2167: long deadline) {
Would the plumbing be a little cleaner if this were a DeadLineTracker rather 
than a raw deadline value?


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java
File java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java:

PS1:
License header.


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@17
PS1, Line 17: it will not reflect any metadata changes to the table that have 
occurred
duplicated


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@32
PS1, Line 32: they
Nit: "the user" or "the client"


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@37
PS1, Line 37: while (true) {
I don't think this should be done here; we don't typically do blocking 
operations in constructors.


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@47
PS1, Line 47: catch (KuduException ex) {
: throw new IllegalStateException(ex);
:   }
Let's do this in a different method where we could throw the original 
KuduException.


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@76
PS1, Line 76:* @return The resulting partition index, or -1 if the row 
falls into a
:* non-covered range. The result will be less than 
numPartitions().
Wouldn't it be more Java-rific to throw an exception if the row falls into a 
non-covered range?


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java@87
PS1, Line 87: if (entry == null) {
:   throw new IllegalArgumentException();
: }
Given the presence of the sentinel, how is this case even possible? If it's 
not, I'd just combine the whole thing:

  return partitionByStartKey.floorEntry(partitionKey).getValue();


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
File 
java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java:

http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@51
PS1, Line 51:   public void testPartitioner() throws Exception {
I think it'd also be good to test the partitioner in a situation where it's not 
possible to perform the partitioning (i.e. masters down or something).


http://gerrit.cloudera.org:8080/#/c/12275/1/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java@95
PS1, Line 95:
: // We don't expect a completely even division of rows into 
partitions, but
: // we should be within 10% of that.
Couldn't we just calculate the exact number of rows to fall into each 
partition? There's no randomness here so the assertion fuzziness seems 
unwarranted.



--
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Sun, 27 Jan 2019 19:22:30 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-2674: [java] Add a Java KuduPartitioner API

2019-01-25 Thread Grant Henke (Code Review)
Grant Henke has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12275


Change subject: KUDU-2674: [java] Add a Java KuduPartitioner API
..

KUDU-2674: [java] Add a Java KuduPartitioner API

This patch is a Java port of  the c++ KuduPartitioner API
introduced in KUDU-1713 (https://gerrit.cloudera.org/#/c/5775/).

The API allows a client to determine which partition a
row falls into without actually writing that row. This would
allow Spark and other Java integrations to repartition and
pre-sort the data before writing to Kudu.

Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
---
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduClient.java
M java/kudu-client/src/main/java/org/apache/kudu/client/AsyncKuduSession.java
A java/kudu-client/src/main/java/org/apache/kudu/client/KuduPartitioner.java
A java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduPartitioner.java
M src/kudu/client/client-test.cc
M src/kudu/client/client.h
6 files changed, 279 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/75/12275/1
--
To view, visit http://gerrit.cloudera.org:8080/12275
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7a2d47aab5318c0b6d29a8cb2b073c05bc1b6478
Gerrit-Change-Number: 12275
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke