[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift

2020-02-11 Thread Grant Henke (Code Review)
Grant Henke has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/15176 )

Change subject: KUDU-3049: [spark] Automatic handling of schema drift
..

KUDU-3049: [spark] Automatic handling of schema drift

This patch adds a new write option `kudu.handleSchemaDrift`.
If set to true, when new fields are encountered the Kudu table
will be altered to include new columns for those fields.

Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
Reviewed-on: http://gerrit.cloudera.org:8080/15176
Reviewed-by: Adar Dembo 
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin 
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
M 
java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
6 files changed, 204 insertions(+), 2 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified
  Alexey Serbin: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/15176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
Gerrit-Change-Number: 15176
Gerrit-PatchSet: 7
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift

2020-02-11 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15176 )

Change subject: KUDU-3049: [spark] Automatic handling of schema drift
..


Patch Set 6: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
File 
java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala:

http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala@576
PS5, Line 576: // Add a column not in the table schema by duplicating the 
val column.
> I added a step to the test to validate the `handleSchemaDrift = false` hand
That's perfect, thank you!



--
To view, visit http://gerrit.cloudera.org:8080/15176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
Gerrit-Change-Number: 15176
Gerrit-PatchSet: 6
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 11 Feb 2020 23:18:09 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift

2020-02-11 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15176 )

Change subject: KUDU-3049: [spark] Automatic handling of schema drift
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/15176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
Gerrit-Change-Number: 15176
Gerrit-PatchSet: 6
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 11 Feb 2020 22:55:02 +
Gerrit-HasComments: No


[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift

2020-02-11 Thread Grant Henke (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, Hao Hao, Bankim 
Bhavsar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15176

to look at the new patch set (#6).

Change subject: KUDU-3049: [spark] Automatic handling of schema drift
..

KUDU-3049: [spark] Automatic handling of schema drift

This patch adds a new write option `kudu.handleSchemaDrift`.
If set to true, when new fields are encountered the Kudu table
will be altered to include new columns for those fields.

Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
M 
java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
6 files changed, 204 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/15176/6
--
To view, visit http://gerrit.cloudera.org:8080/15176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
Gerrit-Change-Number: 15176
Gerrit-PatchSet: 6
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift

2020-02-11 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15176 )

Change subject: KUDU-3049: [spark] Automatic handling of schema drift
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java:

http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java@162
PS5, Line 162: NonRecoverableException thrown = 
Assert.assertThrows(NonRecoverableException.class, new ThrowingRunnable() {
> Too long here and L554.
Done


http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@357
PS5, Line 357:   log.info(s"column already exists in table 
'$tableName' while handling schema drift")
> Would be nice to add a test to cover this. Might be something we can't guar
I have a hard time invoking this within a test scenario. It's also difficult to 
validate it occurred other than parsing logs. It's especially hard because we 
call syncClient.openTable right before we alter the table.

If it helps I added the call to syncClient.alterTable twice here and validated 
the correct logging behavior.


http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
File 
java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala:

http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala@572
PS5, Line 572: assertEquals(3, afterDf.schema.fields.length)
> paranoid nit: I didn't notice it in the first review round, but there isn't
Done


http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala@576
PS5, Line 576:
> Do you think it's worth adding a couple more scenarios:
I added a step to the test to validate the `handleSchemaDrift = false` handling.

For the wrong type handling, that is a behavior/test in its own right 
regardless of drift. I added a test case for that.



--
To view, visit http://gerrit.cloudera.org:8080/15176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
Gerrit-Change-Number: 15176
Gerrit-PatchSet: 5
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 11 Feb 2020 22:45:11 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift

2020-02-11 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15176 )

Change subject: KUDU-3049: [spark] Automatic handling of schema drift
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java:

http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java@162
PS5, Line 162: NonRecoverableException thrown = 
Assert.assertThrows(NonRecoverableException.class, new ThrowingRunnable() {
Too long here and L554.


http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@357
PS5, Line 357:   log.info(s"column already exists in table 
'$tableName' while handling schema drift")
Would be nice to add a test to cover this. Might be something we can't 
guarantee, but would be flaky if we didn't have this.



--
To view, visit http://gerrit.cloudera.org:8080/15176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
Gerrit-Change-Number: 15176
Gerrit-PatchSet: 5
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 11 Feb 2020 21:37:21 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift

2020-02-11 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15176 )

Change subject: KUDU-3049: [spark] Automatic handling of schema drift
..


Patch Set 5: Code-Review+1

(2 comments)

Looks this good to me, I just noticed that a few cases are not covered by the 
tests in Scala sqlContext.  Not sure whether it's critical, but anyways wanted 
to point at that.

http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
File 
java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala:

http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala@572
PS5, Line 572: assertEquals(3, afterDf.schema.fields.length)
paranoid nit: I didn't notice it in the first review round, but there isn't a 
precondition on the number of the columns in the table before attempting to 
insert the row with the new column.  Of course, it's highly unlikely that the 
test table's structure ever changes to add 'val2' column, but to make things 
consistent in such unlikely case, maybe consider adding
assertEquals(2, df.schema.fields.length) before line 569?


http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala@576
PS5, Line 576:
Do you think it's worth adding a couple more scenarios:

  * try to have the scenario similar to insert a row with not-yet-existing 
column where handleSchemaDrift is not set (i.e. testing the backward 
compatibility of the schema drift change)?

  * a scenario when one actor adds a new column with one type, while another 
actor (i.e. different sqlContext) tries to insert a row with new column of the 
same name, but of different type right after that?



--
To view, visit http://gerrit.cloudera.org:8080/15176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
Gerrit-Change-Number: 15176
Gerrit-PatchSet: 5
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 11 Feb 2020 21:04:30 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift

2020-02-11 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15176 )

Change subject: KUDU-3049: [spark] Automatic handling of schema drift
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15176/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/15176/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@357
PS3, Line 357: e.getMessage.contains("column already exists")
> Relying on the error message making this brittle.  Is it possible to use st
Done



--
To view, visit http://gerrit.cloudera.org:8080/15176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
Gerrit-Change-Number: 15176
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 11 Feb 2020 20:21:12 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift

2020-02-11 Thread Grant Henke (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, Hao Hao, Bankim 
Bhavsar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15176

to look at the new patch set (#5).

Change subject: KUDU-3049: [spark] Automatic handling of schema drift
..

KUDU-3049: [spark] Automatic handling of schema drift

This patch adds a new write option `kudu.handleSchemaDrift`.
If set to true, when new fields are encountered the Kudu table
will be altered to include new columns for those fields.

Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
M 
java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
6 files changed, 160 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/15176/5
--
To view, visit http://gerrit.cloudera.org:8080/15176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
Gerrit-Change-Number: 15176
Gerrit-PatchSet: 5
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift

2020-02-11 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15176 )

Change subject: KUDU-3049: [spark] Automatic handling of schema drift
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15176/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/15176/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@357
PS3, Line 357: e.getMessage.contains("column already exists")
Relying on the error message making this brittle.  Is it possible to use status 
instead, something like:

  Status exStatus = e.getStatus();
  if (exStatus.isAlreadyPresent()) {
...
  }



--
To view, visit http://gerrit.cloudera.org:8080/15176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
Gerrit-Change-Number: 15176
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 11 Feb 2020 19:53:31 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift

2020-02-11 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15176 )

Change subject: KUDU-3049: [spark] Automatic handling of schema drift
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15176/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/15176/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@350
PS3, Line 350: syncClient.alterTable(tableName, alter)
> Is this a duplicate of the statement at line 352?
Done



--
To view, visit http://gerrit.cloudera.org:8080/15176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
Gerrit-Change-Number: 15176
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 11 Feb 2020 19:53:31 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift

2020-02-11 Thread Grant Henke (Code Review)
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, Hao Hao, Bankim 
Bhavsar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15176

to look at the new patch set (#4).

Change subject: KUDU-3049: [spark] Automatic handling of schema drift
..

KUDU-3049: [spark] Automatic handling of schema drift

This patch adds a new write option `kudu.handleSchemaDrift`.
If set to true, when new fields are encountered the Kudu table
will be altered to include new columns for those fields.

Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
M 
java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
6 files changed, 158 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/15176/4
--
To view, visit http://gerrit.cloudera.org:8080/15176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
Gerrit-Change-Number: 15176
Gerrit-PatchSet: 4
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift

2020-02-11 Thread Alexey Serbin (Code Review)
Alexey Serbin has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15176 )

Change subject: KUDU-3049: [spark] Automatic handling of schema drift
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15176/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/15176/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@350
PS3, Line 350: syncClient.alterTable(tableName, alter)
Is this a duplicate of the statement at line 352?



--
To view, visit http://gerrit.cloudera.org:8080/15176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
Gerrit-Change-Number: 15176
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 11 Feb 2020 19:48:32 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift

2020-02-11 Thread Grant Henke (Code Review)
Grant Henke has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15176 )

Change subject: KUDU-3049: [spark] Automatic handling of schema drift
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java:

http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java@540
PS2, Line 540:   @KuduTestHarness.MasterServerConfig(flags = { 
"--max_num_columns=10" })
 :   public void testAlterExceedsColumnLimit() throws Exception {
 : ArrayList columns = new ArrayList<>();
 : for (int i = 0; i < 10; i++) {
 :   columns.add(new 
ColumnSchema.ColumnSchemaBuilder(Integer.toString(i), Type.INT32)
 :   .key(i == 0)
 :   .build());
 : }
> Do the fancy JUnit exception test?
Done


http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java:

http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java@1320
PS2, Line 1320: blic void t
> while (true)
Done


http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@349
PS2, Line 349: }
> Should we take any care to avoid failing if there are multiple concurrent S
Done


http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala:

http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala@35
PS2, Line 35: when fields with names that are not in
:  *  the target Kudu table are 
encountered, the Kudu table
:  *  will be alte
> May want to clarify that our test for "new" is done by column/field name.
Done



--
To view, visit http://gerrit.cloudera.org:8080/15176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
Gerrit-Change-Number: 15176
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Grant Henke 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 11 Feb 2020 18:33:43 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift

2020-02-11 Thread Grant Henke (Code Review)
Hello Kudu Jenkins, Andrew Wong, Adar Dembo, Hao Hao, Bankim Bhavsar,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15176

to look at the new patch set (#3).

Change subject: KUDU-3049: [spark] Automatic handling of schema drift
..

KUDU-3049: [spark] Automatic handling of schema drift

This patch adds a new write option `kudu.handleSchemaDrift`.
If set to true, when new fields are encountered the Kudu table
will be altered to include new columns for those fields.

Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
M 
java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
6 files changed, 159 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/15176/3
--
To view, visit http://gerrit.cloudera.org:8080/15176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
Gerrit-Change-Number: 15176
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift

2020-02-10 Thread Adar Dembo (Code Review)
Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15176 )

Change subject: KUDU-3049: [spark] Automatic handling of schema drift
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java:

http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java@540
PS2, Line 540: try {
 :   client.alterTable(tableName,
 :   new AlterTableOptions().addNullableColumn("11", 
Type.INT32));
 :   fail();
 : } catch (NonRecoverableException e) {
 :   assertEquals("number of columns 11 is greater than the 
permitted maximum 10",
 :   e.getMessage());
 : }
Do the fancy JUnit exception test?


http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java:

http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java@1320
PS2, Line 1320: while(true)
while (true)


http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@349
PS2, Line 349: syncClient.alterTable(tableName, alter)
Should we take any care to avoid failing if there are multiple concurrent Spark 
applications all with handleSchemaDrift=true? One will successfully add the 
columns while the others will fail in alter, right?


http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala:

http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala@35
PS2, Line 35: when new fields are encountered the
:  *  Kudu table will be altered to include 
new columns for
:  *  those fields
May want to clarify that our test for "new" is done by column/field name.



--
To view, visit http://gerrit.cloudera.org:8080/15176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
Gerrit-Change-Number: 15176
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Adar Dembo 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Bankim Bhavsar 
Gerrit-Reviewer: Hao Hao 
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 11 Feb 2020 00:23:31 +
Gerrit-HasComments: Yes


[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift

2020-02-06 Thread Grant Henke (Code Review)
Hello Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/15176

to look at the new patch set (#2).

Change subject: KUDU-3049: [spark] Automatic handling of schema drift
..

KUDU-3049: [spark] Automatic handling of schema drift

This patch adds a new write option `kudu.handleSchemaDrift`.
If set to true, when new fields are encountered the Kudu table
will be altered to include new columns for those fields.

Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
M 
java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
6 files changed, 130 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/15176/2
--
To view, visit http://gerrit.cloudera.org:8080/15176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
Gerrit-Change-Number: 15176
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke 
Gerrit-Reviewer: Andrew Wong 
Gerrit-Reviewer: Kudu Jenkins (120)


[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift

2020-02-06 Thread Grant Henke (Code Review)
Grant Henke has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/15176


Change subject: KUDU-3049: [spark] Automatic handling of schema drift
..

KUDU-3049: [spark] Automatic handling of schema drift

This patch adds a new write option `kudu.handleSchemaDrift`.
If set to true, when new fields are encountered the Kudu table
will be altered to include new columns for those fields.

Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
---
M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java
M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
M 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
M 
java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
6 files changed, 136 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/15176/1
--
To view, visit http://gerrit.cloudera.org:8080/15176
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4
Gerrit-Change-Number: 15176
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke