[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift
Grant Henke has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15176 ) Change subject: KUDU-3049: [spark] Automatic handling of schema drift .. KUDU-3049: [spark] Automatic handling of schema drift This patch adds a new write option `kudu.handleSchemaDrift`. If set to true, when new fields are encountered the Kudu table will be altered to include new columns for those fields. Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 Reviewed-on: http://gerrit.cloudera.org:8080/15176 Reviewed-by: Adar Dembo Tested-by: Kudu Jenkins Reviewed-by: Alexey Serbin --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala 6 files changed, 204 insertions(+), 2 deletions(-) Approvals: Adar Dembo: Looks good to me, approved Kudu Jenkins: Verified Alexey Serbin: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/15176 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 Gerrit-Change-Number: 15176 Gerrit-PatchSet: 7 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/15176 ) Change subject: KUDU-3049: [spark] Automatic handling of schema drift .. Patch Set 6: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala File java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala: http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala@576 PS5, Line 576: // Add a column not in the table schema by duplicating the val column. > I added a step to the test to validate the `handleSchemaDrift = false` hand That's perfect, thank you! -- To view, visit http://gerrit.cloudera.org:8080/15176 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 Gerrit-Change-Number: 15176 Gerrit-PatchSet: 6 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 11 Feb 2020 23:18:09 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/15176 ) Change subject: KUDU-3049: [spark] Automatic handling of schema drift .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/15176 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 Gerrit-Change-Number: 15176 Gerrit-PatchSet: 6 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 11 Feb 2020 22:55:02 + Gerrit-HasComments: No
[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, Hao Hao, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15176 to look at the new patch set (#6). Change subject: KUDU-3049: [spark] Automatic handling of schema drift .. KUDU-3049: [spark] Automatic handling of schema drift This patch adds a new write option `kudu.handleSchemaDrift`. If set to true, when new fields are encountered the Kudu table will be altered to include new columns for those fields. Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala 6 files changed, 204 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/15176/6 -- To view, visit http://gerrit.cloudera.org:8080/15176 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 Gerrit-Change-Number: 15176 Gerrit-PatchSet: 6 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/15176 ) Change subject: KUDU-3049: [spark] Automatic handling of schema drift .. Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java: http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java@162 PS5, Line 162: NonRecoverableException thrown = Assert.assertThrows(NonRecoverableException.class, new ThrowingRunnable() { > Too long here and L554. Done http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala: http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@357 PS5, Line 357: log.info(s"column already exists in table '$tableName' while handling schema drift") > Would be nice to add a test to cover this. Might be something we can't guar I have a hard time invoking this within a test scenario. It's also difficult to validate it occurred other than parsing logs. It's especially hard because we call syncClient.openTable right before we alter the table. If it helps I added the call to syncClient.alterTable twice here and validated the correct logging behavior. http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala File java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala: http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala@572 PS5, Line 572: assertEquals(3, afterDf.schema.fields.length) > paranoid nit: I didn't notice it in the first review round, but there isn't Done http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala@576 PS5, Line 576: > Do you think it's worth adding a couple more scenarios: I added a step to the test to validate the `handleSchemaDrift = false` handling. For the wrong type handling, that is a behavior/test in its own right regardless of drift. I added a test case for that. -- To view, visit http://gerrit.cloudera.org:8080/15176 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 Gerrit-Change-Number: 15176 Gerrit-PatchSet: 5 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 11 Feb 2020 22:45:11 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/15176 ) Change subject: KUDU-3049: [spark] Automatic handling of schema drift .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java: http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java@162 PS5, Line 162: NonRecoverableException thrown = Assert.assertThrows(NonRecoverableException.class, new ThrowingRunnable() { Too long here and L554. http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala: http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@357 PS5, Line 357: log.info(s"column already exists in table '$tableName' while handling schema drift") Would be nice to add a test to cover this. Might be something we can't guarantee, but would be flaky if we didn't have this. -- To view, visit http://gerrit.cloudera.org:8080/15176 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 Gerrit-Change-Number: 15176 Gerrit-PatchSet: 5 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 11 Feb 2020 21:37:21 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/15176 ) Change subject: KUDU-3049: [spark] Automatic handling of schema drift .. Patch Set 5: Code-Review+1 (2 comments) Looks this good to me, I just noticed that a few cases are not covered by the tests in Scala sqlContext. Not sure whether it's critical, but anyways wanted to point at that. http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala File java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala: http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala@572 PS5, Line 572: assertEquals(3, afterDf.schema.fields.length) paranoid nit: I didn't notice it in the first review round, but there isn't a precondition on the number of the columns in the table before attempting to insert the row with the new column. Of course, it's highly unlikely that the test table's structure ever changes to add 'val2' column, but to make things consistent in such unlikely case, maybe consider adding assertEquals(2, df.schema.fields.length) before line 569? http://gerrit.cloudera.org:8080/#/c/15176/5/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala@576 PS5, Line 576: Do you think it's worth adding a couple more scenarios: * try to have the scenario similar to insert a row with not-yet-existing column where handleSchemaDrift is not set (i.e. testing the backward compatibility of the schema drift change)? * a scenario when one actor adds a new column with one type, while another actor (i.e. different sqlContext) tries to insert a row with new column of the same name, but of different type right after that? -- To view, visit http://gerrit.cloudera.org:8080/15176 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 Gerrit-Change-Number: 15176 Gerrit-PatchSet: 5 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 11 Feb 2020 21:04:30 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/15176 ) Change subject: KUDU-3049: [spark] Automatic handling of schema drift .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/15176/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala: http://gerrit.cloudera.org:8080/#/c/15176/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@357 PS3, Line 357: e.getMessage.contains("column already exists") > Relying on the error message making this brittle. Is it possible to use st Done -- To view, visit http://gerrit.cloudera.org:8080/15176 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 Gerrit-Change-Number: 15176 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 11 Feb 2020 20:21:12 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, Hao Hao, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15176 to look at the new patch set (#5). Change subject: KUDU-3049: [spark] Automatic handling of schema drift .. KUDU-3049: [spark] Automatic handling of schema drift This patch adds a new write option `kudu.handleSchemaDrift`. If set to true, when new fields are encountered the Kudu table will be altered to include new columns for those fields. Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala 6 files changed, 160 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/15176/5 -- To view, visit http://gerrit.cloudera.org:8080/15176 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 Gerrit-Change-Number: 15176 Gerrit-PatchSet: 5 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/15176 ) Change subject: KUDU-3049: [spark] Automatic handling of schema drift .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/15176/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala: http://gerrit.cloudera.org:8080/#/c/15176/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@357 PS3, Line 357: e.getMessage.contains("column already exists") Relying on the error message making this brittle. Is it possible to use status instead, something like: Status exStatus = e.getStatus(); if (exStatus.isAlreadyPresent()) { ... } -- To view, visit http://gerrit.cloudera.org:8080/15176 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 Gerrit-Change-Number: 15176 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 11 Feb 2020 19:53:31 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/15176 ) Change subject: KUDU-3049: [spark] Automatic handling of schema drift .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/15176/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala: http://gerrit.cloudera.org:8080/#/c/15176/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@350 PS3, Line 350: syncClient.alterTable(tableName, alter) > Is this a duplicate of the statement at line 352? Done -- To view, visit http://gerrit.cloudera.org:8080/15176 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 Gerrit-Change-Number: 15176 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 11 Feb 2020 19:53:31 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift
Hello Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, Hao Hao, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15176 to look at the new patch set (#4). Change subject: KUDU-3049: [spark] Automatic handling of schema drift .. KUDU-3049: [spark] Automatic handling of schema drift This patch adds a new write option `kudu.handleSchemaDrift`. If set to true, when new fields are encountered the Kudu table will be altered to include new columns for those fields. Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala 6 files changed, 158 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/15176/4 -- To view, visit http://gerrit.cloudera.org:8080/15176 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 Gerrit-Change-Number: 15176 Gerrit-PatchSet: 4 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/15176 ) Change subject: KUDU-3049: [spark] Automatic handling of schema drift .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/15176/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala: http://gerrit.cloudera.org:8080/#/c/15176/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@350 PS3, Line 350: syncClient.alterTable(tableName, alter) Is this a duplicate of the statement at line 352? -- To view, visit http://gerrit.cloudera.org:8080/15176 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 Gerrit-Change-Number: 15176 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 11 Feb 2020 19:48:32 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/15176 ) Change subject: KUDU-3049: [spark] Automatic handling of schema drift .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java: http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java@540 PS2, Line 540: @KuduTestHarness.MasterServerConfig(flags = { "--max_num_columns=10" }) : public void testAlterExceedsColumnLimit() throws Exception { : ArrayList columns = new ArrayList<>(); : for (int i = 0; i < 10; i++) { : columns.add(new ColumnSchema.ColumnSchemaBuilder(Integer.toString(i), Type.INT32) : .key(i == 0) : .build()); : } > Do the fancy JUnit exception test? Done http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java: http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java@1320 PS2, Line 1320: blic void t > while (true) Done http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala: http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@349 PS2, Line 349: } > Should we take any care to avoid failing if there are multiple concurrent S Done http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala: http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala@35 PS2, Line 35: when fields with names that are not in : * the target Kudu table are encountered, the Kudu table : * will be alte > May want to clarify that our test for "new" is done by column/field name. Done -- To view, visit http://gerrit.cloudera.org:8080/15176 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 Gerrit-Change-Number: 15176 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Grant Henke Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 11 Feb 2020 18:33:43 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift
Hello Kudu Jenkins, Andrew Wong, Adar Dembo, Hao Hao, Bankim Bhavsar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15176 to look at the new patch set (#3). Change subject: KUDU-3049: [spark] Automatic handling of schema drift .. KUDU-3049: [spark] Automatic handling of schema drift This patch adds a new write option `kudu.handleSchemaDrift`. If set to true, when new fields are encountered the Kudu table will be altered to include new columns for those fields. Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala 6 files changed, 159 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/15176/3 -- To view, visit http://gerrit.cloudera.org:8080/15176 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 Gerrit-Change-Number: 15176 Gerrit-PatchSet: 3 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/15176 ) Change subject: KUDU-3049: [spark] Automatic handling of schema drift .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java: http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java@540 PS2, Line 540: try { : client.alterTable(tableName, : new AlterTableOptions().addNullableColumn("11", Type.INT32)); : fail(); : } catch (NonRecoverableException e) { : assertEquals("number of columns 11 is greater than the permitted maximum 10", : e.getMessage()); : } Do the fancy JUnit exception test? http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java File java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java: http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java@1320 PS2, Line 1320: while(true) while (true) http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala: http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@349 PS2, Line 349: syncClient.alterTable(tableName, alter) Should we take any care to avoid failing if there are multiple concurrent Spark applications all with handleSchemaDrift=true? One will successfully add the columns while the others will fail in alter, right? http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala: http://gerrit.cloudera.org:8080/#/c/15176/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala@35 PS2, Line 35: when new fields are encountered the : * Kudu table will be altered to include new columns for : * those fields May want to clarify that our test for "new" is done by column/field name. -- To view, visit http://gerrit.cloudera.org:8080/15176 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 Gerrit-Change-Number: 15176 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Adar Dembo Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Bankim Bhavsar Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Comment-Date: Tue, 11 Feb 2020 00:23:31 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift
Hello Kudu Jenkins, Andrew Wong, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/15176 to look at the new patch set (#2). Change subject: KUDU-3049: [spark] Automatic handling of schema drift .. KUDU-3049: [spark] Automatic handling of schema drift This patch adds a new write option `kudu.handleSchemaDrift`. If set to true, when new fields are encountered the Kudu table will be altered to include new columns for those fields. Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala 6 files changed, 130 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/15176/2 -- To view, visit http://gerrit.cloudera.org:8080/15176 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 Gerrit-Change-Number: 15176 Gerrit-PatchSet: 2 Gerrit-Owner: Grant Henke Gerrit-Reviewer: Andrew Wong Gerrit-Reviewer: Kudu Jenkins (120)
[kudu-CR] KUDU-3049: [spark] Automatic handling of schema drift
Grant Henke has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15176 Change subject: KUDU-3049: [spark] Automatic handling of schema drift .. KUDU-3049: [spark] Automatic handling of schema drift This patch adds a new write option `kudu.handleSchemaDrift`. If set to true, when new fields are encountered the Kudu table will be altered to include new columns for those fields. Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 --- M java/kudu-client/src/test/java/org/apache/kudu/client/TestAlterTable.java M java/kudu-client/src/test/java/org/apache/kudu/client/TestKuduClient.java M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala 6 files changed, 136 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/76/15176/1 -- To view, visit http://gerrit.cloudera.org:8080/15176 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ib1edebb293d6ae79c26a0ecb9ce7755308f667f4 Gerrit-Change-Number: 15176 Gerrit-PatchSet: 1 Gerrit-Owner: Grant Henke