[kudu-CR] Changes: In kudu-spark, added support to delete rows from Kudu table in DataSource API. This is done using the SaveMode.ErrorIfExists.
Ram Mettu has posted comments on this change. Change subject: Changes: In kudu-spark, added support to delete rows from Kudu table in DataSource API. This is done using the SaveMode.ErrorIfExists. .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/3552/2//COMMIT_MSG Commit Message: Line 7: Changes: In kudu-spark, added support to delete rows from Kudu table > please remove trailing whitespace here and below. Sure will fix it Line 14: .mode(SaveMode.ErrorIfExists).kudu > Why are you using 'ErrorIfExists' here? It's not what is documented for tha There is no delete mode in the data source api. So we are trying to fit it into one of the 4 defined options. Dan is suggesting a different approach will take a look Line 14: .mode(SaveMode.ErrorIfExists).kudu > Agreed, I think a better way to expose this would be an implicit method on Ok will look into it Line 21: * If there are additional columns along with primary columns, > I don't think it will cause problems to ignore them. Ran into issues having extra columns expect the primary cols in newDelete operation. I can post the error if delete is supposed to work fine with extra columns. -- To view, visit http://gerrit.cloudera.org:8080/3552 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibf7207d5ee525c07b54f7544e0ba63deb47e6acd Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Ram Mettu Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Ram Mettu Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Changes: In kudu-spark, added support to delete rows from Kudu table in DataSource API. This is done using the SaveMode.ErrorIfExists.
Dan Burkert has posted comments on this change. Change subject: Changes: In kudu-spark, added support to delete rows from Kudu table in DataSource API. This is done using the SaveMode.ErrorIfExists. .. Patch Set 2: (3 comments) http://gerrit.cloudera.org:8080/#/c/3552/2//COMMIT_MSG Commit Message: Line 7: Changes: In kudu-spark, added support to delete rows from Kudu table please remove trailing whitespace here and below. Line 14: .mode(SaveMode.ErrorIfExists).kudu > Why are you using 'ErrorIfExists' here? It's not what is documented for tha Agreed, I think a better way to expose this would be an implicit method on RDDs named something like "deleteFromKuduTable". See the Cassandra spark connector's "saveToCassandra" for something similar. Line 21: * If there are additional columns along with primary columns, > seems a little unintuitive here to ignore the extra columns, IMO. what do o I don't think it will cause problems to ignore them. -- To view, visit http://gerrit.cloudera.org:8080/3552 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibf7207d5ee525c07b54f7544e0ba63deb47e6acd Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Ram Mettu Gerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Changes: In kudu-spark, added support to delete rows from Kudu table in DataSource API. This is done using the SaveMode.ErrorIfExists.
Todd Lipcon has posted comments on this change. Change subject: Changes: In kudu-spark, added support to delete rows from Kudu table in DataSource API. This is done using the SaveMode.ErrorIfExists. .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/3552/2//COMMIT_MSG Commit Message: Line 7: Changes: In kudu-spark, added support to delete rows from Kudu table nit: please follow the standard commit message format (one-line short summary, followed by a blank line, followed by the list of changes). See http://chris.beams.io/posts/git-commit/ Line 11: Usage : this usage info is probably better to put in the doc for the data frame, or perhaos in docs/developing.adoc Line 14: .mode(SaveMode.ErrorIfExists).kudu Why are you using 'ErrorIfExists' here? It's not what is documented for that save mode ("ErrorIfExists mode means that when saving a DataFrame to a data source, if data already exists, an exception is expected to be thrown.") Line 21: * If there are additional columns along with primary columns, seems a little unintuitive here to ignore the extra columns, IMO. what do other people think? http://gerrit.cloudera.org:8080/#/c/3552/2/java/kudu-spark/src/main/scala/org/kududb/spark/kudu/KuduContext.scala File java/kudu-spark/src/main/scala/org/kududb/spark/kudu/KuduContext.scala: Line 229: for ((sparkIdx, kuduIdx) <- indices) { this stuff is all duplicated - can you refactor it into a utility method? -- To view, visit http://gerrit.cloudera.org:8080/3552 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibf7207d5ee525c07b54f7544e0ba63deb47e6acd Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Ram Mettu Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-HasComments: Yes
[kudu-CR] Changes: In kudu-spark, added support to delete rows from Kudu table in DataSource API. This is done using the SaveMode.ErrorIfExists.
Kudu Jenkins has posted comments on this change. Change subject: Changes: In kudu-spark, added support to delete rows from Kudu table in DataSource API. This is done using the SaveMode.ErrorIfExists. .. Patch Set 2: Build Started http://104.196.14.100/job/kudu-gerrit/2146/ -- To view, visit http://gerrit.cloudera.org:8080/3552 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibf7207d5ee525c07b54f7544e0ba63deb47e6acd Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Ram Mettu Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No
[kudu-CR] Changes: In kudu-spark, added support to delete rows from Kudu table in DataSource API. This is done using the SaveMode.ErrorIfExists.
Ram Mettu has uploaded a new patch set (#2). Change subject: Changes: In kudu-spark, added support to delete rows from Kudu table in DataSource API. This is done using the SaveMode.ErrorIfExists. .. Changes: In kudu-spark, added support to delete rows from Kudu table in DataSource API. This is done using the SaveMode.ErrorIfExists. Usage : tableDataFrame.write.options(kuduOptions) .mode(SaveMode.ErrorIfExists).kudu - this delete all rows in tableDataFrame - tableDataFrame MUST provide all the cols which are part of the primary key in Kudu table * If any column which is part of primary key is missing in the dataframe, it will give an error * If there are additional columns along with primary columns, it is okay, it will delete all the rows * If tableDataFrame contains any rows which are not present in the Kudu table, it will give an error for those missing rows Change-Id: Ibf7207d5ee525c07b54f7544e0ba63deb47e6acd --- M java/kudu-spark/src/main/scala/org/kududb/spark/kudu/DefaultSource.scala M java/kudu-spark/src/main/scala/org/kududb/spark/kudu/KuduContext.scala M java/kudu-spark/src/test/scala/org/kududb/spark/kudu/DefaultSourceTest.scala 3 files changed, 145 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3552/2 -- To view, visit http://gerrit.cloudera.org:8080/3552 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ibf7207d5ee525c07b54f7544e0ba63deb47e6acd Gerrit-PatchSet: 2 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Ram Mettu Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] Changes:
Ram Mettu has uploaded a new change for review. http://gerrit.cloudera.org:8080/3552 Change subject: Changes: .. Changes: In kudu-spark, added support to delete rows from Kudu table in DataSource API. This is done using the SaveMode.ErrorIfExists. Usage : tableDataFrame.write.options(kuduOptions).mode(SaveMode.ErrorIfExists).kudu - this delete all rows in tableDataFrame - tableDataFrame MUST provide all the columns which are part of the primary key in Kudu table * If any column which is part of primary key is missing in the dataframe, it will give an error * If there are additional columns along with primary columns, it is okay, it will delete all the rows * If tableDataFrame contains any rows which are not present in the Kudu table, it will give an error for those missing rows Change-Id: Ibf7207d5ee525c07b54f7544e0ba63deb47e6acd --- M java/kudu-spark/src/main/scala/org/kududb/spark/kudu/DefaultSource.scala M java/kudu-spark/src/main/scala/org/kududb/spark/kudu/KuduContext.scala M java/kudu-spark/src/test/scala/org/kududb/spark/kudu/DefaultSourceTest.scala 3 files changed, 145 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/52/3552/1 -- To view, visit http://gerrit.cloudera.org:8080/3552 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: Ibf7207d5ee525c07b54f7544e0ba63deb47e6acd Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Ram Mettu
[kudu-CR] Changes:
Kudu Jenkins has posted comments on this change. Change subject: Changes: .. Patch Set 1: Build Started http://104.196.14.100/job/kudu-gerrit/2145/ -- To view, visit http://gerrit.cloudera.org:8080/3552 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ibf7207d5ee525c07b54f7544e0ba63deb47e6acd Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Ram Mettu Gerrit-Reviewer: Kudu Jenkins Gerrit-HasComments: No