[kudu-CR] KUDU-2303: Add ignoreNull option to upsertRows
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 ) Change subject: KUDU-2303: Add ignoreNull option to upsertRows .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@7 PS2, Line 7: ignoreNull Curious to hear others' opinions, but to me a name like 'treatNullAsUnset' or 'nullIsUnset' is more clear than 'ignoreNull'. Thoughts? -- To view, visit http://gerrit.cloudera.org:8080/9834 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8 Gerrit-Change-Number: 9834 Gerrit-PatchSet: 2 Gerrit-Owner: Fengling WangGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Thu, 29 Mar 2018 16:56:28 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2303: Add ignoreNull option to upsertRows
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 ) Change subject: KUDU-2303: Add ignoreNull option to upsertRows .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@7 PS2, Line 7: KUDU-2303 I think you mean 2250? -- To view, visit http://gerrit.cloudera.org:8080/9834 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8 Gerrit-Change-Number: 9834 Gerrit-PatchSet: 2 Gerrit-Owner: Fengling WangGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Thu, 29 Mar 2018 16:48:19 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2303: Add ignoreNull option to upsertRows
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 ) Change subject: KUDU-2303: Add ignoreNull option to upsertRows .. Patch Set 2: (1 comment) > Patch Set 2: > > (5 comments) http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala: http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala@43 PS2, Line 43: private[kudu] case object UpsertIgnoreNull extends OperationType { > If we keep adding options to modify the behavior of these operations we're On second thought it may be cleaner to just introduce a WriteOptions class and allow extensibility via adding fields. This could be passed to the insert/upsert/etc calls in KuduContext. -- To view, visit http://gerrit.cloudera.org:8080/9834 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8 Gerrit-Change-Number: 9834 Gerrit-PatchSet: 2 Gerrit-Owner: Fengling WangGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Thu, 29 Mar 2018 14:45:11 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2303: Add ignoreNull option to upsertRows
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 ) Change subject: KUDU-2303: Add ignoreNull option to upsertRows .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@9 PS2, Line 9: upsert only non-Null nit: Try to wrap commit message lines at about 80 chars. http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@10 PS2, Line 10: nit: Add "the". http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@12 PS2, Line 12: This nit: "For example, this feature is useful..." http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@13 PS2, Line 13: where nit: s/where/because -- To view, visit http://gerrit.cloudera.org:8080/9834 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8 Gerrit-Change-Number: 9834 Gerrit-PatchSet: 2 Gerrit-Owner: Fengling WangGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Thu, 29 Mar 2018 08:14:54 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2303: Add ignoreNull option to upsertRows
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 ) Change subject: KUDU-2303: Add ignoreNull option to upsertRows .. Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@9 PS2, Line 9: upsertRows Ignoring nulls can apply equally well to any write type, not just upserts. How do you feel about making this a general option for all writes? http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@11 PS2, Line 11: This patch doesn't expose this functionality through DefaultSource.createRelation, which is another important way that Spark can do writes to Kudu. http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala: http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@259 PS2, Line 259: value s/value/values http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@326 PS2, Line 326: if (!operationType.ignoreNull) operation.getRow.setNull(kuduIdx) It's probably worth adding a check that the column isn't in the primary key. Even though it's an error to set a primary key to null (since PK columns are always non-nullable), I think the error message will be better as 'can't set non-nullable column to null' than 'primary key column not set'. http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala: http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala@43 PS2, Line 43: private[kudu] case object UpsertIgnoreNull extends OperationType { If we keep adding options to modify the behavior of these operations we're quickly going to have an explosion of operation types. Might be time to refactor along these lines: private[kudu] case class Insert(val ignoreDuplicateRowErrors: Boolean) extends OperationType { override def operation(table: KuduTable): Operation = table.newInsert() } private[kudu] case object Update extends OperationType { override def operation(table: KuduTable): Operation = table.newUpdate() } private[kudu] case class Upsert(val ignoreNull: Boolean) extends OperationType { override def operation(table: KuduTable): Operation = table.newUpsert() } private[kudu] case object Delete extends OperationType { override def operation(table: KuduTable): Operation = table.newDelete() } -- To view, visit http://gerrit.cloudera.org:8080/9834 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8 Gerrit-Change-Number: 9834 Gerrit-PatchSet: 2 Gerrit-Owner: Fengling WangGerrit-Reviewer: Dan Burkert Gerrit-Reviewer: Hao Hao Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley Gerrit-Comment-Date: Thu, 29 Mar 2018 03:12:42 + Gerrit-HasComments: Yes
[kudu-CR] KUDU-2303: Add ignoreNull option to upsertRows
Fengling Wang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9834 Change subject: KUDU-2303: Add ignoreNull option to upsertRows .. KUDU-2303: Add ignoreNull option to upsertRows This patch adds the ignoreNull option to upsertRows so that users can upsert only non-Null columns and leave the rest of columns unchanged. This feature is useful when users use Spark streaming to process JSON and upsert to Kudu, where missing column values from JSON are set to NULL, resulting in some existing row values being upserted to Null, which is not desired. Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8 --- M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala M java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/KuduContextTest.scala M java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/TestContext.scala 4 files changed, 44 insertions(+), 3 deletions(-) git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/34/9834/1 -- To view, visit http://gerrit.cloudera.org:8080/9834 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8 Gerrit-Change-Number: 9834 Gerrit-PatchSet: 1 Gerrit-Owner: Fengling Wang