Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )
Change subject: KUDU-2303: Add ignoreNull option to upsertRows ...................................................................... Patch Set 2: (5 comments) http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@9 PS2, Line 9: upsertRows Ignoring nulls can apply equally well to any write type, not just upserts. How do you feel about making this a general option for all writes? http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@11 PS2, Line 11: This patch doesn't expose this functionality through DefaultSource.createRelation, which is another important way that Spark can do writes to Kudu. http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala: http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@259 PS2, Line 259: value s/value/values http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@326 PS2, Line 326: if (!operationType.ignoreNull) operation.getRow.setNull(kuduIdx) It's probably worth adding a check that the column isn't in the primary key. Even though it's an error to set a primary key to null (since PK columns are always non-nullable), I think the error message will be better as 'can't set non-nullable column to null' than 'primary key column not set'. http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala: http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala@43 PS2, Line 43: private[kudu] case object UpsertIgnoreNull extends OperationType { If we keep adding options to modify the behavior of these operations we're quickly going to have an explosion of operation types. Might be time to refactor along these lines: private[kudu] case class Insert(val ignoreDuplicateRowErrors: Boolean) extends OperationType { override def operation(table: KuduTable): Operation = table.newInsert() } private[kudu] case object Update extends OperationType { override def operation(table: KuduTable): Operation = table.newUpdate() } private[kudu] case class Upsert(val ignoreNull: Boolean) extends OperationType { override def operation(table: KuduTable): Operation = table.newUpsert() } private[kudu] case object Delete extends OperationType { override def operation(table: KuduTable): Operation = table.newDelete() } -- To view, visit http://gerrit.cloudera.org:8080/9834 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8 Gerrit-Change-Number: 9834 Gerrit-PatchSet: 2 Gerrit-Owner: Fengling Wang <[email protected]> Gerrit-Reviewer: Dan Burkert <[email protected]> Gerrit-Reviewer: Hao Hao <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Will Berkeley <[email protected]> Gerrit-Comment-Date: Thu, 29 Mar 2018 03:12:42 +0000 Gerrit-HasComments: Yes
