Dan Burkert has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9834 )

Change subject: KUDU-2303: Add ignoreNull option to upsertRows
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@9
PS2, Line 9: upsertRows
Ignoring nulls can apply equally well to any write type, not just upserts.  How 
do you feel about making this a general option for all writes?


http://gerrit.cloudera.org:8080/#/c/9834/2//COMMIT_MSG@11
PS2, Line 11:
This patch doesn't expose this functionality through 
DefaultSource.createRelation, which is another important way that Spark can do 
writes to Kudu.


http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@259
PS2, Line 259: value
s/value/values


http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@326
PS2, Line 326:             if (!operationType.ignoreNull) 
operation.getRow.setNull(kuduIdx)
It's probably worth adding a check that the column isn't in the primary key.  
Even though it's an error to set a primary key to null (since PK columns are 
always non-nullable), I think the error message will be better as 'can't set 
non-nullable column to null' than 'primary key column not set'.


http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala:

http://gerrit.cloudera.org:8080/#/c/9834/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala@43
PS2, Line 43: private[kudu] case object UpsertIgnoreNull extends OperationType {
If we keep adding options to modify the behavior of these operations we're 
quickly going to have an explosion of operation types.  Might be time to 
refactor along these lines:

private[kudu] case class Insert(val ignoreDuplicateRowErrors: Boolean) extends 
OperationType {
  override def operation(table: KuduTable): Operation = table.newInsert()
}
private[kudu] case object Update extends OperationType {
  override def operation(table: KuduTable): Operation = table.newUpdate()
}
private[kudu] case class Upsert(val ignoreNull: Boolean) extends OperationType {
  override def operation(table: KuduTable): Operation = table.newUpsert()
}
private[kudu] case object Delete extends OperationType {
  override def operation(table: KuduTable): Operation = table.newDelete()
}



--
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 2
Gerrit-Owner: Fengling Wang <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: Hao Hao <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Thu, 29 Mar 2018 03:12:42 +0000
Gerrit-HasComments: Yes

Reply via email to