[kudu-CR] KUDU-2371: Add WriteOptions class and ignoreNull option to upsertRows

Will Berkeley (Code Review) Mon, 02 Apr 2018 15:36:57 -0700

Will Berkeley has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9834 )


Change subject: KUDU-2371: Add WriteOptions class and ignoreNull option to 
upsertRows
......................................................................


Patch Set 3:

(9 comments)

It looks pretty good on the KuduContextSide but I think we should also support 
the ignoreNull options on the DefaultSource side, and for that I think we will 
need to support new parameters. See createRelation in DefaultSource.scala, and 
also DefaultSourceTest, since the way DefaultSource works is kinda magic.

http://gerrit.cloudera.org:8080/#/c/9834/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9834/3//COMMIT_MSG@7
PS3, Line 7: KUDU-2371
nit: Could you also add a [spark] tag at the beginning of the subject line?


http://gerrit.cloudera.org:8080/#/c/9834/3//COMMIT_MSG@7
PS3, Line 7: WriteOptions
nit: KuduWriteOptions.


http://gerrit.cloudera.org:8080/#/c/9834/3//COMMIT_MSG@10
PS3, Line 10: writes to the Kudu table
It's unclear what this applies to. Could you mention this is referring to 
writing with Spark? See also my comment about tagging the subject line with 
[spark].


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@310
PS3, Line 310: writeOptions:
             :                                  KuduWriteOptions = new 
KuduWriteOptions()):
We don't have a Scala style guide, perse, but I think we should keep the full 
parameter (name + type) on the same line if we can. Howabout an indentation 
strategy like

  private def foo(writeOptions: KuduWriteOptions = new KuduWriteOptions)
    : RowErrorsAndOverflowStatus = {

where the : <return type> is on a separate line and indented 2 spaces from the 
method name. See 
https://github.com/databricks/scala-style-guide#spacing-and-indentation.


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@328
PS3, Line 328: Can't set non-nullable column to null
You mean "Can't set primary key column to null". The message should also 
include the name of the primary key column, e.g. "Can't set primary key column 
`key` to null".

Depending on the write op and options a non-nullable column might be set to 
null and then the null ignored, so it's specifically the key column situation 
that we're calling out.


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala:

http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala@20
PS3, Line 20: case
This isn't a case class. A case class is meant as a immutable sum type (with 
pattern matching); the KuduWriteOptions() class is designed to be mutated to 
set options.

See https://docs.scala-lang.org/tour/case-classes.html if you are curious about 
case classes.


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala@27
PS3, Line 27: def ignoreDuplicateRowErrors_
Looks like you may have been inspired by 
https://www.dustinmartin.net/getters-and-setters-in-scala/? So that means the 
name of the setter method is actually ignoreDuplicateRowErrors_=, and the 
"assignment"

  wo.ignoreDuplicateRowErrors = true

is actually a method call

  wo.ignoreDuplicateRowErrors_=(true)

Looking around more, it appears this is a standard recommendation of the Scala 
people? It seems too clever by half for me and I'd prefer

  w.setIgnoreDuplicateRows(true)

but if others are ok with it then I'd be ok with it.


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala:

http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala@a32
PS3, Line 32:
            :
            :
            :
I think we need to leave InsertIgnore in for API compatibility.


http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
File 
java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala:

http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala@174
PS3, Line 174: kuduWriteOptions.ignoreDuplicateRowErrors = true
             :     kuduContext.insertRows(updateDF, tableName, kuduWriteOptions)
We need to leave InsertIgnore in, but it would be good to do this test both 
with InsertIgnore and with KuduWriteOptions. We can mark InsertIgnore as 
deprecated in favor of KuduWriteOptions.ignoreDuplicateRowErrors.



--
To view, visit http://gerrit.cloudera.org:8080/9834
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8
Gerrit-Change-Number: 9834
Gerrit-PatchSet: 3
Gerrit-Owner: Fengling Wang <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: Fengling Wang <[email protected]>
Gerrit-Reviewer: Hao Hao <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Mon, 02 Apr 2018 22:36:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2371: Add WriteOptions class and ignoreNull option to upsertRows

Reply via email to