Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9834 )
Change subject: KUDU-2371: Add WriteOptions class and ignoreNull option to upsertRows ...................................................................... Patch Set 3: (9 comments) It looks pretty good on the KuduContextSide but I think we should also support the ignoreNull options on the DefaultSource side, and for that I think we will need to support new parameters. See createRelation in DefaultSource.scala, and also DefaultSourceTest, since the way DefaultSource works is kinda magic. http://gerrit.cloudera.org:8080/#/c/9834/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/9834/3//COMMIT_MSG@7 PS3, Line 7: KUDU-2371 nit: Could you also add a [spark] tag at the beginning of the subject line? http://gerrit.cloudera.org:8080/#/c/9834/3//COMMIT_MSG@7 PS3, Line 7: WriteOptions nit: KuduWriteOptions. http://gerrit.cloudera.org:8080/#/c/9834/3//COMMIT_MSG@10 PS3, Line 10: writes to the Kudu table It's unclear what this applies to. Could you mention this is referring to writing with Spark? See also my comment about tagging the subject line with [spark]. http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala: http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@310 PS3, Line 310: writeOptions: : KuduWriteOptions = new KuduWriteOptions()): We don't have a Scala style guide, perse, but I think we should keep the full parameter (name + type) on the same line if we can. Howabout an indentation strategy like private def foo(writeOptions: KuduWriteOptions = new KuduWriteOptions) : RowErrorsAndOverflowStatus = { where the : <return type> is on a separate line and indented 2 spaces from the method name. See https://github.com/databricks/scala-style-guide#spacing-and-indentation. http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@328 PS3, Line 328: Can't set non-nullable column to null You mean "Can't set primary key column to null". The message should also include the name of the primary key column, e.g. "Can't set primary key column `key` to null". Depending on the write op and options a non-nullable column might be set to null and then the null ignored, so it's specifically the key column situation that we're calling out. http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala: http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala@20 PS3, Line 20: case This isn't a case class. A case class is meant as a immutable sum type (with pattern matching); the KuduWriteOptions() class is designed to be mutated to set options. See https://docs.scala-lang.org/tour/case-classes.html if you are curious about case classes. http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduWriteOptions.scala@27 PS3, Line 27: def ignoreDuplicateRowErrors_ Looks like you may have been inspired by https://www.dustinmartin.net/getters-and-setters-in-scala/? So that means the name of the setter method is actually ignoreDuplicateRowErrors_=, and the "assignment" wo.ignoreDuplicateRowErrors = true is actually a method call wo.ignoreDuplicateRowErrors_=(true) Looking around more, it appears this is a standard recommendation of the Scala people? It seems too clever by half for me and I'd prefer w.setIgnoreDuplicateRows(true) but if others are ok with it then I'd be ok with it. http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala: http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/OperationType.scala@a32 PS3, Line 32: : : : I think we need to leave InsertIgnore in for API compatibility. http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala File java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala: http://gerrit.cloudera.org:8080/#/c/9834/3/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala@174 PS3, Line 174: kuduWriteOptions.ignoreDuplicateRowErrors = true : kuduContext.insertRows(updateDF, tableName, kuduWriteOptions) We need to leave InsertIgnore in, but it would be good to do this test both with InsertIgnore and with KuduWriteOptions. We can mark InsertIgnore as deprecated in favor of KuduWriteOptions.ignoreDuplicateRowErrors. -- To view, visit http://gerrit.cloudera.org:8080/9834 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ide908ea29f572849eca0ba850ee197c1b22a07c8 Gerrit-Change-Number: 9834 Gerrit-PatchSet: 3 Gerrit-Owner: Fengling Wang <fw...@cloudera.com> Gerrit-Reviewer: Dan Burkert <danburk...@apache.org> Gerrit-Reviewer: Fengling Wang <fw...@cloudera.com> Gerrit-Reviewer: Hao Hao <hao....@cloudera.com> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Todd Lipcon <t...@apache.org> Gerrit-Reviewer: Will Berkeley <wdberke...@gmail.com> Gerrit-Comment-Date: Mon, 02 Apr 2018 22:36:02 +0000 Gerrit-HasComments: Yes