Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/12087 )
Change subject: KUDU-2640: Add Spark Structured Streaming Sink
......................................................................

Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/12087/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12087/2//COMMIT_MSG@9
PS2, Line 9: patche
> typo
Done

http://gerrit.cloudera.org:8080/#/c/12087/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala:

http://gerrit.cloudera.org:8080/#/c/12087/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@216
PS2, Line 216:   private def getOperationType(parameters: Map[String, String]): OperationType = {
             :     parameters.get(OPERATION).map(stringToOperationType).getOrElse(Upsert)
             :   }
> Hrm, I get why this is the case for KuduSink, but should it be the case for
I didn't change this behavior; I just refactored it into the method above. Upsert is the default because an upsert is needed to handle Spark retries correctly. A better choice might be insert ignore, which is tracked by KUDU-1563.

http://gerrit.cloudera.org:8080/#/c/12087/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@466
PS2, Line 466: batchId: Long
> May be obvious, but mind adding a small note on why we shouldn't use this?
Done

http://gerrit.cloudera.org:8080/#/c/12087/2/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/DefaultSource.scala@466
PS2, Line 466: batchId: Long
> Yeah, a comment would be nice. I'm assuming this is for de-duplication in t
As Mike said, the batchId is provided by Spark so that you can de-duplicate in the case of retries. Kudu doesn't currently have a way to leverage it. Today we use upsert; in the future we could use insert ignore.
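To illustrate the retry argument above, here is a minimal, self-contained sketch. It does not use the Kudu client: `OperationType`, `Insert`, and `Upsert` are simplified stand-ins for the kudu-spark types, the `"kudu.operation"` key is an assumed value for `OPERATION`, and `applyBatch` is a hypothetical toy model of a table keyed by primary key. Only `getOperationType` mirrors the shape of the method quoted in the review.

```scala
// Simplified stand-ins for the kudu-spark operation types (not the real classes).
sealed trait OperationType
case object Insert extends OperationType
case object Upsert extends OperationType

object RetrySketch {
  // Assumed option key, used here for illustration only.
  val OPERATION = "kudu.operation"

  def stringToOperationType(s: String): OperationType = s.toLowerCase match {
    case "insert" => Insert
    case "upsert" => Upsert
    case other =>
      throw new IllegalArgumentException(s"Unsupported operation type '$other'")
  }

  // Same shape as the method quoted in the review: fall back to Upsert
  // when the caller does not specify an operation.
  def getOperationType(parameters: Map[String, String]): OperationType =
    parameters.get(OPERATION).map(stringToOperationType).getOrElse(Upsert)

  // Hypothetical toy table keyed by primary key. Insert rejects a key that is
  // already present; Upsert overwrites it. The first failure stops the batch.
  def applyBatch(
      table: Map[Int, String],
      batch: Seq[(Int, String)],
      op: OperationType): Either[String, Map[Int, String]] =
    batch.foldLeft[Either[String, Map[Int, String]]](Right(table)) {
      case (Right(t), (key, value)) =>
        op match {
          case Upsert => Right(t + (key -> value))
          case Insert =>
            if (t.contains(key)) Left(s"key $key already present")
            else Right(t + (key -> value))
        }
      case (failure, _) => failure
    }
}
```

In this model, replaying a batch with `Upsert` (as a Spark retry would) leaves the table in the same state as applying it once, whereas replaying it with `Insert` fails on the first key that was already written. That is the property the default relies on while the sink cannot make use of `batchId`.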
-- 
To view, visit http://gerrit.cloudera.org:8080/12087
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731e35f82c8cca7d911e4d879aa6853112132b17
Gerrit-Change-Number: 12087
Gerrit-PatchSet: 2
Gerrit-Owner: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <granthe...@apache.org>
Gerrit-Reviewer: Hao Hao <hao....@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mpe...@apache.org>
Gerrit-Comment-Date: Wed, 09 Jan 2019 20:50:35 +0000
Gerrit-HasComments: Yes