Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/18211 )

Change subject: [java] KUDU-3350 add the support for deleteIgnoreRows
......................................................................


Patch Set 4:

(4 comments)

Thank you for the patch!

http://gerrit.cloudera.org:8080/#/c/18211/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18211/4//COMMIT_MSG@9
PS4, Line 9: Spark launches a speculative (duplicate) task for the long running task. If
: the task runs deleting operations on kudu, it will cause 'key not found'
: issue. This patch adds the basic functionality to support speculative
: deleting tasks by adding deleteIgnoreRows.
I'm not sure I understand why it's relevant to talk about duplicated tasks deleting the same rows. There might be many other scenarios which could send in a DELETE operation when the target row isn't present in the table.

Maybe simply state that this patch adds a new deleteIgnoreRows() wrapper for the DELETE_IGNORE operations introduced with KUDU-1563 (see https://github.com/apache/kudu/commit/7fbe341e51a9e4245d8b3017cecf11e393c3a22b)?


http://gerrit.cloudera.org:8080/#/c/18211/4/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/18211/4/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@334
PS4, Line 334: that have already been deleted.
nit: remove this -- those absent rows might never have been there in the first place, right?


http://gerrit.cloudera.org:8080/#/c/18211/4/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@347
PS4, Line 347: ${numDeletes.value.get(tableName)}
Is this going to be reported as the actual number of deleted rows or just the number of issued DELETE_IGNORE operations?

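
For context on the question above: the two candidate meanings of that counter diverge whenever some of the target rows are absent. Below is a minimal sketch of the difference, using a plain in-memory key set as a stand-in for a table. This is a toy model only, not Kudu client or KuduContext code; the class, method, and field names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DeleteIgnoreCounts {
    /** Outcome of one batch: operations issued vs. rows actually removed. */
    static final class Result {
        final int issued;
        final int removed;

        Result(int issued, int removed) {
            this.issued = issued;
            this.removed = removed;
        }
    }

    /**
     * Toy model of ignore-style deletes against an in-memory key set.
     * A key that is not present is skipped silently instead of raising an error.
     */
    static Result deleteIgnore(Set<Integer> table, List<Integer> keys) {
        int removed = 0;
        for (int key : keys) {
            // Set.remove returns true only if the key was present.
            if (table.remove(key)) {
                removed++;
            }
        }
        return new Result(keys.size(), removed);
    }

    public static void main(String[] args) {
        Set<Integer> table = new HashSet<>(List.of(1, 2, 3));
        // Key 3 is duplicated and key 42 was never in the table.
        Result r = deleteIgnore(table, List.of(2, 3, 3, 42));
        System.out.println("issued=" + r.issued + " removed=" + r.removed);
    }
}
```

With the inputs in main, four operations are issued but only two rows are removed, so a counter incremented once per operation would report 4 rather than 2.
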

http://gerrit.cloudera.org:8080/#/c/18211/4/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala
File java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala:

http://gerrit.cloudera.org:8080/#/c/18211/4/java/kudu-spark/src/test/scala/org/apache/kudu/spark/kudu/DefaultSourceTest.scala@74
PS4, Line 74: testDuplicateDelete
In addition to this test case with duplicated delete operations, maybe it's worth adding a very simple test to make sure that deleting anything from an empty table using DELETE_IGNORE always succeeds?


-- 
To view, visit http://gerrit.cloudera.org:8080/18211
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6f89ced9ffa4a79f46661873f01c38aefb1d78d5
Gerrit-Change-Number: 18211
Gerrit-PatchSet: 4
Gerrit-Owner: Hongjiang Zhang <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Hongjiang Zhang <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Sat, 12 Feb 2022 01:33:21 +0000
Gerrit-HasComments: Yes
