Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12484 )
Change subject: KUDU-2672: [spark] Optionally repartition to match Kudu partitions
......................................................................

Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12484/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/12484/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@386
PS1, Line 386: val keyedRdd = rdd.mapPartitions { rows =>
> I make these calls inside of mapPartitions, because each partition is a tas
I'd be fine with an approach here that mimics whatever Impala does. Given that the existing partitioner code was for C++, it's probably an Impala backend operation. But does it run once "per task", or just once for the entire query, passing the partition index map down to each individual node?

http://gerrit.cloudera.org:8080/#/c/12484/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@407
PS1, Line 407: val shuffledRDD = if (writeOptions.repartitionSort) {
> Best I can tell we match Impala's functionality of sorting within a partiti
Quoting from the second page:

  Starting from Impala 2.9, the INSERT or UPSERT operations into Kudu tables automatically add an exchange and a sort node to the plan that partitions and sorts the rows according to the partitioning/primary key scheme of the target table (unless the number of rows to be inserted is small enough to trigger single node execution).

Where's the "unless the number of rows to be inserted is small enough to trigger single node execution" part? Moreover, shouldn't repartition default to true if Spark is to mimic this Impala behavior?

--
To view, visit http://gerrit.cloudera.org:8080/12484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8763615997bccc08901235841149fc3bacb321e7
Gerrit-Change-Number: 12484
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Fri, 15 Feb 2019 18:17:53 +0000
Gerrit-HasComments: Yes
