Adar Dembo has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12484 )

Change subject:  KUDU-2672: [spark] Optionally repartition to match Kudu 
partitions
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12484/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File 
java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/12484/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@386
PS1, Line 386:     val keyedRdd = rdd.mapPartitions { rows =>
> I make these calls inside of mapPartitions, because each partition is a tas
I'd be fine with an approach here that mimics whatever Impala does. Since the
existing partitioner code was written for C++, it's probably an Impala backend
operation. But does it run once per task, or just once for the entire query,
with the partition index map then passed down to each individual node?


http://gerrit.cloudera.org:8080/#/c/12484/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@407
PS1, Line 407:     val shuffledRDD = if (writeOptions.repartitionSort) {
> Best I can tell we match Impala's functionality of sorting within a partiti
Quoting from the second page:

"Starting from Impala 2.9, the INSERT or UPSERT operations into Kudu tables
automatically add an exchange and a sort node to the plan that partitions and
sorts the rows according to the partitioning/primary key scheme of the target
table (unless the number of rows to be inserted is small enough to trigger
single node execution)."

Where is the "unless the number of rows to be inserted is small enough to
trigger single node execution" part handled here? Moreover, shouldn't
repartition default to true if Spark is to mimic this Impala behavior?



--
To view, visit http://gerrit.cloudera.org:8080/12484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8763615997bccc08901235841149fc3bacb321e7
Gerrit-Change-Number: 12484
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Grant Henke <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Will Berkeley <[email protected]>
Gerrit-Comment-Date: Fri, 15 Feb 2019 18:17:53 +0000
Gerrit-HasComments: Yes
