Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12484 )
Change subject: KUDU-2672: [spark] Optionally repartition to match Kudu partitions
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12484/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala
File java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala:

http://gerrit.cloudera.org:8080/#/c/12484/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@386
PS1, Line 386: val converter = new RowConverter(table.getSchema, schema, writeOptions.ignoreNull)
> With a brief look at Impala's KuduPartitionExpr and ScalarExpr implementati

I also talked to Thomas (an Impala dev) about their approach. He agreed that every Impala node participating in a query uses a KuduPartitioner to look up its own row information. He mentioned that some users had complained that this is slow, but that in practice it's more useful than not.

So I think proceeding with what you have here is OK, at least for now. There's nothing here that precludes us from sharing the KuduPartitioner in the future, right?


http://gerrit.cloudera.org:8080/#/c/12484/1/java/kudu-spark/src/main/scala/org/apache/kudu/spark/kudu/KuduContext.scala@407
PS1, Line 407: val shuffledRDD = if (writeOptions.repartitionSort) {
> I don't think I have the capability in Spark to get an "estimate" of rows b

OK, but should we at least change the default for repartitionSort to true?
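For context on the repartitionSort question, here is a minimal, Spark-free sketch of what "repartition to match Kudu partitions, optionally sorting" might look like on a local collection. It assumes repartitionSort means ordering rows by primary key within each target partition. RepartitionSketch, Row, partitionOf, and repartition are hypothetical names, and partitionOf is a simple modulo stand-in for the per-row lookup a KuduPartitioner performs; it is not Kudu's actual hash partitioning.

```scala
// Hypothetical sketch: simulates repartitioning rows to match a table's
// partitions, with an optional sort by primary key inside each partition.
// `partitionOf` is a stand-in for a KuduPartitioner lookup, NOT Kudu's hash.
object RepartitionSketch {
  // A row with an integer primary key and a payload column.
  final case class Row(key: Int, value: String)

  // Hypothetical partition lookup: non-negative modulo over the key.
  def partitionOf(row: Row, numPartitions: Int): Int =
    Math.floorMod(row.key, numPartitions)

  // Buckets rows by target partition, preserving input order within each
  // bucket; when `sort` is true, each bucket is then ordered by primary key.
  def repartition(rows: Seq[Row], numPartitions: Int, sort: Boolean): Vector[Vector[Row]] = {
    val buckets = Vector.tabulate(numPartitions) { p =>
      rows.filter(r => partitionOf(r, numPartitions) == p).toVector
    }
    if (sort) buckets.map(_.sortBy(_.key)) else buckets
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(Row(7, "g"), Row(2, "b"), Row(4, "d"), Row(1, "a"), Row(5, "e"))
    repartition(rows, numPartitions = 3, sort = true).zipWithIndex.foreach {
      case (bucket, p) => println(s"partition $p: ${bucket.map(_.key).mkString(", ")}")
    }
  }
}
```

With sort = false the rows landing in partition 1 keep their input order (7, 4, 1); with sort = true they come out as 1, 4, 7, which is the key-ordered write pattern the default is being debated for.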
--
To view, visit http://gerrit.cloudera.org:8080/12484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8763615997bccc08901235841149fc3bacb321e7
Gerrit-Change-Number: 12484
Gerrit-PatchSet: 3
Gerrit-Owner: Grant Henke
Gerrit-Reviewer: Adar Dembo
Gerrit-Reviewer: Grant Henke
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy
Gerrit-Reviewer: Will Berkeley
Gerrit-Comment-Date: Mon, 25 Feb 2019 20:42:36 +0000
Gerrit-HasComments: Yes
