Fwd: [Spark Dataset]: How to conduct co-partition join in the new Dataset API in Spark 2.0

w.zhaokang Thu, 01 Dec 2016 21:26:36 -0800

Hi all,

In the old Spark RDD API, key-value PairRDDs can be co-partitioned to avoid
a shuffle, which gives us high join performance.
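
To make clear what I mean by co-partitioning, here is a minimal sketch in
plain Python. It is not Spark code, only a toy model of the mechanism: both
sides are split with the same hash function into the same number of
partitions, so partition i of one side only ever needs to meet partition i of
the other. The names `partition_by`, `co_partitioned_join` and
`NUM_PARTITIONS`, and the sample data, are made up for illustration. In Spark
the corresponding step is calling `partitionBy` with the same partitioner on
both pair RDDs before `join`.

```python
NUM_PARTITIONS = 4  # assumed partition count, shared by both sides


def partition_of(key, num_partitions=NUM_PARTITIONS):
    # Stand-in for a hash partitioner: equal keys map to the same index.
    return hash(key) % num_partitions


def partition_by(pairs, num_partitions=NUM_PARTITIONS):
    # Distribute (key, value) pairs into partitions by key.
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[partition_of(key, num_partitions)].append((key, value))
    return partitions


def co_partitioned_join(left_parts, right_parts):
    # Join partition i of the left side only with partition i of the right
    # side. No record crosses a partition boundary, i.e. no shuffle.
    if len(left_parts) != len(right_parts):
        raise ValueError("both sides must have the same number of partitions")
    joined = []
    for left, right in zip(left_parts, right_parts):
        index = {}
        for key, value in right:
            index.setdefault(key, []).append(value)
        for key, left_value in left:
            for right_value in index.get(key, []):
                joined.append((key, (left_value, right_value)))
    return joined


# Hypothetical sample data, keyed by an integer id.
users = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(1, "book"), (3, "pen"), (3, "ink"), (4, "cup")]

user_parts = partition_by(users)
order_parts = partition_by(orders)
result = co_partitioned_join(user_parts, order_parts)
```

The join itself is an ordinary per-partition hash join; the saving comes
entirely from both inputs having been laid out by the same partitioner
beforehand.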


In the new Dataset API in Spark 2.0, is a high-performance, shuffle-free
join via co-partitioning still feasible? I have looked through the API docs
but could not find a way to do it. Will the Catalyst optimizer take
co-partitioning into account during query plan optimization?

Thanks a lot to anyone who can offer a clue on this problem :-)

Zhaokang(Dale) Wang




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Fwd-Spark-Dataset-How-to-conduct-co-partition-join-in-the-new-Dataset-API-in-Spark-2-0-tp28152.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
