Hi all,

In the old Spark RDD API, key-value PairRDDs can be co-partitioned to avoid a shuffle, which gives us high join performance.
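To make the mechanism concrete, here is a minimal sketch in plain Scala (no Spark required to run it) of what co-partitioning buys you: two datasets partitioned with the same hash partitioner put any given key into the same partition index, so a join can proceed partition-by-partition with no shuffle. The `users`/`orders` data and the `hashPartition` helper are purely illustrative; in the real RDD API you would call `rdd.partitionBy(new org.apache.spark.HashPartitioner(n))` on both sides.

```scala
// Mirrors the behavior of org.apache.spark.HashPartitioner:
// a non-negative modulus of the key's hashCode.
def hashPartition(key: Any, numPartitions: Int): Int = {
  val raw = key.hashCode % numPartitions
  if (raw < 0) raw + numPartitions else raw
}

// Hypothetical key-value records for the two sides of a join.
val users  = Seq(("alice", 1), ("bob", 2), ("carol", 3))
val orders = Seq(("bob", "o1"), ("alice", "o2"))

val numPartitions = 4
// Because both sides use the same partitioner and partition count,
// every order key lands in the same partition index as the matching
// user key, so co-partitioned RDDs can be joined without shuffling.
val userParts  = users.map  { case (k, _) => k -> hashPartition(k, numPartitions) }.toMap
val orderParts = orders.map { case (k, _) => k -> hashPartition(k, numPartitions) }.toMap
val coLocated  = orderParts.forall { case (k, p) => userParts(k) == p }
```
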
In the new Dataset API in Spark 2.0, is a high-performance, shuffle-free join via this co-partitioning mechanism still feasible? I have looked through the API docs but could not find it. Will the Catalyst optimizer handle co-partitioning in its query plan optimization process? Thanks a lot if anyone can provide any clue on the problem :-)

Zhaokang (Dale) Wang