Hi,

With RDDs, it was possible to define the same partitioner for two RDDs, and when both RDDs had the same partitioner, a join between them was performed locally within each partition, without a shuffle.
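The RDD behaviour described above can be illustrated with a small pure-Python simulation. This is plain Python, not Spark code, and the function names, partition count and sample data are hypothetical. It assumes a hash partitioner in the spirit of Spark's `HashPartitioner`, where a record goes to partition `hash(key) % num_partitions`. Because both sides use the same function and the same partition count, equal keys land at the same partition index, so each pair of partitions can be joined without moving records between them.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count shared by both sides


def hash_partition(records, num_partitions):
    """Split (key, value) pairs into partitions by hash(key) % num_partitions,
    mimicking the idea behind a hash partitioner (illustrative only)."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def local_join(left_part, right_part):
    """Inner-join two partitions using only their own records (no data exchange)."""
    index = defaultdict(list)
    for key, value in right_part:
        index[key].append(value)
    return [(key, (lv, rv)) for key, lv in left_part for rv in index.get(key, [])]


def copartitioned_join(left, right, num_partitions=NUM_PARTITIONS):
    """Join two co-partitioned datasets partition by partition."""
    left_parts = hash_partition(left, num_partitions)
    right_parts = hash_partition(right, num_partitions)
    result = []
    # Partition i on the left only needs partition i on the right, because
    # equal keys hash to the same partition index on both sides.
    for lp, rp in zip(left_parts, right_parts):
        result.extend(local_join(lp, rp))
    return result


orders = [(1, "book"), (2, "pen"), (3, "lamp"), (1, "mug")]
users = [(1, "alice"), (2, "bob"), (4, "carol")]
print(sorted(copartitioned_join(orders, users)))
# [(1, ('book', 'alice')), (1, ('mug', 'alice')), (2, ('pen', 'bob'))]
```

The result is the same as a join over all records at once; the co-partitioning only guarantees that no record has to leave its partition to find its match.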
Can dataset joins be optimized in the same way? Is it enough to repartition both datasets on the same column?

Thank you,
Daniel