Hi,
With RDDs it was possible to define a partitioner for two RDDs, and given
that both RDDs had the same partitioner, a join between them would be
performed locally within each partition, without shuffling.
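
To make the RDD behaviour I mean concrete, here is a toy sketch in plain
Python. It is only a model of the idea, not Spark code: partition_by,
local_join and copartitioned_join are hypothetical helpers, and
hash(key) % num_partitions is an assumed stand-in for a hash partitioner.
Because both sides use the same partitioning function, matching keys land
in the same partition index, so each partition can be joined on its own.

```python
def partition_by(pairs, num_partitions):
    """Place each (key, value) pair in partition hash(key) % num_partitions."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        parts[hash(key) % num_partitions].append((key, value))
    return parts


def local_join(left_part, right_part):
    """Inner-join two partitions using only the data they already hold."""
    index = {}
    for key, value in left_part:
        index.setdefault(key, []).append(value)
    return [
        (key, (left_value, right_value))
        for key, right_value in right_part
        for left_value in index.get(key, [])
    ]


def copartitioned_join(left, right, num_partitions):
    """Join two key-value collections partition by partition.

    Both sides go through the same partitioner, so partition i of the left
    side only ever needs partition i of the right side (no data movement
    between partitions in this toy model).
    """
    left_parts = partition_by(left, num_partitions)
    right_parts = partition_by(right, num_partitions)
    joined = []
    for left_part, right_part in zip(left_parts, right_parts):
        joined.extend(local_join(left_part, right_part))
    return joined


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    users = [(1, "a"), (2, "b"), (3, "c"), (1, "d")]
    orders = [(1, "x"), (3, "y"), (4, "z")]
    print(sorted(copartitioned_join(users, orders, num_partitions=3)))
```

If the two sides were partitioned with different functions or different
partition counts, the per-partition join above would miss matches, which
is the case where a shuffle would be needed.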

Can Dataset joins be optimized in the same way?
Is it enough to repartition the two Datasets on the same column?

Thank you.
Daniel
