Optimizing dataset joins

Daniel Haviv Thu, 18 May 2017 01:47:11 -0700

Hi,
With RDDs, it was possible to define a partitioner for two RDDs, and
given that both RDDs had the same partitioner, a join operation would be
performed locally within each partition, without shuffling.
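
To make the RDD behaviour described above concrete, here is a minimal sketch of a co-partitioned join. It is plain Python, not Spark API code: the helper names (partition_by_key, join_co_partitioned), the partition count and the sample data are all hypothetical, chosen only for illustration. In Spark itself the corresponding step would be calling partitionBy with the same Partitioner on both pair RDDs before join.

```python
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count, for illustration only


def partition_by_key(pairs, num_partitions=NUM_PARTITIONS):
    """Place each (key, value) pair in the partition picked by hash(key) % n."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def join_co_partitioned(left_parts, right_parts):
    """Inner-join two datasets that were split by the same partitioner.

    Partition i on the left is only ever compared with partition i on the
    right, so no record has to move between partitions (no shuffle).
    """
    assert len(left_parts) == len(right_parts), "partitioners must match"
    joined = []
    for left, right in zip(left_parts, right_parts):
        # Build a per-partition lookup of the right side, then probe it.
        index = defaultdict(list)
        for key, value in right:
            index[key].append(value)
        for key, value in left:
            for other in index.get(key, []):
                joined.append((key, (value, other)))
    return joined


# Made-up sample data: (user_id, name) and (user_id, item).
users = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(1, "book"), (3, "pen"), (3, "ink"), (4, "cup")]

result = sorted(join_co_partitioned(partition_by_key(users),
                                    partition_by_key(orders)))
```

Because both sides use the same hash function and the same number of partitions, equal keys always end up at the same partition index, which is what makes the per-partition join sufficient.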


Can Dataset joins be optimized in the same way?
Is it enough to repartition the two Datasets on the same column?

Thank you.
Daniel
