Yes, I think this is a known issue: sortByKey actually runs a job
to assess the distribution of the data. See
https://issues.apache.org/jira/browse/SPARK-1021. I think further eyes
on it would be welcome, as this behavior is not desirable.
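
To illustrate why a range-based sort needs to look at the data up front,
here is a minimal plain-Python sketch (not Spark's actual code; the names
range_bounds and partition_for are hypothetical). Picking the partition
boundaries requires a pass over a sample of the keys, and that is the kind
of work that shows up as an extra job.

```python
import bisect
import random


def range_bounds(keys, num_partitions, sample_size=100, seed=0):
    """Pick num_partitions - 1 cut points from a random sample of the keys.

    Hypothetical helper: this is the step that has to read data eagerly,
    before any sorting can be planned.
    """
    if not keys or num_partitions < 2:
        return []
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # Take evenly spaced elements of the sorted sample as boundaries.
    step = len(sample) / num_partitions
    return [sample[int(step * i)] for i in range(1, num_partitions)]


def partition_for(key, bounds):
    """Return the index of the range partition that a key falls into."""
    return bisect.bisect_left(bounds, key)


keys = list(range(1000))
bounds = range_bounds(keys, num_partitions=4)
# Keys below every boundary land in partition 0, keys above in the last one.
first, last = partition_for(-1, bounds), partition_for(10**6, bounds)
```

The boundaries depend on the data, so they cannot be computed lazily at the
point where the sorted result is merely defined.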
On Fri, Apr 24, 2015 at 9:57 AM, Spico Florin wrote:
I have tested the sortByKey method with the following code and observed
that it triggers a new job when it is called. I could not find this
documented in either the API or the code. Is this intended behavior? For
example, the API documentation for the RDD zipWithIndex method specifies
that it will trigger a new job. But
what