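Yes, I think this is a known issue: sortByKey actually runs a job
to assess the distribution of the data, which it needs in order to choose
the range partition boundaries, even though it is a transformation.
See https://issues.apache.org/jira/browse/SPARK-1021. I think further eyes
on it would be welcome, as this behavior is not desirable.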
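
To illustrate why a sort wants a look at the data before it can shuffle:
a range partitioner has to pick split points so that each partition holds a
contiguous range of keys, and those split points can only be derived from
the keys themselves. Below is a minimal plain-Scala sketch of that idea. It
does not use Spark at all; rangeBounds and partitionFor are hypothetical
names for illustration, not Spark's API, and the real implementation works
from a sample of the keys rather than all of them.

```scala
// Illustrative sketch only: NOT Spark's actual RangePartitioner code.
object RangeBoundsSketch {

  // Pick up to (numPartitions - 1) split points from the sampled keys.
  // Computing these requires reading keys, which is why a job is needed.
  def rangeBounds(sampledKeys: Seq[Int], numPartitions: Int): Vector[Int] = {
    val sorted = sampledKeys.sorted.toVector
    if (sorted.isEmpty || numPartitions <= 1) Vector.empty
    else
      (1 until numPartitions)
        .map(i => sorted(math.min(sorted.length - 1, i * sorted.length / numPartitions)))
        .distinct
        .toVector
  }

  // A key goes to the first range whose upper bound is >= the key;
  // keys larger than every bound go to the last partition.
  def partitionFor(key: Int, bounds: Vector[Int]): Int = {
    val idx = bounds.indexWhere(key <= _)
    if (idx == -1) bounds.length else idx
  }
}
```

With the keys from the example in the quoted message (5, 2, 1, 7) and 3
partitions, rangeBounds yields the split points 2 and 5, so keys 1 and 2
land in partition 0, key 5 in partition 1, and key 7 in partition 2.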

On Fri, Apr 24, 2015 at 9:57 AM, Spico Florin <spicoflo...@gmail.com> wrote:
> I have tested the sortByKey method with the following code and observed
> that it triggers a new job when it is called. I could find this documented
> neither in the API nor in the code. Is this intended behavior? For example,
> the API for the RDD zipWithIndex method specifies that it will trigger a
> new job. But what about sortByKey?
>
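> import org.apache.spark.{SparkConf, SparkContext}
>
> val sc = new SparkContext(new SparkConf().setAppName("Spark Count"))
> val l = sc.parallelize(List((5,'c'),(2,'d'),(1,'a'),(7,'e')), 3)
>
> // No action is called here, yet a job is triggered.
> l.sortByKey()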
>
> Thanks for your answers.
