Re: Spark RDD sortByKey triggering a new job

Sean Owen Fri, 24 Apr 2015 07:16:57 -0700

Yes, I think this is a known issue: sortByKey actually runs a job to
assess the distribution of the data, so that it can choose the
partition boundaries for the sort. See
https://issues.apache.org/jira/browse/SPARK-1021. I think further eyes
on it would be welcome, as it's not desirable.
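
To illustrate why a pass over the data is needed: a range-partitioned
sort has to pick split points between partitions before it can shuffle,
and those split points depend on the keys actually present. Below is a
minimal plain-Scala sketch of that idea. It is not Spark's actual code,
and the names RangeBoundsSketch, rangeBounds and partitionFor are made
up for illustration.

```scala
// Simplified illustration only: NOT Spark's RangePartitioner implementation.
object RangeBoundsSketch {

  // Pick up to (numPartitions - 1) split points from a sample of the keys.
  // Computing them requires reading data, which is why an eager pass is needed.
  def rangeBounds(sampleKeys: Seq[Int], numPartitions: Int): Vector[Int] = {
    val sorted = sampleKeys.sorted.toVector
    if (sorted.isEmpty || numPartitions <= 1) {
      Vector.empty
    } else {
      val picks = (1 until numPartitions).map { i =>
        sorted(math.min(sorted.length - 1, i * sorted.length / numPartitions))
      }
      picks.distinct.toVector
    }
  }

  // A key goes to the partition whose index equals the number of
  // split points strictly smaller than the key.
  def partitionFor(key: Int, bounds: Vector[Int]): Int =
    bounds.count(b => key > b)

  def main(args: Array[String]): Unit = {
    val keys = List(5, 2, 1, 7) // the keys from the example in this thread
    val bounds = rangeBounds(keys, 3)
    keys.foreach(k => println(s"key $k -> partition ${partitionFor(k, bounds)}"))
  }
}
```

With the four keys from the question and 3 partitions, this sketch picks
2 and 5 as split points, so keys 1 and 2 land in partition 0, key 5 in
partition 1, and key 7 in partition 2. The point is only that the split
points cannot be known without looking at the keys first.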


On Fri, Apr 24, 2015 at 9:57 AM, Spico Florin <spicoflo...@gmail.com> wrote:
> I have tested the sortByKey method with the following code and observed
> that it triggers a new job when it is called. I could not find this
> documented in either the API or the code. Is this intended behavior? For
> example, the API documentation for the RDD zipWithIndex method specifies
> that it will trigger a new job. But what about sortByKey?
>
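> import org.apache.spark.{SparkConf, SparkContext}
>
> val sc = new SparkContext(new SparkConf().setAppName("Spark Count"))
> // Four (Int, Char) pairs spread over 3 partitions
> val l = sc.parallelize(List((5, 'c'), (2, 'd'), (1, 'a'), (7, 'e')), 3)
>
> // No action is called on the result, yet a job is started here
> l.sortByKey()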
>
> Thanks for your answers.

