Yes, I think this is a known issue: although sortByKey is a transformation, it actually runs a job up front to sample the data and assess the distribution of the keys. See https://issues.apache.org/jira/browse/SPARK-1021. I think further eyes on it would be welcome, as this behavior is not desirable.
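
To make the reason for the extra job more concrete: sortByKey builds a RangePartitioner, which needs split points between partitions, and those can only be chosen after looking at (a sample of) the keys. Below is a rough, Spark-free sketch of that idea in plain Scala. The names RangeBoundsSketch, rangeBounds and partitionFor are hypothetical and made up for illustration; this is not Spark's actual implementation, which samples the RDD rather than reading every key.

```scala
// Simplified illustration only; this is NOT Spark's RangePartitioner code.
// All names here are hypothetical.
object RangeBoundsSketch {

  // Pick up to (numPartitions - 1) split points from a sample of the keys.
  // Getting that sample means reading data, which is the eager step.
  def rangeBounds(sampledKeys: Seq[Int], numPartitions: Int): Vector[Int] = {
    val sorted = sampledKeys.sorted.toVector
    if (sorted.isEmpty || numPartitions <= 1) Vector.empty
    else (1 until numPartitions)
      .map(i => sorted(math.min(sorted.length - 1, i * sorted.length / numPartitions)))
      .distinct
      .toVector
  }

  // A key goes to the first partition whose upper bound is >= the key;
  // keys above every bound go to the last partition.
  def partitionFor(key: Int, bounds: Vector[Int]): Int = {
    val idx = bounds.indexWhere(key <= _)
    if (idx >= 0) idx else bounds.length
  }

  def main(args: Array[String]): Unit = {
    val keys = Seq(5, 2, 1, 7)         // keys from the example in the question
    val bounds = rangeBounds(keys, 3)  // 3 partitions, as in parallelize(..., 3)
    keys.foreach(k => println(s"key $k -> partition ${partitionFor(k, bounds)}"))
  }
}
```

The point of the sketch is only that the bounds depend on the data itself, so they cannot be computed lazily from the lineage alone; in Spark that sampling pass shows up as a separate job when sortByKey is called.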

On Fri, Apr 24, 2015 at 9:57 AM, Spico Florin <spicoflo...@gmail.com> wrote:
> I have tested the sortByKey method with the following code and I have observed
> that it triggers a new job when it is called. I could find this documented
> neither in the API nor in the code. Is this intended behavior? For example,
> the API for the RDD zipWithIndex method specifies that it will trigger a new
> job. But what about sortByKey?
>
> val sc = new SparkContext(new SparkConf().setAppName("Spark Count"))
> val l = sc.parallelize(List((5,'c'),(2,'d'),(1,'a'),(7,'e')), 3)
>
> l.sortByKey()
>
> Thanks for your answers.