Hi,

Answering my own question after searching for sortByKey in the mailing list archives and later in JIRA.
It turns out it's a known issue, filed as https://issues.apache.org/jira/browse/SPARK-1021 "sortByKey() launches a cluster job when it shouldn't". It's labelled "starter", so it should not be that hard to fix. Does this still hold? I'd like to work on it if it's "simple" and doesn't get me swamped.

Thanks!

Regards,
Jacek

--
Jacek Laskowski | http://blog.japila.pl | http://blog.jaceklaskowski.pl
Follow me at https://twitter.com/jaceklaskowski
Upvote at http://stackoverflow.com/users/1305344/jacek-laskowski

On Mon, Nov 2, 2015 at 2:34 PM, Jacek Laskowski <ja...@japila.pl> wrote:
> Hi Sparkians,
>
> I use the latest Spark 1.6.0-SNAPSHOT in spark-shell with the default
> local[*] master.
>
> I created an RDD of pairs using the following snippet:
>
> val rdd = sc.parallelize(0 to 5).map(n => (n, util.Random.nextBoolean))
>
> It's all fine so far. The map transformation causes no computation.
>
> I thought all transformations were lazy and triggered no job until an
> action is called. It seems I was wrong about sortByKey()! When I called
> `rdd.sortByKey()`, it started a job: sortByKey at <console>:27 (!)
>
> Can anyone explain why sortByKey behaves differently, since it is a
> transformation and hence should be lazy? Is this a special
> transformation?
>
> Regards,
> Jacek