But my RDD is placed on the worker nodes. So how can the driver perform the action by itself?
On Wed, Feb 19, 2014 at 10:57 AM, Aaron Davidson <ilike...@gmail.com> wrote:

> first() is allowed to "run locally", which means that the driver will
> execute the action itself without launching any tasks. This is also true of
> take(n) for sufficiently small n, for instance.
>
> On Wed, Feb 19, 2014 at 9:55 AM, David Thomas <dt5434...@gmail.com> wrote:
>
>> If I perform a 'collect' action on the RDD, I can see a new stage getting
>> created in the spark web UI (http://master:4040/stages/), but when I do
>> a 'first' action, I don't see any stage getting created. However on the
>> console I see these lines:
>>
>> 14/02/19 10:51:31 INFO SparkContext: Starting job: first at xxx.scala:110
>> 14/02/19 10:51:31 INFO DAGScheduler: Got job 110 (first at xxx.scala:110)
>> with 1 output partitions (allowLocal=true)
>> 14/02/19 10:51:31 INFO DAGScheduler: Final stage: Stage 2 (first at
>> xxx.scala:110)
>> 14/02/19 10:51:31 INFO DAGScheduler: Parents of final stage: List()
>> 14/02/19 10:51:31 INFO DAGScheduler: Missing parents: List()
>>
>> So why doesn't the webUI list the stages created when I run the 'first'
>> action?