Re: Why collect() has a stage but first() not?

David Thomas Wed, 19 Feb 2014 10:01:38 -0800

But my RDD is stored on the worker nodes. So how can the driver perform the
action by itself?



On Wed, Feb 19, 2014 at 10:57 AM, Aaron Davidson <ilike...@gmail.com> wrote:

> first() is allowed to "run locally", which means that the driver will
> execute the action itself without launching any tasks. This is also true of
> take(n) for sufficiently small n, for instance.
>
>
> On Wed, Feb 19, 2014 at 9:55 AM, David Thomas <dt5434...@gmail.com> wrote:
>
>> If I perform a 'collect' action on the RDD, I can see a new stage getting
>> created in the spark web UI (http://master:4040/stages/), but when I do
>> a 'first' action, I don't see any stage getting created. However on the
>> console I see these lines:
>>
>> 14/02/19 10:51:31 INFO SparkContext: Starting job: first at xxx.scala:110
>> 14/02/19 10:51:31 INFO DAGScheduler: Got job 110 (first at xxx.scala:110)
>> with 1 output partitions (allowLocal=true)
>> 14/02/19 10:51:31 INFO DAGScheduler: Final stage: Stage 2 (first at
>> xxx.scala:110)
>> 14/02/19 10:51:31 INFO DAGScheduler: Parents of final stage: List()
>> 14/02/19 10:51:31 INFO DAGScheduler: Missing parents: List()
>>
>> So why doesn't the web UI list the stage created when I run the 'first'
>> action?
>>
>
>
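
To make the "run locally" answer quoted above more concrete, here is a minimal
sketch in plain Scala with no Spark dependency. It is a toy model, not Spark's
scheduler: ToyRdd, runJob, OnDriver and AsTasks are hypothetical names invented
for illustration. The idea it tries to show is that the driver holds a recipe
for computing each partition, so when an action needs only one partition the
driver can evaluate that recipe itself rather than ship tasks to the workers.
The decision rule is a simplified assumption based on the allowLocal=true and
"1 output partitions" fields in the log; the real scheduler may check further
conditions, such as the final stage having no parent stages.

```scala
// Toy model only: none of these names are Spark classes or Spark API.
object RunLocallySketch {

  // An RDD-like handle held by the driver: a partition count plus a function
  // that can produce the rows of any partition on demand.
  final case class ToyRdd[T](numPartitions: Int, compute: Int => Iterator[T])

  // Where a job ended up running.
  sealed trait Where
  case object OnDriver extends Where // no tasks launched
  case object AsTasks extends Where  // one task per requested partition

  // Assumed rule: run in the driver only when the caller allows it and
  // exactly one partition is requested.
  def runJob[T, R](rdd: ToyRdd[T], partitions: Seq[Int], allowLocal: Boolean)
                  (f: Iterator[T] => R): (Where, Seq[R]) = {
    val where = if (allowLocal && partitions.length == 1) OnDriver else AsTasks
    // Both paths evaluate the same compute function; only the place differs.
    (where, partitions.map(p => f(rdd.compute(p))))
  }

  // first() needs only partition 0 and permits local execution.
  def first[T](rdd: ToyRdd[T]): (Where, T) = {
    val (where, results) = runJob(rdd, Seq(0), allowLocal = true)(_.next())
    (where, results.head)
  }

  // collect() needs every partition, so it always goes out as tasks.
  def collect[T](rdd: ToyRdd[T]): (Where, Seq[T]) = {
    val (where, results) =
      runJob(rdd, 0 until rdd.numPartitions, allowLocal = false)(_.toList)
    (where, results.flatten)
  }

  def main(args: Array[String]): Unit = {
    val rdd = ToyRdd[Int](3, p => Iterator(p * 10, p * 10 + 1))
    println(first(rdd))   // reports OnDriver
    println(collect(rdd)) // reports AsTasks
  }
}
```

In this model first() asks for a single partition with allowLocal set and is
evaluated in the driver, while collect() asks for all partitions and is
reported as tasks. In a real cluster the driver would still need to reach the
partition's data (for example by reading the input block or fetching a cached
block from a worker); the sketch leaves that part out.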
