Yes, RDD actions can be called only from the driver program, and
therefore only on the driver node. Each action still runs as a
distributed job: the driver schedules it and its tasks execute in
parallel on the cluster. Separate actions can also be run concurrently
by calling them from multiple threads within the driver program. The
jobs corresponding to each action will then execute simultaneously in
the Spark cluster, sharing the available resources.
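
A rough sketch of calling actions from multiple driver threads is below.
It assumes a local master and uses Scala Futures as the threading
mechanism; the object name ConcurrentActions and the helper countAndSum
are made up for illustration, and any other way of starting threads in
the driver should work similarly.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical example object, not part of Spark.
object ConcurrentActions {

  // Submits two actions from separate driver threads and waits for both.
  def countAndSum(sc: SparkContext): (Long, Long) = {
    val numbers = sc.parallelize(1 to 1000, 4)

    // Each Future body runs on its own driver-side thread, so each
    // action is submitted to the cluster as a separate Spark job.
    val countF = Future { numbers.count() }
    val sumF   = Future { numbers.map(_.toLong).reduce(_ + _) }

    // Block the calling thread until both jobs have finished.
    val count = Await.result(countF, 5.minutes)
    val sum   = Await.result(sumF, 5.minutes)
    (count, sum)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode with 4 threads, for illustration only.
    val conf = new SparkConf()
      .setAppName("concurrent-actions")
      .setMaster("local[4]")
    val sc = new SparkContext(conf)
    try {
      val (count, sum) = countAndSum(sc)
      println(s"count=$count sum=$sum")
    } finally {
      sc.stop()
    }
  }
}
```

How the two jobs share resources depends on the scheduler: by default
jobs are scheduled FIFO, and setting spark.scheduler.mode to FAIR is one
way to have concurrent jobs share the cluster more evenly.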

TD



On Sat, Jan 18, 2014 at 10:34 PM, Manoj Samel <[email protected]> wrote:

> Are RDD actions like count etc. run only on the driver node, or can
> they be parallelized?
>
> Thanks,
>
