Yes, RDD actions can be called only from the driver program, so they are always issued from the driver node. The job that each action launches still runs as parallel tasks across the cluster. In addition, you can run several actions at the same time by calling them from multiple threads within the driver program. The jobs corresponding to those actions will then execute simultaneously in the Spark cluster, sharing the available resources.
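
As a rough illustration of calling several actions from multiple threads, here is a minimal Python sketch. The helper names run_actions_concurrently and demo_with_spark are made up for this example and are not part of Spark, and the demo assumes a local PySpark installation; treat it as one possible way to do this rather than the recommended one.

```python
from concurrent.futures import ThreadPoolExecutor


def run_actions_concurrently(actions, max_workers=4):
    """Run zero-argument callables, each wrapping one RDD action, in driver threads.

    `actions` maps a label to a callable; the result maps each label to
    that callable's return value. (Hypothetical helper, not a Spark API.)
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each submit() hands one action to its own driver-side thread.
        futures = {name: pool.submit(fn) for name, fn in actions.items()}
        # result() blocks until that action finishes and re-raises any error.
        return {name: fut.result() for name, fut in futures.items()}


def demo_with_spark():
    """Assumed usage against a local Spark master; requires PySpark."""
    from pyspark import SparkContext

    sc = SparkContext("local[4]", "concurrent-actions")
    try:
        rdd = sc.parallelize(range(1000)).cache()
        # Three actions on the same RDD, each submitted from a different thread.
        return run_actions_concurrently({
            "count": rdd.count,
            "sum": rdd.sum,
            "max": rdd.max,
        })
    finally:
        sc.stop()
```

Calling demo_with_spark() on a machine with PySpark should submit the three jobs from separate threads. How the cluster's resources get divided between them depends on the scheduler settings of your application.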
TD

On Sat, Jan 18, 2014 at 10:34 PM, Manoj Samel <[email protected]> wrote:
> Are RDD actions like count etc. run only on driver node or can they be
> parallelized ?
>
> Thanks,
>
