Yes. However, those jobs will share the available cores on the N worker nodes. Depending on the resource requirements of the jobs, each job may run slower than it would if it were not sharing resources with other jobs. To ensure a fair share of resources among the concurrent jobs, you can enable the fair scheduler. Please see http://spark.incubator.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
TD

On Sun, Jan 19, 2014 at 10:01 AM, Manoj Samel <[email protected]> wrote:

> So each action (in driver node) creates a job that can still be executed
> by 1:N worker node(s) ?
>
> On Sat, Jan 18, 2014 at 10:56 PM, Tathagata Das <[email protected]> wrote:
>
>> Yes, RDD actions can be called only in the driver program, therefore only
>> in the driver node. However, they can be parallelized within the driver
>> program by calling multiple actions from multiple threads. The jobs
>> corresponding to each action will be executed simultaneously in the Spark
>> cluster, sharing the available resources.
>>
>> TD
>>
>> On Sat, Jan 18, 2014 at 10:34 PM, Manoj Samel <[email protected]> wrote:
>>
>>> Are RDD actions like count etc. run only on driver node or can they be
>>> parallelized ?
>>>
>>> Thanks,
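The pattern discussed in this thread can be sketched roughly as below. This is a minimal illustration, not code from the thread: the app name, pool names, and RDD operations are made up for the example, and it assumes a SparkConf-era Scala API with the fair scheduler enabled via `spark.scheduler.mode`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ConcurrentActions {
  def main(args: Array[String]): Unit = {
    // Enable the fair scheduler so the concurrent jobs share cluster cores fairly
    val conf = new SparkConf()
      .setAppName("concurrent-actions") // illustrative name
      .set("spark.scheduler.mode", "FAIR")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(1 to 1000000)

    // Each Future runs on a separate driver-side thread; each action it calls
    // submits its own job, so the two jobs execute simultaneously in the cluster.
    val countF = Future {
      sc.setLocalProperty("spark.scheduler.pool", "poolA") // pool name is illustrative
      rdd.count()
    }
    val sumF = Future {
      sc.setLocalProperty("spark.scheduler.pool", "poolB")
      rdd.map(_.toLong).reduce(_ + _)
    }

    println(Await.result(countF, Duration.Inf))
    println(Await.result(sumF, Duration.Inf))
    sc.stop()
  }
}
```

Both actions still originate in the driver program, as the thread says; the threads only let the driver submit the jobs concurrently, and the fair scheduler pools keep one long job from starving the other.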
