Yes. However, those jobs will share the available cores across the N worker
nodes. Depending on the resource requirements of the jobs, each job may run
slower than it would if it were not sharing resources with other jobs. To
ensure a fair share of resources between the concurrent jobs, you can turn
on the fair scheduler. Please see
http://spark.incubator.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
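
A minimal sketch of how that might look, assuming Spark's standard
configuration API; the application name, master URL, and pool name are
placeholders:

// Sketch only: enable the fair scheduler when creating the context.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("concurrent-jobs-example")   // placeholder app name
  .setMaster("spark://master:7077")        // placeholder master URL
  .set("spark.scheduler.mode", "FAIR")     // default is FIFO
val sc = new SparkContext(conf)

// Jobs submitted from a given thread can optionally be grouped into a
// named pool (defined in the fair scheduler allocation file):
sc.setLocalProperty("spark.scheduler.pool", "pool1")  // placeholder pool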

TD


On Sun, Jan 19, 2014 at 10:01 AM, Manoj Samel <[email protected]> wrote:

> So each action (in the driver node) creates a job that can still be
> executed by 1:N worker node(s)?
>
>
> On Sat, Jan 18, 2014 at 10:56 PM, Tathagata Das <[email protected]> wrote:
>
>> Yes, RDD actions can be called only in the driver program, and therefore
>> only on the driver node. However, they can be parallelized within the
>> driver program by calling multiple actions from multiple threads. The
>> jobs corresponding to those actions will then be executed simultaneously
>> in the Spark cluster, sharing the available resources.
>>
>> TD
>>
>>
>>
>>
>> On Sat, Jan 18, 2014 at 10:34 PM, Manoj Samel <[email protected]> wrote:
>>
>>> Are RDD actions like count etc. run only on the driver node, or can
>>> they be parallelized?
>>>
>>> Thanks,
>>>
>>
>>
>
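
A minimal sketch of the multi-threaded pattern TD describes in the quoted
message above, assuming an existing SparkContext `sc`; the RDD contents
and the use of Scala futures are illustrative only:

// Sketch only: each action submits an independent job; calling the
// actions from separate threads lets the jobs run concurrently.
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

val rddA = sc.parallelize(1 to 1000000)
val rddB = sc.parallelize(1 to 1000000)

val countA = Future { rddA.count() }                   // job 1
val sumB   = Future { rddB.map(_ * 2L).reduce(_ + _) } // job 2

// Both jobs are now running in the cluster at the same time.
println(Await.result(countA, Duration.Inf))
println(Await.result(sumB, Duration.Inf))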
