Driver memory is the default. Executor memory depends on the job; the caller
decides how much memory to use. We don't specify --num-executors because we
want all cores assigned to the local master, since they were started by the
current user. There is no local executor. We use
--master=spark://localhost:someport, with 1 core per executor.
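A rough sketch of what that submit command could look like; the master port, executor memory value, and job script name are placeholders I'm assuming for illustration, not values from this thread:

```shell
# Hypothetical sketch of the setup described above. The port (7077), the
# executor memory (8g), and my_job.py are placeholders, not from the thread.
${SPARK_HOME}/bin/spark-submit \
    --master spark://localhost:7077 \
    --executor-memory 8g \
    --executor-cores 1 \
    my_job.py
# --num-executors is deliberately omitted, so the application grabs all cores
# the standalone master offers; with 1 core per executor, that yields one
# executor per available core.
```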

On Thu, May 19, 2016 at 4:12 PM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Thanks Mathieu
>
> So it would be interesting to see what resources are allocated in your
> case, especially num-executors and executor-cores. I gather every node has
> enough memory and cores.
>
>
>
> ${SPARK_HOME}/bin/spark-submit \
>                 --master local[2] \
>                 --driver-memory 4g \
>                 --num-executors=1 \
>                 --executor-memory=4G \
>                 --executor-cores=2 \
>
> Dr Mich Talebzadeh
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 19 May 2016 at 21:02, Mathieu Longtin <math...@closetwork.org> wrote:
>
>> The driver (the process started by spark-submit) runs locally. The
>> executors run on any of thousands of servers. So far, I haven't tried more
>> than 500 executors.
>>
>> Right now, I run a master on the same server as the driver.
>>
>> On Thu, May 19, 2016 at 3:49 PM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> OK, so you are using some form of NFS-mounted file system shared among
>>> the nodes, and you basically start the processes through spark-submit.
>>>
>>> In standalone mode, a simple cluster manager included with Spark manages
>>> the resources, so it is not clear to me what you are referring to as the
>>> worker manager here.
>>>
>>> This is my take on your model:
>>> The application will grab all the cores in the cluster.
>>> You only have one worker, which lives within the driver JVM process.
>>> The driver runs on the same host as the cluster manager. The driver
>>> requests resources from the cluster manager to run tasks. In this case,
>>> is there only one executor for the driver? The executor runs tasks for
>>> the driver.
>>>
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>> On 19 May 2016 at 20:37, Mathieu Longtin <math...@closetwork.org> wrote:
>>>
>>>> No master and no node manager, just the processes that do actual work.
>>>>
>>>> We use the "standalone" version because we have a shared file system
>>>> and already have a way of allocating computing resources (Univa Grid
>>>> Engine). If an executor were to die, we have other ways of restarting
>>>> it; we don't need the worker manager to deal with it.
>>>>
>>>> On Thu, May 19, 2016 at 3:16 PM Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Hi Mathieu
>>>>>
>>>>> What does this approach provide that the norm lacks?
>>>>>
>>>>> So basically, each node has its own master in this model.
>>>>>
>>>>> Are these supposed to be individual standalone servers?
>>>>>
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>> On 19 May 2016 at 18:45, Mathieu Longtin <math...@closetwork.org>
>>>>> wrote:
>>>>>
>>>>>> First, a bit of context:
>>>>>> We use Spark on a platform where each user starts workers as needed.
>>>>>> This has the advantage that all permission management is handled by
>>>>>> the OS, so users can only read files they have permission to read.
>>>>>>
>>>>>> To do this, we have a utility that does the following:
>>>>>> - start a master
>>>>>> - start worker managers on a number of servers
>>>>>> - "submit" the Spark driver program
>>>>>> - the driver then talks to the master, telling it how many executors
>>>>>> it needs
>>>>>> - the master tells the worker nodes to start executors, which talk
>>>>>> to the driver
>>>>>> - the executors are started
>>>>>>
>>>>>> From here on, the master doesn't do much, and neither does the
>>>>>> process manager on the worker nodes.
>>>>>>
>>>>>> What I would like to do is simplify this to:
>>>>>> - Start the driver program
>>>>>> - Start executors on a number of servers, telling them where to find
>>>>>> the driver
>>>>>> - The executors connect directly to the driver
>>>>>>
>>>>>> Is there a way I could do this without the master and worker managers?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Mathieu Longtin
>>>>>> 1-514-803-8977
>>>>>>
>>>>>
>>>>
>>>
>>
>
> --
Mathieu Longtin
1-514-803-8977
