Thanks, Mathieu.

It would be interesting to see what resources are allocated in your case,
especially num-executors and executor-cores. I gather every node has
enough memory and cores.



${SPARK_HOME}/bin/spark-submit \
                --master local[2] \
                --driver-memory 4g \
                --num-executors=1 \
                --executor-memory=4G \
                --executor-cores=2 \

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com



On 19 May 2016 at 21:02, Mathieu Longtin <math...@closetwork.org> wrote:

> The driver (the process started by spark-submit) runs locally. The
> executors run on any of thousands of servers. So far, I haven't tried more
> than 500 executors.
>
> Right now, I run a master on the same server as the driver.
>
> On Thu, May 19, 2016 at 3:49 PM Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> OK, so you are using some form of NFS-mounted file system shared among the
>> nodes, and you basically start the processes through spark-submit.
>>
>> In standalone mode, a simple cluster manager included with Spark handles
>> resource management, so it is not clear to me what you are referring to as
>> the worker manager here.
>>
>> This is my take on your model: the application will go and grab all the
>> cores in the cluster. You only have one worker, which lives within the
>> driver JVM process. The driver runs on the same host as the cluster
>> manager and requests resources from it to run tasks. In this case there is
>> only one executor for the driver? The executor runs tasks for the driver.
>>
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> On 19 May 2016 at 20:37, Mathieu Longtin <math...@closetwork.org> wrote:
>>
>>> No master and no node manager, just the processes that do actual work.
>>>
>>> We use the "stand alone" version because we have a shared file system
>>> and a way of allocating computing resources already (Univa Grid Engine). If
>>> an executor were to die, we have other ways of restarting it; we don't
>>> need the worker manager to deal with it.
>>>
>>> On Thu, May 19, 2016 at 3:16 PM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi Mathieu
>>>>
>>>> What does this approach provide that the norm lacks?
>>>>
>>>> So basically, does each node have its own master in this model?
>>>>
>>>> Are these supposed to be individual standalone servers?
>>>>
>>>>
>>>> Thanks
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>> On 19 May 2016 at 18:45, Mathieu Longtin <math...@closetwork.org>
>>>> wrote:
>>>>
>>>>> First, a bit of context: we use Spark on a platform where each user
>>>>> starts workers as needed. This has the advantage that all permission
>>>>> management is handled by the OS, so users can only read files they have
>>>>> permission to.
>>>>>
>>>>> To do this, we have a utility that does the following:
>>>>> - start a master
>>>>> - start worker managers on a number of servers
>>>>> - "submit" the Spark driver program
>>>>> - the driver then talks to the master and tells it how many executors
>>>>> it needs
>>>>> - the master tells the worker nodes to start executors and talk to the
>>>>> driver
>>>>> - the executors are started
>>>>>
>>>>> From here on, the master doesn't do much, and neither does the process
>>>>> manager on the worker nodes.
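[Editor's note: the flow described above could be scripted roughly as follows
using Spark's standalone-mode launch scripts. This is a hypothetical sketch:
the host names, the node list, and the core count are assumptions, and 7077
is simply the standalone master's default port.]

```shell
# Sketch of the described utility, per user, using Spark's standalone scripts.
MASTER_HOST=$(hostname -f)

# 1. start a master on this host
"${SPARK_HOME}/sbin/start-master.sh"

# 2. start worker managers on a number of servers (node list is an assumption)
for node in node01 node02 node03; do
    ssh "$node" "${SPARK_HOME}/sbin/start-slave.sh spark://${MASTER_HOST}:7077"
done

# 3. "submit" the driver; it asks the master for executors, and the master
#    tells the worker nodes to start them and connect back to the driver
"${SPARK_HOME}/bin/spark-submit" \
    --master "spark://${MASTER_HOST}:7077" \
    --total-executor-cores 500 \
    my_job.py    # my_job.py is a placeholder
```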
>>>>>
>>>>> What I would like to do is simplify this to:
>>>>> - Start the driver program
>>>>> - Start executors on a number of servers, telling them where to find
>>>>> the driver
>>>>> - The executors connect directly to the driver
>>>>>
>>>>> Is there a way I could do this without the master and worker managers?
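[Editor's note: one unsupported way people have skipped the master and worker
managers is to launch Spark's internal executor backend directly against the
running driver. This is only a sketch: the class and its flags are internal
to Spark, version-dependent, and the driver URL, app id, and core count below
are placeholders you would have to read out of the driver's logs.]

```shell
# Obtain this from the driver's log output after it starts (port varies).
DRIVER_URL="spark://CoarseGrainedScheduler@driver-host:port"

# Run one executor per server, pointing it straight at the driver.
"${SPARK_HOME}/bin/spark-class" \
    org.apache.spark.executor.CoarseGrainedExecutorBackend \
    --driver-url "$DRIVER_URL" \
    --executor-id 1 \
    --hostname "$(hostname -f)" \
    --cores 2 \
    --app-id app-placeholder
```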
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>> --
>>>>> Mathieu Longtin
>>>>> 1-514-803-8977
>>>>>
