Driver memory is the default. Executor memory depends on the job; the caller decides how much memory to use. We don't specify --num-executors because we want all cores assigned to the local master, since the workers were started by the current user. There is no local executor. We use --master=spark://localhost:someport, with 1 core per executor.
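For reference, a submit along the lines Mathieu describes might look like the sketch below. The port, memory size, and script name are placeholders, not values from this thread:

```shell
# Hypothetical submit against a per-user standalone master (sketch only).
# --num-executors is deliberately omitted and driver memory is left at its
# default; the job grabs all cores the per-user master offers, 1 per executor.
${SPARK_HOME}/bin/spark-submit \
  --master spark://localhost:7077 \
  --executor-memory 8g \
  --executor-cores 1 \
  my_job.py
```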
On Thu, May 19, 2016 at 4:12 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Thanks Mathieu
>
> So it would be interesting to see what resources are allocated in your case,
> especially num-executors and executor-cores. I gather every node has enough
> memory and cores.
>
> ${SPARK_HOME}/bin/spark-submit \
>   --master local[2] \
>   --driver-memory 4g \
>   --num-executors=1 \
>   --executor-memory=4G \
>   --executor-cores=2 \
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 19 May 2016 at 21:02, Mathieu Longtin <math...@closetwork.org> wrote:
>
>> The driver (the process started by spark-submit) runs locally. The
>> executors run on any of thousands of servers. So far, I haven't tried
>> more than 500 executors.
>>
>> Right now, I run a master on the same server as the driver.
>>
>> On Thu, May 19, 2016 at 3:49 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> OK, so you are using some form of NFS-mounted file system shared among
>>> the nodes, and you basically start the processes through spark-submit.
>>>
>>> Standalone mode uses a simple cluster manager included with Spark. It
>>> handles the management of resources, so it is not clear to me what you
>>> are referring to as the worker manager here.
>>>
>>> This is my take on your model:
>>> The application will go and grab all the cores in the cluster.
>>> You only have one worker, which lives within the driver JVM process.
>>> The driver runs on the same host as the cluster manager. The driver
>>> requests resources from the cluster manager to run tasks. In this case,
>>> is there only one executor for the driver? The executor runs tasks for
>>> the driver.
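An aside on the snippet quoted above: with --master local[2] everything runs in a single JVM, so there is no separate executor process for --num-executors or --executor-memory to apply to, and --num-executors is in any case a YARN-only option. In standalone mode the total core count is capped with --total-executor-cores instead. A sketch of a standalone-mode equivalent, with the master URL and sizes as placeholders:

```shell
# Sketch: standalone-mode equivalent of the quoted local-mode snippet.
# spark://host:7077 and the sizes are placeholders. --num-executors is a
# YARN-only flag; in standalone mode, cap cores with --total-executor-cores.
${SPARK_HOME}/bin/spark-submit \
  --master spark://host:7077 \
  --driver-memory 4g \
  --executor-memory 4g \
  --executor-cores 2 \
  --total-executor-cores 2 \
  my_job.py
```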
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>> On 19 May 2016 at 20:37, Mathieu Longtin <math...@closetwork.org> wrote:
>>>
>>>> No master and no node manager, just the processes that do the actual
>>>> work.
>>>>
>>>> We use the "standalone" version because we already have a shared file
>>>> system and a way of allocating computing resources (Univa Grid Engine).
>>>> If an executor were to die, we have other ways of restarting it; we
>>>> don't need the worker manager to deal with it.
>>>>
>>>> On Thu, May 19, 2016 at 3:16 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Hi Mathieu
>>>>>
>>>>> What does this approach provide that the norm lacks?
>>>>>
>>>>> So basically each node has its own master in this model?
>>>>>
>>>>> Are these supposed to be individual standalone servers?
>>>>>
>>>>> Thanks
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>> On 19 May 2016 at 18:45, Mathieu Longtin <math...@closetwork.org> wrote:
>>>>>
>>>>>> First, a bit of context:
>>>>>> We use Spark on a platform where each user starts workers as needed.
>>>>>> This has the advantage that all permission management is handled by
>>>>>> the OS, so users can only read files they have permission to read.
>>>>>>
>>>>>> To do this, we have a utility that does the following:
>>>>>> - starts a master
>>>>>> - starts worker managers on a number of servers
>>>>>> - "submits" the Spark driver program
>>>>>> - the driver then talks to the master, telling it how many executors
>>>>>>   it needs
>>>>>> - the master tells the worker nodes to start executors and talk to
>>>>>>   the driver
>>>>>> - the executors are started
>>>>>>
>>>>>> From here on, the master doesn't do much, and neither do the process
>>>>>> managers on the worker nodes.
>>>>>>
>>>>>> What I would like to do is simplify this to:
>>>>>> - start the driver program
>>>>>> - start executors on a number of servers, telling them where to find
>>>>>>   the driver
>>>>>> - the executors connect directly to the driver
>>>>>>
>>>>>> Is there a way I could do this without the master and worker managers?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> --
>>>>>> Mathieu Longtin
>>>>>> 1-514-803-8977

-- 
Mathieu Longtin
1-514-803-8977
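The current flow described above can be sketched with Spark's stock standalone scripts. Host names, the port, and the server list are placeholders; the actual utility and its Univa Grid Engine integration are not shown in the thread:

```shell
# Sketch of the current master/worker/submit flow using Spark's standalone
# scripts. Hosts, port, and server list are placeholders.

# 1. Start a master on this host (serves spark://<this-host>:7077 by default).
${SPARK_HOME}/sbin/start-master.sh

# 2. Start a worker manager on each compute server, pointed at the master.
#    $(hostname) expands on the submitting host, i.e. the master's host.
for host in server1 server2 server3; do
  ssh "$host" "${SPARK_HOME}/sbin/start-slave.sh spark://$(hostname):7077"
done

# 3. Submit the driver; it asks the master for executors, the master tells
#    the workers to start them, and the executors connect back to the driver.
${SPARK_HOME}/bin/spark-submit \
  --master spark://$(hostname):7077 \
  my_job.py
```

The simplification Mathieu asks about would collapse steps 1 and 2: start only the driver, then launch executor processes directly on the servers with the driver's address, with no master or worker manager in between.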