Hi Mich,

I have set up the Spark default configuration in the conf directory, in spark-defaults.conf, where I specify the master, hence there is no need to put it on the command line:

spark.master    spark://spark.master:7077

The same applies to the driver memory, which has been increased to 4GB, and likewise spark.executor.memory is set to 12GB, as the machines have 16GB.
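For completeness, the relevant entries in conf/spark-defaults.conf look roughly like this (a sketch reflecting the values above; the hostname spark.master comes from my cluster and the exact whitespace between key and value does not matter):

spark.master            spark://spark.master:7077
spark.driver.memory     4g
spark.executor.memory   12g

The same values could equally be passed to spark-submit on the command line via --driver-memory 4g and --executor-memory 12g.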
Jakub

On 4 July 2016 at 17:44, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi Jakub,
>
> In standalone mode Spark does the resource management. Which version of
> Spark are you running?
>
> How do you define your SparkConf() parameters, for example setMaster etc.?
>
> From
>
> spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp SparkPOC.jar 10 4.3
>
> I did not see any executor or memory allocation, so I assume you are
> allocating them somewhere else?
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising
> from such loss, damage or destruction.
>
> On 4 July 2016 at 16:31, Jakub Stransky <stransky...@gmail.com> wrote:
>
>> Hello,
>>
>> I have a Spark cluster consisting of 4 nodes in standalone mode: a
>> master plus 3 worker nodes, with available memory, CPUs etc. configured.
>>
>> I have a Spark application which is essentially an MLlib pipeline for
>> training a classifier, in this case a RandomForest, but it could be a
>> DecisionTree, just for the sake of simplicity.
>>
>> But when I submit the Spark application to the cluster via spark-submit,
>> it runs out of memory. Even though the executors are "taken"/created in
>> the cluster, they are essentially doing nothing (poor CPU and memory
>> utilization) while the master seems to do all the work, which finally
>> results in an OOM.
>>
>> My submission is the following:
>>
>> spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp SparkPOC.jar 10 4.3
>>
>> I am submitting from the master node.
>>
>> By default it is running in client mode, in which the driver process is
>> attached to the spark-shell.
>>
>> Do I need to set up some settings to make the MLlib algorithms
>> parallelized and distributed as well, or is it all driven by the
>> parallelism factor set on the DataFrame with the input data?
>>
>> Essentially it seems that all the work is done on the master and the
>> rest is idle.
>>
>> Any hints on what to check?
>>
>> Thx
>> Jakub

--
Jakub Stransky
cz.linkedin.com/in/jakubstransky