Re: Spark application doesn't scale to worker nodes

2016-07-05 Thread Mich Talebzadeh
Well, that is what the OP stated:

"I have a spark cluster consisting of 4 nodes in a standalone mode, ..."

HTH

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 5 July 2016 at 19:24, Michael Segel  wrote:

> Did the OP say he was running a stand alone cluster of Spark, or on Yarn?
>
>
> On Jul 5, 2016, at 10:22 AM, Mich Talebzadeh 
> wrote:
>
> Hi Jakub,
>
> Any reason why you are running in standalone mode, given that you are
> familiar with YARN?
>
> In theory your settings are correct. I checked your environment tab
> settings and they look correct.
>
> I assume you have checked this link
>
> http://spark.apache.org/docs/latest/spark-standalone.html
>
> BTW, is this issue confined to ML, or do other Spark applications exhibit
> the same behaviour in standalone mode?
>
>
> HTH
>
>
>
>
>
>
> Dr Mich Talebzadeh
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
> http://talebzadehmich.wordpress.com
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 5 July 2016 at 11:17, Jacek Laskowski  wrote:
>
>> Hi Jakub,
>>
>> You're correct - spark.master = spark://master.clust:7077 - proves your
>> point. You're running Spark Standalone that was set in
>> conf/spark-defaults.conf perhaps.
>>
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>> On Tue, Jul 5, 2016 at 12:04 PM, Jakub Stransky 
>> wrote:
>>
>>> Hello,
>>>
>>> I am convinced that we are not running in local mode:
>>>
>>> Runtime Information
>>>
>>> Name                                   Value
>>> Java Home                              /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre
>>> Java Version                           1.7.0_65 (Oracle Corporation)
>>> Scala Version                          version 2.10.5
>>> Spark Properties
>>>
>>> Name                                   Value
>>> spark.app.id                           app-20160704121044-0003
>>> spark.app.name                         DemoApp
>>> spark.driver.extraClassPath            /home/sparkuser/sqljdbc4.jar
>>> spark.driver.host                      10.2.0.4
>>> spark.driver.memory                    4g
>>> spark.driver.port                      59493
>>> spark.executor.extraClassPath          /usr/local/spark-1.6.1/sqljdbc4.jar
>>> spark.executor.id                      driver
>>> spark.executor.memory                  12g
>>> spark.externalBlockStore.folderName    spark-5630dd34-4267-462e-882e-b382832bb500
>>> spark.jars                             file:/home/sparkuser/SparkPOC.jar
>>> spark.master                           spark://master.clust:7077
>>> spark.scheduler.mode                   FIFO
>>> spark.submit.deployMode                client
>>> System Properties
>>>
>>> NameValue
>>> SPARK_SUBMITtrue
>>> awt.toolkitsun.awt.X11.XToolkit
>>> file.encodingUTF-8
>>> file.encoding.pkgsun.io
>>> file.separator/
>>> java.awt.graphicsenvsun.awt.X11GraphicsEnvironment
>>> java.awt.printerjobsun.print.PSPrinterJob
>>> java.class.version51.0
>>> java.endorsed.dirs
>>>  /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/endorsed
>>> java.ext.dirs
>>>  
>>> /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/ext:/usr/java/packages/lib/ext
>>> java.home/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre
>>> java.io.tmpdir/tmp
>>> java.library.path
>>>  /usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
>>> java.runtime.nameOpenJDK Runtime Environment
>>> java.runtime.version1.7.0_65-mockbuild_2014_07_16_06_06-b00
>>> java.specification.nameJava Platform API Specification
>>> java.specification.vendorOracle Corporation
>>> java.specification.version1.7
>>> java.vendorOracle Corporation
>>> java.vendor.urlhttp://java.oracle.com/
>>> java.vendor.url.bughttp://bugreport.sun.com/bugreport/
>>> java.version1.7.0_65
>>> java.vm.infomixed mode
>>> java.vm.nameOpenJDK 64-Bit Server VM
>>> java.vm.specification.nameJava Virtual Machine Specification
>>> java.vm.specification.vendorOracle Corporation
>>> java.vm.specification.version1.7
>>> java.vm.vendorOracle Corporation
>>> java.vm.version24.65-b04
>>> line.separator
>>> 

Re: Spark application doesn't scale to worker nodes

2016-07-05 Thread Michael Segel
Did the OP say he was running a standalone cluster of Spark, or on YARN?


> On Jul 5, 2016, at 10:22 AM, Mich Talebzadeh  
> wrote:
> 
> Hi Jakub,
> 
> Any reason why you are running in standalone mode, given that you are 
> familiar with YARN?
> 
> In theory your settings are correct. I checked your environment tab settings 
> and they look correct.
> 
> I assume you have checked this link
> 
> http://spark.apache.org/docs/latest/spark-standalone.html 
> 
> 
> BTW, is this issue confined to ML, or do other Spark applications exhibit the 
> same behaviour in standalone mode?
> 
> 
> HTH
> 
> 
> 
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> 
>  
> http://talebzadehmich.wordpress.com 
> 
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
> damage or destruction of data or any other property which may arise from 
> relying on this email's technical content is explicitly disclaimed. The 
> author will in no case be liable for any monetary damages arising from such 
> loss, damage or destruction.
>  
> 
> On 5 July 2016 at 11:17, Jacek Laskowski  > wrote:
> Hi Jakub,
> 
> You're correct - spark.master = spark://master.clust:7077 - proves your 
> point. You're running Spark Standalone that was set in 
> conf/spark-defaults.conf perhaps.
> 
> 
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/ 
> Mastering Apache Spark http://bit.ly/mastering-apache-spark 
> 
> Follow me at https://twitter.com/jaceklaskowski 
> 
> 
> On Tue, Jul 5, 2016 at 12:04 PM, Jakub Stransky  > wrote:
> Hello,
> 
> I am convinced that we are not running in local mode:
> 
> Runtime Information
> 
> Name                                   Value
> Java Home                              /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre
> Java Version                           1.7.0_65 (Oracle Corporation)
> Scala Version                          version 2.10.5
> Spark Properties
>
> Name                                   Value
> spark.app.id                           app-20160704121044-0003
> spark.app.name                         DemoApp
> spark.driver.extraClassPath            /home/sparkuser/sqljdbc4.jar
> spark.driver.host                      10.2.0.4
> spark.driver.memory                    4g
> spark.driver.port                      59493
> spark.executor.extraClassPath          /usr/local/spark-1.6.1/sqljdbc4.jar
> spark.executor.id                      driver
> spark.executor.memory                  12g
> spark.externalBlockStore.folderName    spark-5630dd34-4267-462e-882e-b382832bb500
> spark.jars                             file:/home/sparkuser/SparkPOC.jar
> spark.master                           spark://master.clust:7077
> spark.scheduler.mode                   FIFO
> spark.submit.deployMode                client
> System Properties
> 
> NameValue
> SPARK_SUBMITtrue
> awt.toolkitsun.awt.X11.XToolkit
> file.encodingUTF-8
> file.encoding.pkgsun.io 
> file.separator/
> java.awt.graphicsenvsun.awt.X11GraphicsEnvironment
> java.awt.printerjobsun.print.PSPrinterJob
> java.class.version51.0
> java.endorsed.dirs
> /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/endorsed
> java.ext.dirs
> /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/ext:/usr/java/packages/lib/ext
> java.home/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre
> java.io.tmpdir/tmp
> java.library.path
> /usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
> java.runtime.name OpenJDK Runtime Environment
> java.runtime.version1.7.0_65-mockbuild_2014_07_16_06_06-b00
> java.specification.name Java Platform 
> API Specification
> java.specification.vendorOracle Corporation
> java.specification.version1.7
> java.vendorOracle Corporation
> java.vendor.urlhttp://java.oracle.com/ 
> java.vendor.url.bughttp://bugreport.sun.com/bugreport/ 
> 
> java.version1.7.0_65
> java.vm.info mixed mode
> java.vm.name OpenJDK 64-Bit Server VM
> java.vm.specification.name Java 
> Virtual Machine Specification
> java.vm.specification.vendorOracle Corporation
> java.vm.specification.version1.7
> java.vm.vendorOracle Corporation
> java.vm.version24.65-b04
> line.separator
> os.archamd64
> os.name Linux
> os.version2.6.32-431.29.2.el6.x86_64
> path.separator:
> sun.arch.data.model64
> sun.boot.class.path
> 

Re: Spark application doesn't scale to worker nodes

2016-07-05 Thread Mathieu Longtin
From experience, here are the kinds of things that cause the driver to run out
of memory:
- Way too many partitions (1 and up)
- Something like this:
data = load_large_data()
rdd = sc.parallelize(data)

- Any call to rdd.collect() or rdd.take(N) where the resulting data is
bigger than driver memory.
- rdd.limit(N) seems to crash on large N.

BTW, in this context, Java's memory requirements are on the order of 10X
what the raw data requires. So if you have a CSV with 1 million lines of
1KB each, expect the JVM to require 10GB to load it, not 1GB. This is not
an exact number, just an impression from observing what crashes the driver
when doing rdd.collect().
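
A minimal sketch of the second pattern above and one way to avoid it (Scala; the
path, partition count and the sc SparkContext are illustrative, not taken from the
OP's code):

    // Anti-pattern: the whole file is materialized in the driver's heap first,
    // so the driver needs roughly 10x the raw size before anything is distributed.
    val lines = scala.io.Source.fromFile("/data/big.csv").getLines().toVector
    val rdd = sc.parallelize(lines)

    // Preferred: let the executors read the data directly; the driver only
    // keeps partition metadata.
    val distributed = sc.textFile("/data/big.csv", 200) // hint for minimum partitions
    println(distributed.count())                        // the action runs on the workers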

On Tue, Jul 5, 2016 at 11:19 AM Jakub Stransky 
wrote:

> So now that we clarified that all is submitted at cluster standalone mode
> what is left when the application (ML pipeline) doesn't take advantage of
> full cluster power but essentially running just on master node until
> resources are exhausted. Why training ml Decesion Tree doesn't scale to the
> rest of the cluster?
>
> On 5 July 2016 at 12:17, Jacek Laskowski  wrote:
>
>> Hi Jakub,
>>
>> You're correct - spark.master = spark://master.clust:7077 - proves your
>> point. You're running Spark Standalone that was set in
>> conf/spark-defaults.conf perhaps.
>>
>>
>> Pozdrawiam,
>> Jacek Laskowski
>> 
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark http://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>>
>> On Tue, Jul 5, 2016 at 12:04 PM, Jakub Stransky 
>> wrote:
>>
>>> Hello,
>>>
>>> I am convinced that we are not running in local mode:
>>>
>>> Runtime Information
>>>
>>> NameValue
>>> Java Home/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre
>>> Java Version1.7.0_65 (Oracle Corporation)
>>> Scala Versionversion 2.10.5
>>> Spark Properties
>>>
>>> NameValue
>>> spark.app.idapp-20160704121044-0003
>>> spark.app.nameDemoApp
>>> spark.driver.extraClassPath/home/sparkuser/sqljdbc4.jar
>>> spark.driver.host10.2.0.4
>>> spark.driver.memory4g
>>> spark.driver.port59493
>>> spark.executor.extraClassPath/usr/local/spark-1.6.1/sqljdbc4.jar
>>> spark.executor.iddriver
>>> spark.executor.memory12g
>>> spark.externalBlockStore.folderName
>>>  spark-5630dd34-4267-462e-882e-b382832bb500
>>> spark.jarsfile:/home/sparkuser/SparkPOC.jar
>>> spark.masterspark://master.clust:7077
>>> spark.scheduler.modeFIFO
>>> spark.submit.deployModeclient
>>> System Properties
>>>
>>> NameValue
>>> SPARK_SUBMITtrue
>>> awt.toolkitsun.awt.X11.XToolkit
>>> file.encodingUTF-8
>>> file.encoding.pkgsun.io
>>> file.separator/
>>> java.awt.graphicsenvsun.awt.X11GraphicsEnvironment
>>> java.awt.printerjobsun.print.PSPrinterJob
>>> java.class.version51.0
>>> java.endorsed.dirs
>>>  /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/endorsed
>>> java.ext.dirs
>>>  
>>> /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/ext:/usr/java/packages/lib/ext
>>> java.home/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre
>>> java.io.tmpdir/tmp
>>> java.library.path
>>>  /usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
>>> java.runtime.nameOpenJDK Runtime Environment
>>> java.runtime.version1.7.0_65-mockbuild_2014_07_16_06_06-b00
>>> java.specification.nameJava Platform API Specification
>>> java.specification.vendorOracle Corporation
>>> java.specification.version1.7
>>> java.vendorOracle Corporation
>>> java.vendor.urlhttp://java.oracle.com/
>>> java.vendor.url.bughttp://bugreport.sun.com/bugreport/
>>> java.version1.7.0_65
>>> java.vm.infomixed mode
>>> java.vm.nameOpenJDK 64-Bit Server VM
>>> java.vm.specification.nameJava Virtual Machine Specification
>>> java.vm.specification.vendorOracle Corporation
>>> java.vm.specification.version1.7
>>> java.vm.vendorOracle Corporation
>>> java.vm.version24.65-b04
>>> line.separator
>>> os.archamd64
>>> os.nameLinux
>>> os.version2.6.32-431.29.2.el6.x86_64
>>> path.separator:
>>> sun.arch.data.model64
>>> sun.boot.class.path
>>>  
>>> /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/resources.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/rt.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/sunrsasign.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/jsse.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/jce.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/charsets.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/rhino.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/jfr.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/classes
>>> sun.boot.library.path
>>>  /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/amd64
>>> sun.cpu.endianlittle
>>> sun.cpu.isalist
>>> 

Re: Spark application doesn't scale to worker nodes

2016-07-05 Thread Mich Talebzadeh
Hi Jakub,

Any reason why you are running in standalone mode, given that you are
familiar with YARN?

In theory your settings are correct. I checked your environment tab
settings and they look correct.

I assume you have checked this link

http://spark.apache.org/docs/latest/spark-standalone.html

BTW, is this issue confined to ML, or do other Spark applications exhibit
the same behaviour in standalone mode?
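
One quick way to check is a trivial job that should fan out to every worker, e.g.
(a sketch, assuming sc is your application's SparkContext):

    // If the cluster is really being used, the tasks of this job should show up
    // spread across all worker executors in the UI's Executors tab.
    val n = sc.parallelize(1 to 10000000, 48).map(_ * 2).count()
    println(s"count = $n")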


HTH






Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 5 July 2016 at 11:17, Jacek Laskowski  wrote:

> Hi Jakub,
>
> You're correct - spark.master = spark://master.clust:7077 - proves your
> point. You're running Spark Standalone that was set in
> conf/spark-defaults.conf perhaps.
>
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
> On Tue, Jul 5, 2016 at 12:04 PM, Jakub Stransky 
> wrote:
>
>> Hello,
>>
>> I am convinced that we are not running in local mode:
>>
>> Runtime Information
>>
>> Name                                   Value
>> Java Home                              /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre
>> Java Version                           1.7.0_65 (Oracle Corporation)
>> Scala Version                          version 2.10.5
>> Spark Properties
>>
>> Name                                   Value
>> spark.app.id                           app-20160704121044-0003
>> spark.app.name                         DemoApp
>> spark.driver.extraClassPath            /home/sparkuser/sqljdbc4.jar
>> spark.driver.host                      10.2.0.4
>> spark.driver.memory                    4g
>> spark.driver.port                      59493
>> spark.executor.extraClassPath          /usr/local/spark-1.6.1/sqljdbc4.jar
>> spark.executor.id                      driver
>> spark.executor.memory                  12g
>> spark.externalBlockStore.folderName    spark-5630dd34-4267-462e-882e-b382832bb500
>> spark.jars                             file:/home/sparkuser/SparkPOC.jar
>> spark.master                           spark://master.clust:7077
>> spark.scheduler.mode                   FIFO
>> spark.submit.deployMode                client
>> System Properties
>>
>> NameValue
>> SPARK_SUBMITtrue
>> awt.toolkitsun.awt.X11.XToolkit
>> file.encodingUTF-8
>> file.encoding.pkgsun.io
>> file.separator/
>> java.awt.graphicsenvsun.awt.X11GraphicsEnvironment
>> java.awt.printerjobsun.print.PSPrinterJob
>> java.class.version51.0
>> java.endorsed.dirs
>>  /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/endorsed
>> java.ext.dirs
>>  
>> /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/ext:/usr/java/packages/lib/ext
>> java.home/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre
>> java.io.tmpdir/tmp
>> java.library.path
>>  /usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
>> java.runtime.nameOpenJDK Runtime Environment
>> java.runtime.version1.7.0_65-mockbuild_2014_07_16_06_06-b00
>> java.specification.nameJava Platform API Specification
>> java.specification.vendorOracle Corporation
>> java.specification.version1.7
>> java.vendorOracle Corporation
>> java.vendor.urlhttp://java.oracle.com/
>> java.vendor.url.bughttp://bugreport.sun.com/bugreport/
>> java.version1.7.0_65
>> java.vm.infomixed mode
>> java.vm.nameOpenJDK 64-Bit Server VM
>> java.vm.specification.nameJava Virtual Machine Specification
>> java.vm.specification.vendorOracle Corporation
>> java.vm.specification.version1.7
>> java.vm.vendorOracle Corporation
>> java.vm.version24.65-b04
>> line.separator
>> os.archamd64
>> os.nameLinux
>> os.version2.6.32-431.29.2.el6.x86_64
>> path.separator:
>> sun.arch.data.model64
>> sun.boot.class.path
>>  
>> /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/resources.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/rt.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/sunrsasign.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/jsse.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/jce.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/charsets.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/rhino.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/jfr.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/classes
>> sun.boot.library.path
>>  /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/amd64
>> sun.cpu.endianlittle
>> sun.cpu.isalist
>> sun.io.unicode.encodingUnicodeLittle
>> sun.java.commandorg.apache.spark.deploy.SparkSubmit --conf
>> spark.driver.extraClassPath=/home/sparkuser/sqljdbc4.jar --class  

Re: Spark application doesn't scale to worker nodes

2016-07-05 Thread Jakub Stransky
So now that we have clarified that everything is submitted to the standalone
cluster, the remaining question is why the application (ML pipeline) doesn't take
advantage of the full cluster's power but essentially runs just on the master node
until its resources are exhausted. Why doesn't training the ML decision tree scale
to the rest of the cluster?
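
One thing I still need to rule out, assuming the data really does come in over JDBC
(sqljdbc4.jar is on the classpath): a JDBC read without partitioning options arrives
as a single partition, so every downstream stage - including the tree training - runs
as one task. A sketch of a partitioned read in Spark 1.6 (URL, table and column names
are placeholders, not my actual ones):

    val df = sqlContext.read.format("jdbc").options(Map(
      "url"             -> "jdbc:sqlserver://dbhost;databaseName=demo",
      "dbtable"         -> "training_data",
      "partitionColumn" -> "id",        // numeric column assumed to exist
      "lowerBound"      -> "1",
      "upperBound"      -> "1000000",
      "numPartitions"   -> "24"
    )).load()

    println(df.rdd.partitions.length)   // should now be ~24, not 1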

On 5 July 2016 at 12:17, Jacek Laskowski  wrote:

> Hi Jakub,
>
> You're correct - spark.master = spark://master.clust:7077 - proves your
> point. You're running Spark Standalone that was set in
> conf/spark-defaults.conf perhaps.
>
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
> On Tue, Jul 5, 2016 at 12:04 PM, Jakub Stransky 
> wrote:
>
>> Hello,
>>
>> I am convinced that we are not running in local mode:
>>
>> Runtime Information
>>
>> Name                                   Value
>> Java Home                              /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre
>> Java Version                           1.7.0_65 (Oracle Corporation)
>> Scala Version                          version 2.10.5
>> Spark Properties
>>
>> Name                                   Value
>> spark.app.id                           app-20160704121044-0003
>> spark.app.name                         DemoApp
>> spark.driver.extraClassPath            /home/sparkuser/sqljdbc4.jar
>> spark.driver.host                      10.2.0.4
>> spark.driver.memory                    4g
>> spark.driver.port                      59493
>> spark.executor.extraClassPath          /usr/local/spark-1.6.1/sqljdbc4.jar
>> spark.executor.id                      driver
>> spark.executor.memory                  12g
>> spark.externalBlockStore.folderName    spark-5630dd34-4267-462e-882e-b382832bb500
>> spark.jars                             file:/home/sparkuser/SparkPOC.jar
>> spark.master                           spark://master.clust:7077
>> spark.scheduler.mode                   FIFO
>> spark.submit.deployMode                client
>> System Properties
>>
>> NameValue
>> SPARK_SUBMITtrue
>> awt.toolkitsun.awt.X11.XToolkit
>> file.encodingUTF-8
>> file.encoding.pkgsun.io
>> file.separator/
>> java.awt.graphicsenvsun.awt.X11GraphicsEnvironment
>> java.awt.printerjobsun.print.PSPrinterJob
>> java.class.version51.0
>> java.endorsed.dirs
>>  /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/endorsed
>> java.ext.dirs
>>  
>> /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/ext:/usr/java/packages/lib/ext
>> java.home/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre
>> java.io.tmpdir/tmp
>> java.library.path
>>  /usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
>> java.runtime.nameOpenJDK Runtime Environment
>> java.runtime.version1.7.0_65-mockbuild_2014_07_16_06_06-b00
>> java.specification.nameJava Platform API Specification
>> java.specification.vendorOracle Corporation
>> java.specification.version1.7
>> java.vendorOracle Corporation
>> java.vendor.urlhttp://java.oracle.com/
>> java.vendor.url.bughttp://bugreport.sun.com/bugreport/
>> java.version1.7.0_65
>> java.vm.infomixed mode
>> java.vm.nameOpenJDK 64-Bit Server VM
>> java.vm.specification.nameJava Virtual Machine Specification
>> java.vm.specification.vendorOracle Corporation
>> java.vm.specification.version1.7
>> java.vm.vendorOracle Corporation
>> java.vm.version24.65-b04
>> line.separator
>> os.archamd64
>> os.nameLinux
>> os.version2.6.32-431.29.2.el6.x86_64
>> path.separator:
>> sun.arch.data.model64
>> sun.boot.class.path
>>  
>> /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/resources.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/rt.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/sunrsasign.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/jsse.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/jce.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/charsets.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/rhino.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/jfr.jar:/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/classes
>> sun.boot.library.path
>>  /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre/lib/amd64
>> sun.cpu.endianlittle
>> sun.cpu.isalist
>> sun.io.unicode.encodingUnicodeLittle
>> sun.java.commandorg.apache.spark.deploy.SparkSubmit --conf
>> spark.driver.extraClassPath=/home/sparkuser/sqljdbc4.jar --class  --class
>> DemoApp SparkPOC.jar 10 4.3
>> sun.java.launcherSUN_STANDARD
>> sun.jnu.encodingUTF-8
>> sun.management.compilerHotSpot 64-Bit Tiered Compilers
>> sun.nio.ch.bugLevel
>> sun.os.patch.levelunknown
>> user.countryUS
>> user.dir/home/sparkuser
>> user.home/home/sparkuser
>> user.languageen
>> user.namesparkuser
>> user.timezoneEtc/UTC
>> Classpath Entries
>>
>> ResourceSource
>> /home/sparkuser/sqljdbc4.jarSystem Classpath
>> /usr/local/spark-1.6.1/assembly/target/scala-2.10/spark-assembly-1.6.1-hadoop2.2.0.jar
>>System Classpath
>> /usr/local/spark-1.6.1/conf/System Classpath
>> 

Re: Spark application doesn't scale to worker nodes

2016-07-04 Thread Mich Talebzadeh
Well, this will be apparent from the Environment tab of the GUI. It will show
how the job is actually running.

Jacek's point is correct. I suspect this is actually running in local mode,
as it looks like everything is being consumed on the master node.

HTH







Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 4 July 2016 at 20:35, Jacek Laskowski  wrote:

> On Mon, Jul 4, 2016 at 8:36 PM, Mathieu Longtin 
> wrote:
>
>> Are you using a --master argument, or equivalent config, when calling
>> spark-submit?
>>
>> If you don't, it runs in standalone mode.
>>
>
> s/standalone/local[*]
>
> Jacek
>


Re: Spark application doesn't scale to worker nodes

2016-07-04 Thread Jacek Laskowski
On Mon, Jul 4, 2016 at 8:36 PM, Mathieu Longtin 
wrote:

> Are you using a --master argument, or equivalent config, when calling
> spark-submit?
>
> If you don't, it runs in standalone mode.
>

s/standalone/local[*]

Jacek


Re: Spark application doesn't scale to worker nodes

2016-07-04 Thread Mathieu Longtin
Are you using a --master argument, or equivalent config, when calling
spark-submit?

If you don't, it runs in standalone mode.
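
For example, mirroring the OP's submission (values are illustrative):

    # no --master here: the master comes from spark-defaults.conf if set there,
    # otherwise the app just runs locally on the submitting machine
    spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp SparkPOC.jar 10 4.3

    # fully explicit standalone submission with executor sizing
    spark-submit --master spark://master.clust:7077 \
      --executor-memory 12g --total-executor-cores 24 \
      --driver-class-path spark/sqljdbc4.jar \
      --class DemoApp SparkPOC.jar 10 4.3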

On Mon, Jul 4, 2016 at 2:27 PM Jakub Stransky  wrote:

> Hi Mich,
>
> sure that workers are mentioned in slaves file. I can see them in spark
> master UI and even after start they are "blocked" for this application but
> the cpu and memory consumption is close to nothing.
>
> Thanks
> Jakub
>
> On 4 July 2016 at 18:36, Mich Talebzadeh 
> wrote:
>
>> Silly question. Have you added your workers to sbin/slaves file and have
>> you started start-slaves.sh.
>>
>> on master node when you type jps what do you see?
>>
>> The problem seems to be that workers are ignored and spark is essentially
>> running in Local mode
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 4 July 2016 at 17:05, Jakub Stransky  wrote:
>>
>>> Hi Mich,
>>>
>>> I have set up spark default configuration in conf directory
>>> spark-defaults.conf where I specify master hence no need to put it in
>>> command line
>>> spark.master   spark://spark.master:7077
>>>
>>> the same applies to driver memory which has been increased to 4GB
>>>  and the same is for spark.executor.memory 12GB as machines have 16GB
>>>
>>> Jakub
>>>
>>>
>>>
>>>
>>> On 4 July 2016 at 17:44, Mich Talebzadeh 
>>> wrote:
>>>
 Hi Jakub,

 In standalone mode Spark does the resource management. Which version of
 Spark are you running?

 How do you define your SparkConf() parameters for example setMaster
 etc.

 From

 spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp
 SparkPOC.jar 10 4.3

 I did not see any executor, memory allocation, so I assume you are
 allocating them somewhere else?

 HTH



 Dr Mich Talebzadeh



 LinkedIn * 
 https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 *



 http://talebzadehmich.wordpress.com


 *Disclaimer:* Use it at your own risk. Any and all responsibility for
 any loss, damage or destruction of data or any other property which may
 arise from relying on this email's technical content is explicitly
 disclaimed. The author will in no case be liable for any monetary damages
 arising from such loss, damage or destruction.



 On 4 July 2016 at 16:31, Jakub Stransky  wrote:

> Hello,
>
> I have a spark cluster consisting of 4 nodes in a standalone mode,
> master + 3 workers nodes with configured available memory and cpus etc.
>
> I have an spark application which is essentially a MLlib pipeline for
> training a classifier, in this case RandomForest  but could be a
> DecesionTree just for the sake of simplicity.
>
> But when I submit the spark application to the cluster via spark
> submit it is running out of memory. Even though the executors are
> "taken"/created in the cluster they are esentially doing nothing ( poor
> cpu, nor memory utilization) while the master seems to do all the work
> which finally results in OOM.
>
> My submission is following:
> spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp
> SparkPOC.jar 10 4.3
>
> I am submitting from the master node.
>
> By default it is running in client mode which the driver process is
> attached to spark-shell.
>
> Do I need to set up some settings to make MLlib algos parallelized and
> distributed as well or all is driven by parallel factor set on dataframe
> with input data?
>
> Essentially it seems that all work is just done on master and the rest
> is idle.
> Any hints what to check?
>
> Thx
> Jakub
>
>
>
>

>>>
>>>
>>> --
>>> Jakub Stransky
>>> cz.linkedin.com/in/jakubstransky
>>>
>>>
>>
>
>
> --
> Jakub Stransky
> cz.linkedin.com/in/jakubstransky
>
> --
Mathieu Longtin
1-514-803-8977


Re: Spark application doesn't scale to worker nodes

2016-07-04 Thread Mich Talebzadeh
OK, spark-submit by default starts its GUI at port 4040. You can
change that to any other port using --conf "spark.ui.port=".

In the GUI, what do you see under the Environment and Executors tabs? Can you
send a snapshot?

HTH



Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 4 July 2016 at 19:27, Jakub Stransky  wrote:

> Hi Mich,
>
> sure that workers are mentioned in slaves file. I can see them in spark
> master UI and even after start they are "blocked" for this application but
> the cpu and memory consumption is close to nothing.
>
> Thanks
> Jakub
>
> On 4 July 2016 at 18:36, Mich Talebzadeh 
> wrote:
>
>> Silly question. Have you added your workers to sbin/slaves file and have
>> you started start-slaves.sh.
>>
>> on master node when you type jps what do you see?
>>
>> The problem seems to be that workers are ignored and spark is essentially
>> running in Local mode
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 4 July 2016 at 17:05, Jakub Stransky  wrote:
>>
>>> Hi Mich,
>>>
>>> I have set up spark default configuration in conf directory
>>> spark-defaults.conf where I specify master hence no need to put it in
>>> command line
>>> spark.master   spark://spark.master:7077
>>>
>>> the same applies to driver memory which has been increased to 4GB
>>>  and the same is for spark.executor.memory 12GB as machines have 16GB
>>>
>>> Jakub
>>>
>>>
>>>
>>>
>>> On 4 July 2016 at 17:44, Mich Talebzadeh 
>>> wrote:
>>>
 Hi Jakub,

 In standalone mode Spark does the resource management. Which version of
 Spark are you running?

 How do you define your SparkConf() parameters for example setMaster
 etc.

 From

 spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp
 SparkPOC.jar 10 4.3

 I did not see any executor, memory allocation, so I assume you are
 allocating them somewhere else?

 HTH



 Dr Mich Talebzadeh



 LinkedIn * 
 https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 *



 http://talebzadehmich.wordpress.com


 *Disclaimer:* Use it at your own risk. Any and all responsibility for
 any loss, damage or destruction of data or any other property which may
 arise from relying on this email's technical content is explicitly
 disclaimed. The author will in no case be liable for any monetary damages
 arising from such loss, damage or destruction.



 On 4 July 2016 at 16:31, Jakub Stransky  wrote:

> Hello,
>
> I have a spark cluster consisting of 4 nodes in a standalone mode,
> master + 3 workers nodes with configured available memory and cpus etc.
>
> I have an spark application which is essentially a MLlib pipeline for
> training a classifier, in this case RandomForest  but could be a
> DecesionTree just for the sake of simplicity.
>
> But when I submit the spark application to the cluster via spark
> submit it is running out of memory. Even though the executors are
> "taken"/created in the cluster they are esentially doing nothing ( poor
> cpu, nor memory utilization) while the master seems to do all the work
> which finally results in OOM.
>
> My submission is following:
> spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp
> SparkPOC.jar 10 4.3
>
> I am submitting from the master node.
>
> By default it is running in client mode which the driver process is
> attached to spark-shell.
>
> Do I need to set up some settings to make MLlib algos parallelized and
> distributed 

Re: Spark application doesn't scale to worker nodes

2016-07-04 Thread Jakub Stransky
Hi Mich,

Sure, the workers are listed in the slaves file. I can see them in the Spark
master UI, and even after the start they are "blocked" for this application, but
their CPU and memory consumption is close to nothing.

Thanks
Jakub

On 4 July 2016 at 18:36, Mich Talebzadeh  wrote:

> Silly question. Have you added your workers to sbin/slaves file and have
> you started start-slaves.sh.
>
> on master node when you type jps what do you see?
>
> The problem seems to be that workers are ignored and spark is essentially
> running in Local mode
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 4 July 2016 at 17:05, Jakub Stransky  wrote:
>
>> Hi Mich,
>>
>> I have set up spark default configuration in conf directory
>> spark-defaults.conf where I specify master hence no need to put it in
>> command line
>> spark.master   spark://spark.master:7077
>>
>> the same applies to driver memory which has been increased to 4GB
>>  and the same is for spark.executor.memory 12GB as machines have 16GB
>>
>> Jakub
>>
>>
>>
>>
>> On 4 July 2016 at 17:44, Mich Talebzadeh 
>> wrote:
>>
>>> Hi Jakub,
>>>
>>> In standalone mode Spark does the resource management. Which version of
>>> Spark are you running?
>>>
>>> How do you define your SparkConf() parameters for example setMaster etc.
>>>
>>> From
>>>
>>> spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp
>>> SparkPOC.jar 10 4.3
>>>
>>> I did not see any executor, memory allocation, so I assume you are
>>> allocating them somewhere else?
>>>
>>> HTH
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> *
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>> On 4 July 2016 at 16:31, Jakub Stransky  wrote:
>>>
 Hello,

 I have a spark cluster consisting of 4 nodes in a standalone mode,
 master + 3 workers nodes with configured available memory and cpus etc.

 I have an spark application which is essentially a MLlib pipeline for
 training a classifier, in this case RandomForest  but could be a
 DecesionTree just for the sake of simplicity.

 But when I submit the spark application to the cluster via spark submit
 it is running out of memory. Even though the executors are "taken"/created
 in the cluster they are esentially doing nothing ( poor cpu, nor memory
 utilization) while the master seems to do all the work which finally
 results in OOM.

 My submission is following:
 spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp
 SparkPOC.jar 10 4.3

 I am submitting from the master node.

 By default it is running in client mode which the driver process is
 attached to spark-shell.

 Do I need to set up some settings to make MLlib algos parallelized and
 distributed as well or all is driven by parallel factor set on dataframe
 with input data?

 Essentially it seems that all work is just done on master and the rest
 is idle.
 Any hints what to check?

 Thx
 Jakub




>>>
>>
>>
>> --
>> Jakub Stransky
>> cz.linkedin.com/in/jakubstransky
>>
>>
>


-- 
Jakub Stransky
cz.linkedin.com/in/jakubstransky


Re: Spark application doesn't scale to worker nodes

2016-07-04 Thread Mich Talebzadeh
Silly question: have you added your workers to the conf/slaves file, and have
you started sbin/start-slaves.sh?

On the master node, when you type jps, what do you see?

The problem seems to be that the workers are ignored and Spark is essentially
running in local mode.
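
On a healthy standalone cluster, with the app submitted in client mode, I would
expect something like this (PIDs are illustrative):

    # on the master node
    $ jps
    2731 Master
    3120 SparkSubmit
    3455 Jps

    # on each worker node
    $ jps
    1842 Worker
    2067 CoarseGrainedExecutorBackend
    2210 Jps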

HTH

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 4 July 2016 at 17:05, Jakub Stransky  wrote:

> Hi Mich,
>
> I have set up spark default configuration in conf directory
> spark-defaults.conf where I specify master hence no need to put it in
> command line
> spark.master   spark://spark.master:7077
>
> the same applies to driver memory which has been increased to 4GB
>  and the same is for spark.executor.memory 12GB as machines have 16GB
>
> Jakub
>
>
>
>
> On 4 July 2016 at 17:44, Mich Talebzadeh 
> wrote:
>
>> Hi Jakub,
>>
>> In standalone mode Spark does the resource management. Which version of
>> Spark are you running?
>>
>> How do you define your SparkConf() parameters for example setMaster etc.
>>
>> From
>>
>> spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp
>> SparkPOC.jar 10 4.3
>>
>> I did not see any executor, memory allocation, so I assume you are
>> allocating them somewhere else?
>>
>> HTH
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 4 July 2016 at 16:31, Jakub Stransky  wrote:
>>
>>> Hello,
>>>
>>> I have a spark cluster consisting of 4 nodes in a standalone mode,
>>> master + 3 workers nodes with configured available memory and cpus etc.
>>>
>>> I have an spark application which is essentially a MLlib pipeline for
>>> training a classifier, in this case RandomForest  but could be a
>>> DecesionTree just for the sake of simplicity.
>>>
>>> But when I submit the spark application to the cluster via spark submit
>>> it is running out of memory. Even though the executors are "taken"/created
>>> in the cluster they are esentially doing nothing ( poor cpu, nor memory
>>> utilization) while the master seems to do all the work which finally
>>> results in OOM.
>>>
>>> My submission is following:
>>> spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp
>>> SparkPOC.jar 10 4.3
>>>
>>> I am submitting from the master node.
>>>
>>> By default it is running in client mode which the driver process is
>>> attached to spark-shell.
>>>
>>> Do I need to set up some settings to make MLlib algos parallelized and
>>> distributed as well or all is driven by parallel factor set on dataframe
>>> with input data?
>>>
>>> Essentially it seems that all work is just done on master and the rest
>>> is idle.
>>> Any hints what to check?
>>>
>>> Thx
>>> Jakub
>>>
>>>
>>>
>>>
>>
>
>
> --
> Jakub Stransky
> cz.linkedin.com/in/jakubstransky
>
>


Re: Spark application doesn't scale to worker nodes

2016-07-04 Thread Jakub Stransky
Mathieu,

there is no rocket science there. Essentially it creates a DataFrame and then
calls fit on the ML pipeline. The thing I do not understand is how the
parallelization is done in terms of the ML algorithm. Is it based on the
parallelism of the DataFrame? Because the ML algorithm doesn't offer such a
setting, AFAIK. There is only the notion of max depth, pruning, etc., but none
of them concerns parallelization.
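
If it is indeed driven by the partitioning of the input DataFrame, a quick check and
fix would be (a sketch; df is the training DataFrame and pipeline the existing ML
pipeline):

    println(df.rdd.partitions.length)   // 1 partition would explain a single busy node
    val spread = df.repartition(24)     // e.g. 2-3x the total executor cores
    val model = pipeline.fit(spread)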

On 4 July 2016 at 17:51, Mathieu Longtin  wrote:

> When the driver is running out of memory, it usually means you're loading
> data in a non parallel way (without using RDD). Make sure anything that
> requires non trivial amount of memory is done by an RDD. Also, the default
> memory for everything is 1GB, which may not be enough for you.
>
> On Mon, Jul 4, 2016 at 11:44 AM Mich Talebzadeh 
> wrote:
>
>> Hi Jakub,
>>
>> In standalone mode Spark does the resource management. Which version of
>> Spark are you running?
>>
>> How do you define your SparkConf() parameters for example setMaster etc.
>>
>> From
>>
>> spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp
>> SparkPOC.jar 10 4.3
>>
>> I did not see any executor, memory allocation, so I assume you are
>> allocating them somewhere else?
>>
>> HTH
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>> On 4 July 2016 at 16:31, Jakub Stransky  wrote:
>>
>>> Hello,
>>>
>>> I have a spark cluster consisting of 4 nodes in a standalone mode,
>>> master + 3 workers nodes with configured available memory and cpus etc.
>>>
>>> I have an spark application which is essentially a MLlib pipeline for
>>> training a classifier, in this case RandomForest  but could be a
>>> DecesionTree just for the sake of simplicity.
>>>
>>> But when I submit the spark application to the cluster via spark submit
>>> it is running out of memory. Even though the executors are "taken"/created
>>> in the cluster they are esentially doing nothing ( poor cpu, nor memory
>>> utilization) while the master seems to do all the work which finally
>>> results in OOM.
>>>
>>> My submission is following:
>>> spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp
>>> SparkPOC.jar 10 4.3
>>>
>>> I am submitting from the master node.
>>>
>>> By default it is running in client mode which the driver process is
>>> attached to spark-shell.
>>>
>>> Do I need to set up some settings to make MLlib algos parallelized and
>>> distributed as well or all is driven by parallel factor set on dataframe
>>> with input data?
>>>
>>> Essentially it seems that all work is just done on master and the rest
>>> is idle.
>>> Any hints what to check?
>>>
>>> Thx
>>> Jakub
>>>
>>>
>>>
>>>
>> --
> Mathieu Longtin
> 1-514-803-8977
>



-- 
Jakub Stransky
cz.linkedin.com/in/jakubstransky


Re: Spark application doesn't scale to worker nodes

2016-07-04 Thread Jakub Stransky
Hi Mich,

I have set up the Spark default configuration in the conf directory
(spark-defaults.conf), where I specify the master, hence no need to put it on
the command line:
spark.master   spark://spark.master:7077

The same applies to the driver memory, which has been increased to 4GB, and to
spark.executor.memory, which is set to 12GB as the machines have 16GB.
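
For reference, the relevant lines of that spark-defaults.conf look roughly like this
(spark.cores.max is not something I have set; I list it only as the optional knob for
capping how many cores an app takes from a standalone cluster):

    spark.master            spark://spark.master:7077
    spark.driver.memory     4g
    spark.executor.memory   12g
    # optional in standalone mode:
    # spark.cores.max       24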

Jakub




On 4 July 2016 at 17:44, Mich Talebzadeh  wrote:

> Hi Jakub,
>
> In standalone mode Spark does the resource management. Which version of
> Spark are you running?
>
> How do you define your SparkConf() parameters for example setMaster etc.
>
> From
>
> spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp
> SparkPOC.jar 10 4.3
>
> I did not see any executor, memory allocation, so I assume you are
> allocating them somewhere else?
>
> HTH
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 4 July 2016 at 16:31, Jakub Stransky  wrote:
>
>> Hello,
>>
>> I have a spark cluster consisting of 4 nodes in a standalone mode, master
>> + 3 workers nodes with configured available memory and cpus etc.
>>
>> I have an spark application which is essentially a MLlib pipeline for
>> training a classifier, in this case RandomForest  but could be a
>> DecesionTree just for the sake of simplicity.
>>
>> But when I submit the spark application to the cluster via spark submit
>> it is running out of memory. Even though the executors are "taken"/created
>> in the cluster they are esentially doing nothing ( poor cpu, nor memory
>> utilization) while the master seems to do all the work which finally
>> results in OOM.
>>
>> My submission is following:
>> spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp
>> SparkPOC.jar 10 4.3
>>
>> I am submitting from the master node.
>>
>> By default it is running in client mode which the driver process is
>> attached to spark-shell.
>>
>> Do I need to set up some settings to make MLlib algos parallelized and
>> distributed as well or all is driven by parallel factor set on dataframe
>> with input data?
>>
>> Essentially it seems that all work is just done on master and the rest is
>> idle.
>> Any hints what to check?
>>
>> Thx
>> Jakub
>>
>>
>>
>>
>


-- 
Jakub Stransky
cz.linkedin.com/in/jakubstransky


Re: Spark application doesn't scale to worker nodes

2016-07-04 Thread Mathieu Longtin
When the driver is running out of memory, it usually means you're loading
data in a non-parallel way (without using an RDD). Make sure anything that
requires a non-trivial amount of memory is done by an RDD. Also, the default
memory for everything is 1GB, which may not be enough for you.

On Mon, Jul 4, 2016 at 11:44 AM Mich Talebzadeh 
wrote:

> Hi Jakub,
>
> In standalone mode Spark does the resource management. Which version of
> Spark are you running?
>
> How do you define your SparkConf() parameters for example setMaster etc.
>
> From
>
> spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp
> SparkPOC.jar 10 4.3
>
> I did not see any executor, memory allocation, so I assume you are
> allocating them somewhere else?
>
> HTH
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 4 July 2016 at 16:31, Jakub Stransky  wrote:
>
>> Hello,
>>
>> I have a spark cluster consisting of 4 nodes in a standalone mode, master
>> + 3 workers nodes with configured available memory and cpus etc.
>>
>> I have an spark application which is essentially a MLlib pipeline for
>> training a classifier, in this case RandomForest  but could be a
>> DecesionTree just for the sake of simplicity.
>>
>> But when I submit the spark application to the cluster via spark submit
>> it is running out of memory. Even though the executors are "taken"/created
>> in the cluster they are esentially doing nothing ( poor cpu, nor memory
>> utilization) while the master seems to do all the work which finally
>> results in OOM.
>>
>> My submission is following:
>> spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp
>> SparkPOC.jar 10 4.3
>>
>> I am submitting from the master node.
>>
>> By default it is running in client mode which the driver process is
>> attached to spark-shell.
>>
>> Do I need to set up some settings to make MLlib algos parallelized and
>> distributed as well or all is driven by parallel factor set on dataframe
>> with input data?
>>
>> Essentially it seems that all work is just done on master and the rest is
>> idle.
>> Any hints what to check?
>>
>> Thx
>> Jakub
>>
>>
>>
>>
> --
Mathieu Longtin
1-514-803-8977


Re: Spark application doesn't scale to worker nodes

2016-07-04 Thread Mich Talebzadeh
Hi Jakub,

In standalone mode Spark does the resource management. Which version of
Spark are you running?

How do you define your SparkConf() parameters, for example setMaster, etc.?

From

spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp
SparkPOC.jar 10 4.3

I did not see any executor or memory allocation, so I assume you are
allocating them somewhere else?
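
If it helps, the programmatic equivalent would be along these lines (a sketch only;
when the values sit in spark-defaults.conf or on the spark-submit command line, the
SparkConf in code can stay empty):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("DemoApp")
      .setMaster("spark://master.clust:7077")   // usually better left to spark-submit/conf
      .set("spark.executor.memory", "12g")
    val sc = new SparkContext(conf)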

HTH



Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 4 July 2016 at 16:31, Jakub Stransky  wrote:

> Hello,
>
> I have a spark cluster consisting of 4 nodes in a standalone mode, master
> + 3 workers nodes with configured available memory and cpus etc.
>
> I have an spark application which is essentially a MLlib pipeline for
> training a classifier, in this case RandomForest  but could be a
> DecesionTree just for the sake of simplicity.
>
> But when I submit the spark application to the cluster via spark submit it
> is running out of memory. Even though the executors are "taken"/created in
> the cluster they are esentially doing nothing ( poor cpu, nor memory
> utilization) while the master seems to do all the work which finally
> results in OOM.
>
> My submission is following:
> spark-submit --driver-class-path spark/sqljdbc4.jar --class DemoApp
> SparkPOC.jar 10 4.3
>
> I am submitting from the master node.
>
> By default it is running in client mode which the driver process is
> attached to spark-shell.
>
> Do I need to set up some settings to make MLlib algos parallelized and
> distributed as well or all is driven by parallel factor set on dataframe
> with input data?
>
> Essentially it seems that all work is just done on master and the rest is
> idle.
> Any hints what to check?
>
> Thx
> Jakub
>
>
>
>