Re: Executors assigned to STS and number of workers in Stand Alone Mode

2016-08-03 Thread Sun Rui
--num-executors does not work in Standalone mode. Try --total-executor-cores instead.
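
For example, the start command from the quoted message could cap the
application at a single core with something like this (an untested sketch;
only the --total-executor-cores line changes, and spark.cores.max is the
equivalent configuration property):

${SPARK_HOME}/sbin/start-thriftserver.sh \
--master spark://50.140.197.217:7077 \
--hiveconf hive.server2.thrift.port=10055 \
--driver-memory 1G \
--total-executor-cores 1 \
--executor-cores 1 \
--executor-memory 1G \
--conf "spark.scheduler.mode=FIFO"

# or, equivalently, as a configuration property:
# --conf "spark.cores.max=1"
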
> On Jul 26, 2016, at 00:17, Mich Talebzadeh  wrote:
> 
> Hi,
> 
> 
> I am doing some tests
> 
> I have started Spark in Standalone mode.
> 
> For simplicity I am using one node only with 8 workers and I have 12 cores
> 
> In spark-env.sh I set this
> 
> # Options for the daemons used in the standalone deploy mode
> export SPARK_WORKER_CORES=1      ## total number of cores to be used by executors on each worker
> export SPARK_WORKER_MEMORY=1g    ## total memory each worker has to give to executors (e.g. 1000m, 2g)
> export SPARK_WORKER_INSTANCES=8  ## number of worker processes per node
> 
> So it is pretty straightforward, with 8 workers and each worker assigned one core
> 
> jps|grep Worker
> 15297 Worker
> 14794 Worker
> 15374 Worker
> 14998 Worker
> 15198 Worker
> 15465 Worker
> 14897 Worker
> 15099 Worker
> 
> I start Spark Thrift Server with the following parameters (using standalone 
> mode)
> 
> ${SPARK_HOME}/sbin/start-thriftserver.sh \
> --master spark://50.140.197.217:7077 \
> --hiveconf hive.server2.thrift.port=10055 \
> --driver-memory 1G \
> --num-executors 1 \
> --executor-cores 1 \
> --executor-memory 1G \
> --conf "spark.scheduler.mode=FIFO" \
> 
> With one executor allocated 1 core
> 
> However, I can see both in the OS and the UI that it starts with 8 executors,
> the same as the number of workers on this node!
> 
> jps|egrep 'SparkSubmit|CoarseGrainedExecutorBackend'|sort
> 32711 SparkSubmit
> 369 CoarseGrainedExecutorBackend
> 370 CoarseGrainedExecutorBackend
> 371 CoarseGrainedExecutorBackend
> 376 CoarseGrainedExecutorBackend
> 387 CoarseGrainedExecutorBackend
> 395 CoarseGrainedExecutorBackend
> 419 CoarseGrainedExecutorBackend
> 420 CoarseGrainedExecutorBackend
> 
> 
> I fail to see why this is happening. Nothing else Spark-related is running.
> What is the cause?
> 
> How can I stop STS from using all available workers?
> 
> Thanks
> 
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.



Re: Executors assigned to STS and number of workers in Stand Alone Mode

2016-08-03 Thread Michael Gummelt
> but Spark on Mesos is certainly lagging behind Spark on YARN regarding
> the features Spark uses off the scheduler backends -- security, data
> locality, queues, etc.

If by security you mean Kerberos, we'll be upstreaming that to Apache Spark
soon.  It's been in DC/OS Spark for a while:
https://github.com/mesosphere/spark/commit/73ba2ab8d97510d5475ef9a48c673ce34f7173fa

Locality is implemented in a scheduler-independent way:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala#L327,
but it is possible that the offer model could result in different
placement.  I haven't seen any analysis to that effect.

YARN queues are very similar to Mesos quota and roles, which Spark
supports.  We'll also be adding support for revocable resources sometime
soon, which solves the head-of-line (HoL) blocking problem, where one Spark
app eats up your cluster while others wait.  I don't think YARN has a
solution for this, but I could be wrong.

So, yea, there are some differences, but I think the biggest feature gap
right now is really just Kerberos, which will be added soon.

There are also other Mesos-specific features we'll be adding soon, such as
GPU support, CNI, and virtual networking, but the biggest advantage of
running on Mesos is that you can run multi-tenant alongside other Mesos
frameworks.

On Mon, Jul 25, 2016 at 2:04 PM, Jacek Laskowski  wrote:

> On Mon, Jul 25, 2016 at 10:57 PM, Mich Talebzadeh
>  wrote:
>
> > Yarn promises the best resource management I believe. Having said that I
> have not used Mesos myself.
>
> I'm glad you've mentioned it.
>
> I think Cloudera (and Hortonworks?) guys are doing a great job with
> bringing all the features of YARN to Spark and I think Spark on YARN
> shines features-wise.
>
> I'm not in a position to compare YARN vs Mesos for their resource
> management, but Spark on Mesos is certainly lagging behind Spark on
> YARN regarding the features Spark uses off the scheduler backends --
> security, data locality, queues, etc. (or I might be simply biased
> after having spent months with Spark on YARN mostly?).
>
> Jacek
>
>
>


-- 
Michael Gummelt
Software Engineer
Mesosphere


Re: Executors assigned to STS and number of workers in Stand Alone Mode

2016-07-25 Thread ayan guha
STS works on YARN, as a yarn-client application.

One issue: STS is not HA-supported, though there was some discussion about
making it HA in the same way as Hive Server. So what we did is run STS on
multiple nodes and tie them to a load balancer.
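
A rough sketch of that setup (the host names sts-node1/sts-node2, the ports
and the HAProxy stanza below are hypothetical; any TCP load balancer works
the same way):

# start one Thrift server per node (same port is fine, the hosts differ)
ssh sts-node1 '${SPARK_HOME}/sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001'
ssh sts-node2 '${SPARK_HOME}/sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001'

# minimal HAProxy front end; JDBC clients connect to the balancer on port 10000
listen spark-thrift
    bind *:10000
    mode tcp
    balance roundrobin
    server sts1 sts-node1:10001 check
    server sts2 sts-node2:10001 check
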

On Tue, Jul 26, 2016 at 8:06 AM, Mich Talebzadeh 
wrote:

> Correction.
>
> STS uses the same UI to display details about all processes running
> against it, which is helpful but gets crowded
>
> :)
>
> Dr Mich Talebzadeh
>
> On 25 July 2016 at 22:26, Mich Talebzadeh 
> wrote:
>
>> We also should remember that STS is a pretty useful tool. With JDBC you
>> can use beeline, Zeppelin, Squirrel and other tools against it.
>>
>> One thing I would like to change is the UI port that the Thrift server
>> listens on; you can set it at startup using spark.ui.port. The UI port is
>> fixed at Thrift server startup, and the UI can only display one SQL query
>> at a time, which is not very useful.
>>
>> As one can run multiple clients against STS, it is a
>> limitation that one cannot change the UI port at runtime.
>>
>> Cheers
>>
>> Dr Mich Talebzadeh
>>
>> On 25 July 2016 at 22:04, Jacek Laskowski  wrote:
>>
>>> On Mon, Jul 25, 2016 at 10:57 PM, Mich Talebzadeh
>>>  wrote:
>>>
>>> > Yarn promises the best resource management I believe. Having said that
>>> I have not used Mesos myself.
>>>
>>> I'm glad you've mentioned it.
>>>
>>> I think Cloudera (and Hortonworks?) guys are doing a great job with
>>> bringing all the features of YARN to Spark and I think Spark on YARN
>>> shines features-wise.
>>>
>>> I'm not in a position to compare YARN vs Mesos for their resource
>>> management, but Spark on Mesos is certainly lagging behind Spark on
>>> YARN regarding the features Spark uses off the scheduler backends --
>>> security, data locality, queues, etc. (or I might be simply biased
>>> after having spent months with Spark on YARN mostly?).
>>>
>>> Jacek
>>>
>>
>>
>


-- 
Best Regards,
Ayan Guha


Re: Executors assigned to STS and number of workers in Stand Alone Mode

2016-07-25 Thread Mich Talebzadeh
Correction.

STS uses the same UI to display details about all processes running against
it, which is helpful but gets crowded

:)

Dr Mich Talebzadeh

On 25 July 2016 at 22:26, Mich Talebzadeh  wrote:

> We also should remember that STS is a pretty useful tool. With JDBC you
> can use beeline, Zeppelin, Squirrel and other tools against it.
>
> One thing I would like to change is the UI port that the Thrift server
> listens on; you can set it at startup using spark.ui.port. The UI port is
> fixed at Thrift server startup, and the UI can only display one SQL query
> at a time, which is not very useful.
>
> As one can run multiple clients against STS, it is a
> limitation that one cannot change the UI port at runtime.
>
> Cheers
>
> Dr Mich Talebzadeh
>
> On 25 July 2016 at 22:04, Jacek Laskowski  wrote:
>
>> On Mon, Jul 25, 2016 at 10:57 PM, Mich Talebzadeh
>>  wrote:
>>
>> > Yarn promises the best resource management I believe. Having said that
>> I have not used Mesos myself.
>>
>> I'm glad you've mentioned it.
>>
>> I think Cloudera (and Hortonworks?) guys are doing a great job with
>> bringing all the features of YARN to Spark and I think Spark on YARN
>> shines features-wise.
>>
>> I'm not in a position to compare YARN vs Mesos for their resource
>> management, but Spark on Mesos is certainly lagging behind Spark on
>> YARN regarding the features Spark uses off the scheduler backends --
>> security, data locality, queues, etc. (or I might be simply biased
>> after having spent months with Spark on YARN mostly?).
>>
>> Jacek
>>
>
>


Re: Executors assigned to STS and number of workers in Stand Alone Mode

2016-07-25 Thread Mich Talebzadeh
We also should remember that STS is a pretty useful tool. With JDBC you can
use beeline, Zeppelin, Squirrel and other tools against it.

One thing I would like to change is the UI port that the Thrift server
listens on; you can set it at startup using spark.ui.port. The UI port is
fixed at Thrift server startup, and the UI can only display one SQL query at
a time, which is not very useful.

As one can run multiple clients against STS, it is a
limitation that one cannot change the UI port at runtime.
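
For instance (a sketch; the port values are arbitrary), each STS instance can
be pinned to its own UI port when it is started:

${SPARK_HOME}/sbin/start-thriftserver.sh \
--master spark://50.140.197.217:7077 \
--hiveconf hive.server2.thrift.port=10055 \
--conf "spark.ui.port=4088"
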

Cheers

Dr Mich Talebzadeh

On 25 July 2016 at 22:04, Jacek Laskowski  wrote:

> On Mon, Jul 25, 2016 at 10:57 PM, Mich Talebzadeh
>  wrote:
>
> > Yarn promises the best resource management I believe. Having said that I
> have not used Mesos myself.
>
> I'm glad you've mentioned it.
>
> I think Cloudera (and Hortonworks?) guys are doing a great job with
> bringing all the features of YARN to Spark and I think Spark on YARN
> shines features-wise.
>
> I'm not in a position to compare YARN vs Mesos for their resource
> management, but Spark on Mesos is certainly lagging behind Spark on
> YARN regarding the features Spark uses off the scheduler backends --
> security, data locality, queues, etc. (or I might be simply biased
> after having spent months with Spark on YARN mostly?).
>
> Jacek
>


Re: Executors assigned to STS and number of workers in Stand Alone Mode

2016-07-25 Thread Jacek Laskowski
On Mon, Jul 25, 2016 at 10:57 PM, Mich Talebzadeh
 wrote:

> Yarn promises the best resource management I believe. Having said that I have 
> not used Mesos myself.

I'm glad you've mentioned it.

I think Cloudera (and Hortonworks?) guys are doing a great job with
bringing all the features of YARN to Spark and I think Spark on YARN
shines features-wise.

I'm not in a position to compare YARN vs Mesos for their resource
management, but Spark on Mesos is certainly lagging behind Spark on
YARN regarding the features Spark uses off the scheduler backends --
security, data locality, queues, etc. (or I might be simply biased
after having spent months with Spark on YARN mostly?).

Jacek




Re: Executors assigned to STS and number of workers in Stand Alone Mode

2016-07-25 Thread Mich Talebzadeh
Hi,

Actually I started STS in local mode and that works.

I have not tested YARN modes for STS, but it certainly appears that one can
run it in any mode one wishes.

Local mode has its limitations (everything runs in one JVM, so there is no
scaling out), but one can run STS in local mode on the same host on different
ports, without the centralised resource management that standalone offers;
and standalone certainly has some issues, as I have seen. In local mode we
are just scaling up.
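
As a sketch (the ports are arbitrary), two independent local-mode instances
on one host would look something like:

${SPARK_HOME}/sbin/start-thriftserver.sh --master local[4] \
--hiveconf hive.server2.thrift.port=10055 --conf "spark.ui.port=4055"

${SPARK_HOME}/sbin/start-thriftserver.sh --master local[4] \
--hiveconf hive.server2.thrift.port=10056 --conf "spark.ui.port=4056"
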

Let us see how it goes. Yarn promises the best resource management I
believe. Having said that I have not used Mesos myself.

HTH



Dr Mich Talebzadeh

On 25 July 2016 at 21:37, Jacek Laskowski  wrote:

> Hi,
>
> That's interesting...What holds STS back from working on the other
> scheduler backends, e.g. YARN or Mesos? I haven't spent much time with
> it, but thought it's a mere Spark application.
>
> The property is spark.deploy.spreadOut = Whether the standalone
> cluster manager should spread applications out across nodes or try to
> consolidate them onto as few nodes as possible. Spreading out is
> usually better for data locality in HDFS, but consolidating is more
> efficient for compute-intensive workloads.
>
> See https://spark.apache.org/docs/latest/spark-standalone.html
>
> Pozdrawiam,
> Jacek Laskowski
> 
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Mon, Jul 25, 2016 at 9:24 PM, Mich Talebzadeh
>  wrote:
> > Thanks. As I understand STS only works in Standalone mode :(
> >
> > Dr Mich Talebzadeh
> >
> > On 25 July 2016 at 19:34, Jacek Laskowski  wrote:
> >>
> >> Hi,
> >>
> >> My vague understanding of Spark Standalone is that it will take up all
> >> available workers for a Spark application (despite the cmd options).
> There
> >> was a property to disable it. Can't remember it now though.
> >>
> >> Ps. Yet another reason for YARN ;-)
> >>
> >> Jacek
> >>
> >>
> >> On 25 Jul 2016 6:17 p.m., "Mich Talebzadeh" 
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>>
> >>> I am doing some tests
> >>>
> >>> I have started Spark in Standalone mode.
> >>>
> >>> For simplicity I am using one node only with 8 workers and I have 12 cores
> >>>
> >>> In spark-env.sh I set this
> >>>
> >>> # Options for the daemons used in the standalone deploy mode
> >>> export SPARK_WORKER_CORES=1      ## total number of cores to be used by executors on each worker
> >>> export SPARK_WORKER_MEMORY=1g    ## total memory each worker has to give to executors (e.g. 1000m, 2g)
> >>> export SPARK_WORKER_INSTANCES=8  ## number of worker processes per node
> >>>
> >>> So it is pretty straightforward, with 8 workers and each worker assigned one core
> >>>
> >>> jps|grep Worker
> >>> 15297 Worker
> >>> 14794 Worker
> >>> 15374 Worker
> >>> 14998 Worker
> >>> 15198 Worker
> >>> 15465 Worker
> >>> 14897 Worker
> >>> 15099 Worker
> >>>
> >>> I start Spark Thrift Server with the following parameters (using
> >>> standalone mode)
> >>>
> >>> ${SPARK_HOME}/sbin/start-thriftserver.sh \
> >>> --master spark://50.140.197.217:7077 \
> >>> --hiveconf hive.server2.thrift.port=10055 \
> >>> --driver-memory 1G \
> >>> --num-executors 1 \
> >>> --executor-cores 1 \
> >>> --executor-memory 1G \
> >>> --conf "spark.scheduler.mode=FIFO" \
> >>>
> >>> With one executor allocated 1 core
> >>>
> >>> However, I can see both in the OS and the UI that it starts with 8
> >>> executors, the same as the number of workers on this node!
> >>>
> >>> jps|egrep 'SparkSubmit|CoarseGrainedExecutorBackend'|sort
> >>> 32711 SparkSubmit
> >>> 369 

Re: Executors assigned to STS and number of workers in Stand Alone Mode

2016-07-25 Thread Jacek Laskowski
Hi,

That's interesting...What holds STS back from working on the other
scheduler backends, e.g. YARN or Mesos? I haven't spent much time with
it, but thought it's a mere Spark application.

The property is spark.deploy.spreadOut = Whether the standalone
cluster manager should spread applications out across nodes or try to
consolidate them onto as few nodes as possible. Spreading out is
usually better for data locality in HDFS, but consolidating is more
efficient for compute-intensive workloads.

See https://spark.apache.org/docs/latest/spark-standalone.html
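
For example (a sketch), consolidation can be switched on for the master via
spark-env.sh before it is started; it is a master-side setting, not an
application one:

export SPARK_MASTER_OPTS="-Dspark.deploy.spreadOut=false"

Note that spreadOut only changes how executors are placed across workers; to
cap how many cores an application takes in total, spark.cores.max (or
--total-executor-cores) is the relevant knob, as mentioned elsewhere in the
thread.
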

Pozdrawiam,
Jacek Laskowski

https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Mon, Jul 25, 2016 at 9:24 PM, Mich Talebzadeh
 wrote:
> Thanks. As I understand STS only works in Standalone mode :(
>
> Dr Mich Talebzadeh
>
> On 25 July 2016 at 19:34, Jacek Laskowski  wrote:
>>
>> Hi,
>>
>> My vague understanding of Spark Standalone is that it will take up all
>> available workers for a Spark application (despite the cmd options). There
>> was a property to disable it. Can't remember it now though.
>>
>> Ps. Yet another reason for YARN ;-)
>>
>> Jacek
>>
>>
>> On 25 Jul 2016 6:17 p.m., "Mich Talebzadeh" 
>> wrote:
>>>
>>> Hi,
>>>
>>>
>>> I am doing some tests
>>>
>>> I have started Spark in Standalone mode.
>>>
> >>> For simplicity I am using one node only with 8 workers and I have 12 cores
>>>
>>> In spark-env.sh I set this
>>>
> >>> # Options for the daemons used in the standalone deploy mode
> >>> export SPARK_WORKER_CORES=1      ## total number of cores to be used by executors on each worker
> >>> export SPARK_WORKER_MEMORY=1g    ## total memory each worker has to give to executors (e.g. 1000m, 2g)
> >>> export SPARK_WORKER_INSTANCES=8  ## number of worker processes per node
>>>
> >>> So it is pretty straightforward, with 8 workers and each worker assigned one core
>>>
>>> jps|grep Worker
>>> 15297 Worker
>>> 14794 Worker
>>> 15374 Worker
>>> 14998 Worker
>>> 15198 Worker
>>> 15465 Worker
>>> 14897 Worker
>>> 15099 Worker
>>>
>>> I start Spark Thrift Server with the following parameters (using
>>> standalone mode)
>>>
>>> ${SPARK_HOME}/sbin/start-thriftserver.sh \
>>> --master spark://50.140.197.217:7077 \
>>> --hiveconf hive.server2.thrift.port=10055 \
>>> --driver-memory 1G \
>>> --num-executors 1 \
>>> --executor-cores 1 \
>>> --executor-memory 1G \
>>> --conf "spark.scheduler.mode=FIFO" \
>>>
>>> With one executor allocated 1 core
>>>
> >>> However, I can see both in the OS and the UI that it starts with 8 executors,
> >>> the same as the number of workers on this node!
>>>
>>> jps|egrep 'SparkSubmit|CoarseGrainedExecutorBackend'|sort
>>> 32711 SparkSubmit
>>> 369 CoarseGrainedExecutorBackend
>>> 370 CoarseGrainedExecutorBackend
>>> 371 CoarseGrainedExecutorBackend
>>> 376 CoarseGrainedExecutorBackend
>>> 387 CoarseGrainedExecutorBackend
>>> 395 CoarseGrainedExecutorBackend
>>> 419 CoarseGrainedExecutorBackend
>>> 420 CoarseGrainedExecutorBackend
>>>
>>>
> >>> I fail to see why this is happening. Nothing else Spark-related is running.
> >>> What is the cause?
>>>
>>>  How can I stop STS going and using all available workers?
>>>
>>> Thanks
>>>
> >>> Dr Mich Talebzadeh
>
>




Re: Executors assigned to STS and number of workers in Stand Alone Mode

2016-07-25 Thread Mich Talebzadeh
Thanks. As I understand STS only works in Standalone mode :(

Dr Mich Talebzadeh

On 25 July 2016 at 19:34, Jacek Laskowski  wrote:

> Hi,
>
> My vague understanding of Spark Standalone is that it will take up all
> available workers for a Spark application (despite the cmd options). There
> was a property to disable it. Can't remember it now though.
>
> Ps. Yet another reason for YARN ;-)
>
> Jacek
>
> On 25 Jul 2016 6:17 p.m., "Mich Talebzadeh" 
> wrote:
>
>> Hi,
>>
>>
>> I am doing some tests
>>
>> I have started Spark in Standalone mode.
>>
>> For simplicity I am using one node only with 8 workers and I have 12 cores
>>
>> In spark-env.sh I set this
>>
>> # Options for the daemons used in the standalone deploy mode
>> export SPARK_WORKER_CORES=1      ## total number of cores to be used by executors on each worker
>> export SPARK_WORKER_MEMORY=1g    ## total memory each worker has to give to executors (e.g. 1000m, 2g)
>> export SPARK_WORKER_INSTANCES=8  ## number of worker processes per node
>>
>> So it is pretty straightforward, with 8 workers and each worker assigned one core
>>
>> jps|grep Worker
>> 15297 Worker
>> 14794 Worker
>> 15374 Worker
>> 14998 Worker
>> 15198 Worker
>> 15465 Worker
>> 14897 Worker
>> 15099 Worker
>>
>> I start Spark Thrift Server with the following parameters (using
>> standalone mode)
>>
>> ${SPARK_HOME}/sbin/start-thriftserver.sh \
>> --master spark://50.140.197.217:7077 \
>> --hiveconf hive.server2.thrift.port=10055 \
>> --driver-memory 1G \
>> --num-executors 1 \
>> --executor-cores 1 \
>> --executor-memory 1G \
>> --conf "spark.scheduler.mode=FIFO" \
>>
>> With one executor allocated 1 core
>>
>> However, I can see both in the OS and the UI that it starts with 8 executors,
>> the same as the number of workers on this node!
>>
>> jps|egrep 'SparkSubmit|CoarseGrainedExecutorBackend'|sort
>> 32711 SparkSubmit
>> 369 CoarseGrainedExecutorBackend
>> 370 CoarseGrainedExecutorBackend
>> 371 CoarseGrainedExecutorBackend
>> 376 CoarseGrainedExecutorBackend
>> 387 CoarseGrainedExecutorBackend
>> 395 CoarseGrainedExecutorBackend
>> 419 CoarseGrainedExecutorBackend
>> 420 CoarseGrainedExecutorBackend
>>
>>
>> I fail to see why this is happening. Nothing else Spark-related is running.
>> What is the cause?
>>
>> How can I stop STS from using all available workers?
>>
>> Thanks
>>
>> Dr Mich Talebzadeh
>>
>


Re: Executors assigned to STS and number of workers in Stand Alone Mode

2016-07-25 Thread Jacek Laskowski
Hi,

My vague understanding of Spark Standalone is that it will take up all
available workers for a Spark application (despite the cmd options). There
was a property to disable it. Can't remember it now though.

Ps. Yet another reason for YARN ;-)

Jacek

On 25 Jul 2016 6:17 p.m., "Mich Talebzadeh" 
wrote:

> Hi,
>
>
> I am doing some tests
>
> I have started Spark in Standalone mode.
>
> For simplicity I am using one node only with 8 workers and I have 12 cores
>
> In spark-env.sh I set this
>
> # Options for the daemons used in the standalone deploy mode
> export SPARK_WORKER_CORES=1      ## total number of cores to be used by executors on each worker
> export SPARK_WORKER_MEMORY=1g    ## total memory each worker has to give to executors (e.g. 1000m, 2g)
> export SPARK_WORKER_INSTANCES=8  ## number of worker processes per node
>
> So it is pretty straightforward, with 8 workers and each worker assigned one core
>
> jps|grep Worker
> 15297 Worker
> 14794 Worker
> 15374 Worker
> 14998 Worker
> 15198 Worker
> 15465 Worker
> 14897 Worker
> 15099 Worker
>
> I start Spark Thrift Server with the following parameters (using
> standalone mode)
>
> ${SPARK_HOME}/sbin/start-thriftserver.sh \
> --master spark://50.140.197.217:7077 \
> --hiveconf hive.server2.thrift.port=10055 \
> --driver-memory 1G \
> --num-executors 1 \
> --executor-cores 1 \
> --executor-memory 1G \
> --conf "spark.scheduler.mode=FIFO" \
>
> With one executor allocated 1 core
>
> However, I can see both in the OS and the UI that it starts with 8 executors,
> the same as the number of workers on this node!
>
> jps|egrep 'SparkSubmit|CoarseGrainedExecutorBackend'|sort
> 32711 SparkSubmit
> 369 CoarseGrainedExecutorBackend
> 370 CoarseGrainedExecutorBackend
> 371 CoarseGrainedExecutorBackend
> 376 CoarseGrainedExecutorBackend
> 387 CoarseGrainedExecutorBackend
> 395 CoarseGrainedExecutorBackend
> 419 CoarseGrainedExecutorBackend
> 420 CoarseGrainedExecutorBackend
>
>
> I fail to see why this is happening. Nothing else Spark-related is running.
> What is the cause?
>
> How can I stop STS from using all available workers?
>
> Thanks
>
> Dr Mich Talebzadeh
>


Executors assigned to STS and number of workers in Stand Alone Mode

2016-07-25 Thread Mich Talebzadeh
Hi,


I am doing some tests

I have started Spark in Standalone mode.

For simplicity I am using one node only with 8 workers and I have 12 cores

In spark-env.sh I set this

# Options for the daemons used in the standalone deploy mode
export SPARK_WORKER_CORES=1      ## total number of cores to be used by executors on each worker
export SPARK_WORKER_MEMORY=1g    ## total memory each worker has to give to executors (e.g. 1000m, 2g)
export SPARK_WORKER_INSTANCES=8  ## number of worker processes per node

So it is pretty straightforward, with 8 workers and each worker assigned one core

jps|grep Worker
15297 Worker
14794 Worker
15374 Worker
14998 Worker
15198 Worker
15465 Worker
14897 Worker
15099 Worker

I start Spark Thrift Server with the following parameters (using standalone
mode)

${SPARK_HOME}/sbin/start-thriftserver.sh \
--master spark://50.140.197.217:7077 \
--hiveconf hive.server2.thrift.port=10055 \
--driver-memory 1G \
--num-executors 1 \
--executor-cores 1 \
--executor-memory 1G \
--conf "spark.scheduler.mode=FIFO" \

With one executor allocated 1 core

However, I can see both in the OS and the UI that it starts with 8 executors,
the same as the number of workers on this node!

jps|egrep 'SparkSubmit|CoarseGrainedExecutorBackend'|sort
32711 SparkSubmit
369 CoarseGrainedExecutorBackend
370 CoarseGrainedExecutorBackend
371 CoarseGrainedExecutorBackend
376 CoarseGrainedExecutorBackend
387 CoarseGrainedExecutorBackend
395 CoarseGrainedExecutorBackend
419 CoarseGrainedExecutorBackend
420 CoarseGrainedExecutorBackend


I fail to see why this is happening. Nothing else Spark-related is running.
What is the cause?

How can I stop STS from using all available workers?

Thanks

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.