Hi Karen,

You mentioned:
"So if I'm reading your email correctly it sounds like I should be able to
increase the number of executors on local mode by adding hostnames for
localhost, and cores per executor with SPARK_EXECUTOR_CORES. And by
starting master/slave(s) for localhost I can access the web UI to see
what's going on."

The problem is that Spark is, so to speak, memory hungry. You can add more
workers (slaves) to the list, but if you don't have enough memory they will
just queue up and won't do much, I am afraid. So out of 16GB of RAM you
have 6GB free. For comparison, on my host I have 8.9GB free out of 24GB:

*free*

             total       used       free     shared    buffers     cached
Mem:      24546308   23653300     893008          0    1163364   15293448
-/+ buffers/cache:    7196488   17349820
Swap:      2031608    2029184       2424

You can get more detail on resources with the following:

*cat /proc/meminfo*

MemTotal:     24546308 kB
MemFree:        898488 kB
Buffers:       1163732 kB
Cached:       15293628 kB
SwapCached:     231924 kB
Active:       17129696 kB
Inactive:      5553284 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:     24546308 kB
LowFree:        898488 kB
SwapTotal:     2031608 kB
SwapFree:         2424 kB
Dirty:             408 kB
Writeback:           0 kB
AnonPages:     5693420 kB
Mapped:        7643580 kB
Slab:           702452 kB
PageTables:     182760 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  14304760 kB
Committed_AS: 36210252 kB
VmallocTotal: 34359738367 kB
VmallocUsed:      1760 kB
VmallocChunk: 34359736355 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

So you can see where you are. To be honest, from my experience it sounds
like you have enough resources for one container, in other words one job
with one executor, so you may have to live with that. The command
*cat /proc/cpuinfo* will also tell you how many cores you have, and of
course the summary is in the *top* command. So I am not sure there is much
you can do about it, beyond getting as much out of it as you can. BTW, are
you also running Hive or any other database on this host?
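The checks above can be bundled into one quick snippet. This is just a sketch for Linux (`nproc` and the `/proc` files are Linux-specific):

```shell
# Summarize memory and cores before sizing Spark executors (Linux only).

# Total and free physical memory, straight from /proc/meminfo, shown in MB:
awk '/^MemTotal|^MemFree/ {printf "%s %d MB\n", $1, $2/1024}' /proc/meminfo

# Number of logical cores, two equivalent ways:
nproc
grep -c '^processor' /proc/cpuinfo
```

Compare the MemFree figure against whatever you plan to hand Spark; anything above it will spill or swap.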
*ipcs -m*

------ Shared Memory Segments --------
key        shmid      owner   perms  bytes        nattch  status
0x00000000 32768      gdm     600    393216       2       dest
0x00000000 3080193    oracle  640    4517888      212
0x00000000 3112962    oracle  640    7566524416   106
0x00000000 3145731    oracle  640    12259328     106
0x5b266174 3178500    oracle  640    36864        106
0xe916010c 103841797  sybase  600    15728640000  1

Anyway, post the output and we will see how we can all help you.

HTH

Dr Mich Talebzadeh

LinkedIn
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 8 May 2016 at 12:34, Karen Murphy <k.l.mur...@qub.ac.uk> wrote:

>
> Hi Mich,
>
> I have just seen your reply for the first time. I don't know why I can't
> see it on the online mailing list; possibly it is just delayed. I kept
> checking it for replies rather than logging in for emails. Thanks, I
> will try what you suggest.
>
> So if I'm reading your email correctly it sounds like I should be able
> to increase the number of executors on local mode by adding hostnames
> for localhost, and cores per executor with SPARK_EXECUTOR_CORES. And by
> starting master/slave(s) for localhost I can access the web UI to see
> what's going on.
>
> I did check 'free' and it looks like there is a lot less free memory
> (from the total of 16GB) than I had thought. In fact just under 6GB,
> with no jobs currently running, just the hadoop-daemons.
>
> I will reply to your email on the list when it appears, to report
> results when I have them.
>
> Thanks for the help, and sorry for the confusion,
> Karen
>
> ________________________________________
> From: Mich Talebzadeh [mich.talebza...@gmail.com]
> Sent: 07 May 2016 15:01
> To: Karen Murphy
> Cc: user @spark
> Subject: Re: Correct way of setting executor numbers and executor cores
> in Spark 1.6.1 for non-clustered mode ?
>
> Check how much free memory you have on your host:
>
> /usr/bin/free
>
> As heuristic values, start with these in conf/spark-env.sh:
>
> export SPARK_EXECUTOR_CORES=4   ## Number of cores for the workers (Default: 1)
> export SPARK_EXECUTOR_MEMORY=8G ## Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
> export SPARK_DRIVER_MEMORY=1G   ## Memory for Master (e.g. 1000M, 2G) (Default: 512 MB)
>
> and then add another worker process by adding your standalone hostname
> to conf/slaves. That will create two worker processes on that hostname.
>
> Do sbin/start-master.sh (if not started) and then sbin/start-slaves.sh.
>
> Log in to the Spark web UI for the job at hostname:4040/executors/
>
> Then test your jobs for timing and completion, and adjust the parameters
> accordingly. Also make sure that the memory/core ratio is reasonable.
> Regardless of what you are using, these are general configuration
> settings for Spark. The important thing for Spark is memory. Without it
> your application will start spilling to disk and performance will
> suffer. Ensure that you do not starve the OS of memory and cores.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 7 May 2016 at 12:03, kmurph <k.l.mur...@qub.ac.uk> wrote:
>
> Hi,
>
> I'm running Spark 1.6.1 on a single machine, initially a small one (8
> cores, 16GB RAM), using "--master local[*]" with spark-submit, and I'm
> trying, unsuccessfully, to see scaling with increasing cores.
> Initially I'm setting SPARK_EXECUTOR_INSTANCES=1 and increasing the
> cores for each executor. The way I'm setting cores per executor is
> either with "SPARK_EXECUTOR_CORES=1" (up to 4); I also tried with
> --conf "spark.executor.cores=1 spark.executor.memory=9g".
> I'm repartitioning the RDD of the large dataset into 4/8/10 partitions
> for different runs.
>
> Am I setting executors/cores correctly for running Spark 1.6 in
> local/standalone mode?
> The logs show the same overall timings for execution of the key stages
> (within a stage I see the number of tasks match the data partitioning
> value) whether I'm setting 1, 4 or 8 cores per executor. And the process
> table looks like the requested cores aren't being used.
>
> I know e.g. "--num-executors=X" is only an argument to YARN. I can't
> find specific instructions in one place for setting these params
> (executors/cores) on Spark running on one machine.
>
> An example of my full spark-submit command is:
>
> SPARK_EXECUTOR_INSTANCES=1 SPARK_EXECUTOR_CORES=4 spark-submit --master
> local[*] --conf "spark.executor.cores=4 spark.executor.memory=9g"
> --class asap.examples.mllib.TfIdfExample
> /home/ubuntu/spark-1.6.1-bin-hadoop2.6/asap_ml/target/scala-2.10/ml-operators_2.10-1.0.jar
>
> The settings are duplicated here, but it shows the different ways I've
> been setting the parameters.
>
> Thanks
> Karen
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Correct-way-of-setting-executor-numbers-and-executor-cores-in-Spark-1-6-1-for-non-clustered-mode-tp26894.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
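Putting Mich's advice together for a box like Karen's (16GB / 8 cores, but only ~6GB actually free): a minimal conf/spark-env.sh might look like the sketch below. The exact numbers are illustrative assumptions, not values tested on her host; the point is keeping the workers' total under the free memory `free` reports.

```shell
# Hypothetical conf/spark-env.sh for a single 16GB, 8-core machine where
# only ~6GB is actually free (OS plus Hadoop daemons take the rest).
# With localhost listed twice in conf/slaves this gives two workers:
# 2 x 2 cores and 2 x 2G, plus 1G driver, i.e. ~5G total, under the 6G free.
export SPARK_EXECUTOR_CORES=2     ## cores per worker (Default: 1)
export SPARK_EXECUTOR_MEMORY=2G   ## memory per worker (Default: 1G)
export SPARK_DRIVER_MEMORY=1G     ## memory for the driver (Default: 512 MB)
```

After sourcing this, `sbin/start-master.sh` and `sbin/start-slaves.sh` would bring the workers up, and the web UI on port 8080 (master) / 4040 (running job) shows whether the executors actually got the requested cores and memory.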