Hi Karen,

You mentioned:
"So if I'm reading your email correctly it sounds like I should be able to
increase the number of executors on local mode by adding hostnames for
localhost, and cores per executor with SPARK_EXECUTOR_CORES. And by
starting master/slave(s) for localhost I can access the web UI to see
what's going on."

The problem is that Spark is, so to speak, memory hungry. You can add more
workers (slaves) to the list, but if you don't have enough memory they will
just queue up and won't do much, I am afraid. So out of 16GB of RAM you
have 6GB free. For comparison, on my host I have 8.9GB free out of 24GB:

*free*

             total       used       free     shared    buffers     cached
Mem:      24546308   23653300     893008          0    1163364   15293448
-/+ buffers/cache:    7196488   17349820
Swap:      2031608    2029184       2424

You can get more detail on resources with the following:

*cat /proc/meminfo*

MemTotal:     24546308 kB
MemFree:        898488 kB
Buffers:       1163732 kB
Cached:       15293628 kB
SwapCached:     231924 kB
Active:       17129696 kB
Inactive:      5553284 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:     24546308 kB
LowFree:        898488 kB
SwapTotal:     2031608 kB
SwapFree:         2424 kB
Dirty:             408 kB
Writeback:           0 kB
AnonPages:     5693420 kB
Mapped:        7643580 kB
Slab:           702452 kB
PageTables:     182760 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  14304760 kB
Committed_AS: 36210252 kB
VmallocTotal: 34359738367 kB
VmallocUsed:      1760 kB
VmallocChunk: 34359736355 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

So you can see where you are. To be honest, from my experience it sounds
like you have enough resources for one container, in other words one job
with one executor, so you may have to live with that. The command
*cat /proc/cpuinfo* will also tell you how many cores you have, and of
course the summary is in the *top* command. So I am not sure there is much
you can do about it, beyond getting as much out of it as you can. BTW, are
you also running Hive or any other database on this host?
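The checks above can be bundled into one quick snippet. This is just a sketch for Linux (`nproc` and the `/proc` files are Linux-specific):

```shell
# Summarize memory and cores before sizing Spark executors (Linux only).

# Total and free physical memory, straight from /proc/meminfo, shown in MB:
awk '/^MemTotal|^MemFree/ {printf "%s %d MB\n", $1, $2/1024}' /proc/meminfo

# Number of logical cores, two equivalent ways:
nproc
grep -c '^processor' /proc/cpuinfo
```

Compare the MemFree figure against whatever you plan to hand Spark; anything above it will spill or swap.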
*ipcs -m*

------ Shared Memory Segments --------
key        shmid      owner   perms  bytes        nattch  status
0x00000000 32768      gdm     600    393216       2       dest
0x00000000 3080193    oracle  640    4517888      212
0x00000000 3112962    oracle  640    7566524416   106
0x00000000 3145731    oracle  640    12259328     106
0x5b266174 3178500    oracle  640    36864        106
0xe916010c 103841797  sybase  600    15728640000  1

Anyway, post the output and we will see how we can all help you.

HTH

Dr Mich Talebzadeh

LinkedIn
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 8 May 2016 at 12:34, Karen Murphy <k.l.mur...@qub.ac.uk> wrote:

>
> Hi Mich,
>
> I have just seen your reply for the first time. I don't know why I can't
> see it on the online mailing list; possibly it is just delayed. I kept
> checking it for replies rather than logging in for emails. Thanks, I
> will try what you suggest.
>
> So if I'm reading your email correctly it sounds like I should be able
> to increase the number of executors on local mode by adding hostnames
> for localhost, and cores per executor with SPARK_EXECUTOR_CORES. And by
> starting master/slave(s) for localhost I can access the web UI to see
> what's going on.
>
> I did check 'free' and it looks like there is a lot less free memory
> (from the total of 16GB) than I had thought. In fact just under 6GB,
> with no jobs currently running, just the hadoop-daemons.
>
> I will reply to your email on the list when it appears, to report
> results when I have them.
>
> Thanks for the help, and sorry for the confusion,
> Karen
>
> ________________________________________
> From: Mich Talebzadeh [mich.talebza...@gmail.com]
> Sent: 07 May 2016 15:01
> To: Karen Murphy
> Cc: user @spark
> Subject: Re: Correct way of setting executor numbers and executor cores
> in Spark 1.6.1 for non-clustered mode ?
>
> Check how much free memory you have on your host:
>
> /usr/bin/free
>
> As heuristic values, start with these in conf/spark-env.sh:
>
> export SPARK_EXECUTOR_CORES=4   ## Number of cores for the workers (Default: 1)
> export SPARK_EXECUTOR_MEMORY=8G ## Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
> export SPARK_DRIVER_MEMORY=1G   ## Memory for Master (e.g. 1000M, 2G) (Default: 512 MB)
>
> and then add another worker process by adding your standalone hostname
> to conf/slaves. That will create two worker processes on that hostname.
>
> Do sbin/start-master.sh (if not started) and then sbin/start-slaves.sh.
>
> Log in to the Spark web UI for the job at hostname:4040/executors/
>
> Then test your jobs for timing and completion, and adjust the parameters
> accordingly. Also make sure that the memory/core ratio is reasonable.
> Regardless of what you are using, these are general configuration
> settings for Spark. The important thing for Spark is memory. Without it
> your application will start spilling to disk and performance will
> suffer. Ensure that you do not starve the OS of memory and cores.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 7 May 2016 at 12:03, kmurph <k.l.mur...@qub.ac.uk> wrote:
>
> Hi,
>
> I'm running Spark 1.6.1 on a single machine, initially a small one (8
> cores, 16GB RAM), using "--master local[*]" with spark-submit, and I'm
> trying, unsuccessfully, to see scaling with increasing cores.
> Initially I'm setting SPARK_EXECUTOR_INSTANCES=1 and increasing the
> cores for each executor. The way I'm setting cores per executor is
> either with "SPARK_EXECUTOR_CORES=1" (up to 4); I also tried with
> --conf "spark.executor.cores=1 spark.executor.memory=9g".
> I'm repartitioning the RDD of the large dataset into 4/8/10 partitions
> for different runs.
>
> Am I setting executors/cores correctly for running Spark 1.6 in
> local/standalone mode?
> The logs show the same overall timings for execution of the key stages
> (within a stage I see the number of tasks match the data partitioning
> value) whether I'm setting 1, 4 or 8 cores per executor. And the process
> table looks like the requested cores aren't being used.
>
> I know e.g. "--num-executors=X" is only an argument to YARN. I can't
> find specific instructions in one place for setting these params
> (executors/cores) on Spark running on one machine.
>
> An example of my full spark-submit command is:
>
> SPARK_EXECUTOR_INSTANCES=1 SPARK_EXECUTOR_CORES=4 spark-submit --master
> local[*] --conf "spark.executor.cores=4 spark.executor.memory=9g"
> --class asap.examples.mllib.TfIdfExample
> /home/ubuntu/spark-1.6.1-bin-hadoop2.6/asap_ml/target/scala-2.10/ml-operators_2.10-1.0.jar
>
> The settings are duplicated here, but it shows the different ways I've
> been setting the parameters.
>
> Thanks
> Karen
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Correct-way-of-setting-executor-numbers-and-executor-cores-in-Spark-1-6-1-for-non-clustered-mode-tp26894.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
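Putting Mich's advice together for a box like Karen's (16GB / 8 cores, but only ~6GB actually free): a minimal conf/spark-env.sh might look like the sketch below. The exact numbers are illustrative assumptions, not values tested on her host; the point is keeping the workers' total under the free memory `free` reports.

```shell
# Hypothetical conf/spark-env.sh for a single 16GB, 8-core machine where
# only ~6GB is actually free (OS plus Hadoop daemons take the rest).
# With localhost listed twice in conf/slaves this gives two workers:
# 2 x 2 cores and 2 x 2G, plus 1G driver, i.e. ~5G total, under the 6G free.
export SPARK_EXECUTOR_CORES=2     ## cores per worker (Default: 1)
export SPARK_EXECUTOR_MEMORY=2G   ## memory per worker (Default: 1G)
export SPARK_DRIVER_MEMORY=1G     ## memory for the driver (Default: 512 MB)
```

After sourcing this, `sbin/start-master.sh` and `sbin/start-slaves.sh` would bring the workers up, and the web UI on port 8080 (master) / 4040 (running job) shows whether the executors actually got the requested cores and memory.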