Aaron, spark.executor.memory is set to 2454m in my spark-defaults.conf, which is a reasonable value for the EC2 instances I use (m3.medium machines). However, it doesn't help: each executor still uses only 512 MB of memory. To figure out why, I examined the spark-submit and spark-class scripts and found that the Java options and Java heap size are computed in the spark-class script (see the OUR_JAVA_OPTS and OUR_JAVA_MEM variables there). These values are then used to compose the following string:

JAVA_OPTS="$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM"

Note that the -Xms and -Xmx flags, both set to OUR_JAVA_MEM, are appended to the end of the string. For some reason I haven't found yet, OUR_JAVA_MEM keeps its default value of 512 MB. I was able to fix it only by setting the SPARK_MEM variable in spark-env.sh:

export SPARK_MEM=2g

However, this variable is deprecated, so my solution doesn't seem to be a good one :)
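If I read bin/spark-class correctly, the fallback chain is roughly the following (this is a paraphrase of my understanding, not a verbatim excerpt from the script):

    # sketch of the memory fallback as I understand it, not an exact copy of spark-class
    SPARK_MEM=${SPARK_MEM:-512m}                        # global default: 512m unless spark-env.sh exports SPARK_MEM
    OUR_JAVA_MEM=${SPARK_EXECUTOR_MEMORY:-$SPARK_MEM}   # executor heap falls back to SPARK_MEM
    JAVA_OPTS="$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM"

That would explain why exporting SPARK_MEM is the only thing that works for me: it looks like spark.executor.memory from spark-defaults.conf never makes it into SPARK_EXECUTOR_MEMORY on the worker side, so the 512m default always wins.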
On Thu, Jun 12, 2014 at 10:16 PM, Aaron Davidson <ilike...@gmail.com> wrote:

> The scripts for Spark 1.0 actually specify this property in
> /root/spark/conf/spark-defaults.conf
>
> I didn't know that this would override the --executor-memory flag, though,
> that's pretty odd.
>
>
> On Thu, Jun 12, 2014 at 6:02 PM, Aliaksei Litouka <aliaksei.lito...@gmail.com> wrote:
>
>> Yes, I am launching a cluster with the spark_ec2 script. I checked
>> /root/spark/conf/spark-env.sh on the master node and on the slaves, and it
>> looks like this:
>>
>>> #!/usr/bin/env bash
>>> export SPARK_LOCAL_DIRS="/mnt/spark"
>>> # Standalone cluster options
>>> export SPARK_MASTER_OPTS=""
>>> export SPARK_WORKER_INSTANCES=1
>>> export SPARK_WORKER_CORES=1
>>> export HADOOP_HOME="/root/ephemeral-hdfs"
>>> export SPARK_MASTER_IP=ec2-54-89-95-238.compute-1.amazonaws.com
>>> export MASTER=`cat /root/spark-ec2/cluster-url`
>>> export SPARK_SUBMIT_LIBRARY_PATH="$SPARK_SUBMIT_LIBRARY_PATH:/root/ephemeral-hdfs/lib/native/"
>>> export SPARK_SUBMIT_CLASSPATH="$SPARK_CLASSPATH:$SPARK_SUBMIT_CLASSPATH:/root/ephemeral-hdfs/conf"
>>> # Bind Spark's web UIs to this machine's public EC2 hostname:
>>> export SPARK_PUBLIC_DNS=`wget -q -O - http://169.254.169.254/latest/meta-data/public-hostname`
>>> # Set a high ulimit for large shuffles
>>> ulimit -n 1000000
>>
>> None of these variables seem to be related to memory size. Let me know if
>> I am missing something.
>>
>>
>> On Thu, Jun 12, 2014 at 7:17 PM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>
>>> Are you launching this using our EC2 scripts? Or have you set up a
>>> cluster by hand?
>>>
>>> Matei
>>>
>>> On Jun 12, 2014, at 2:32 PM, Aliaksei Litouka <aliaksei.lito...@gmail.com> wrote:
>>>
>>> spark-env.sh doesn't seem to contain any settings related to memory size :(
>>> I will continue searching for a solution and will post it if I find it :)
>>> Thank you, anyway
>>>
>>>
>>> On Wed, Jun 11, 2014 at 12:19 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>>>
>>>> It might be that conf/spark-env.sh on EC2 is configured to set it to 512,
>>>> and is overriding the application's settings. Take a look in there and
>>>> delete that line if possible.
>>>>
>>>> Matei
>>>>
>>>> On Jun 10, 2014, at 2:38 PM, Aliaksei Litouka <aliaksei.lito...@gmail.com> wrote:
>>>>
>>>> I am testing my application in an EC2 cluster of m3.medium machines. By
>>>> default, only 512 MB of memory on each machine is used. I want to increase
>>>> this amount and I'm trying to do it by passing the --executor-memory 2G
>>>> option to the spark-submit script, but it doesn't seem to work - each
>>>> machine uses only 512 MB instead of 2 gigabytes. What am I doing wrong?
>>>> How do I increase the amount of memory?