Is spark-env.sh being sourced prior to running your job? The spark-shell script handles this automatically, but you may need to `source spark-env.sh` in the shell that runs your driver program in order for these environment variables to be set.
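
A quick way to check this from the shell, as a rough sketch: the temp file below is only a stand-in for your real spark-env.sh, and the names DEMO_ENV and child_sees_* are made up for illustration. It sources the file and then asks a child process (which is how your driver would be started) what it actually inherits. Plain shell behaviour applies here: a variable that is assigned but never exported is visible in the sourcing shell but not in its children.

```shell
#!/usr/bin/env bash
# DEMO_ENV is a throwaway stand-in for conf/spark-env.sh (illustrative contents only).
DEMO_ENV="$(mktemp)"
cat > "$DEMO_ENV" <<'EOF'
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
if [ -z "$SPARK_JAVA_OPTS" ] ; then
  SPARK_JAVA_OPTS="-Dspark.executor.memory=3g"
fi
EOF

# Start from a clean slate so the guard in the env file takes effect.
unset SPARK_JAVA_OPTS

# Source the file into the current shell (same effect as `source "$DEMO_ENV"` in bash).
. "$DEMO_ENV"

# Ask a child process what it inherited from this shell.
child_sees_mesos="$(sh -c 'printf %s "$MESOS_NATIVE_LIBRARY"')"
child_sees_opts="$(sh -c 'printf %s "$SPARK_JAVA_OPTS"')"

# Exporting the variable makes it visible to children started afterwards.
export SPARK_JAVA_OPTS
child_sees_opts_after="$(sh -c 'printf %s "$SPARK_JAVA_OPTS"')"

echo "child sees MESOS_NATIVE_LIBRARY: '$child_sees_mesos'"
echo "child sees SPARK_JAVA_OPTS before export: '$child_sees_opts'"
echo "child sees SPARK_JAVA_OPTS after export: '$child_sees_opts_after'"

rm -f "$DEMO_ENV"
```

If the "before export" value comes back empty for a variable your launcher reads from the environment, sourcing alone is not enough and the variable also needs an `export`.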

On Tue, Nov 19, 2013 at 4:25 PM, Gary Malouf wrote:
> In our spark-env.sh, we have:
>
> export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
>
> export ADD_JARS=/opt/spark/mx-lib/verrazano_2.9.3-0.1-SNAPSHOT-assembly.jar
>
> if [ -z "$SPARK_JAVA_OPTS" ] ; then
>   SPARK_JAVA_OPTS="-Xss20m -Dspark.local.dir=/opt/spark/tmp -Dspark.executor.memory=3g -Dspark.serializer=org.apache.spark.serializer.KryoSerializer -Dspark.kryo.registrator=com.mediacrossing.verrazano.kryo.MxDataRegistrator"
> fi
>
> # This is a workaround for https://spark-project.atlassian.net/browse/SPARK-896
> if [ -z "$SPARK_CLASSPATH" ] ; then
>   SPARK_CLASSPATH=$ADD_JARS
> fi
>
> This is set on both the shell and all of the slaves.
>
> On Tue, Nov 19, 2013 at 7:09 PM, Ewen Cheslack-Postava wrote:
>
>> This line
>>
>> > 13/11/19 23:17:20 INFO MemoryStore: MemoryStore started with capacity 323.9 MB
>>
>> looks like what you'd get if you haven't set spark.executor.memory (or
>> SPARK_MEM). Without setting it you'll get the default to 512m per
>> executor and .66 of that for the cache.
>>
>> -Ewen
>> -----
>> Ewen Cheslack-Postava
>> StraightUp | http://readstraightup.com
>> (201) 286-7785
>>
>> On Tue, Nov 19, 2013 at 3:54 PM, Gary Malouf wrote:
>> > To explain more, we upgraded from 0.7.3 to 0.9 incubating snapshot today and
>> > are getting out of memory errors very quickly even though our cluster has
>> > plenty of RAM and the data is relatively small:
>> >
>> > Using Scala version 2.9.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_21)
>> > Initializing interpreter...
>> > Creating SparkContext...
>> > 13/11/19 23:17:20 INFO Slf4jEventHandler: Slf4jEventHandler started
>> > 13/11/19 23:17:20 INFO SparkEnv: Registering BlockManagerMaster
>> > 13/11/19 23:17:20 INFO DiskBlockManager: Created local directory at /opt/spark/tmp/spark-local-20131119231720-a023
>> > 13/11/19 23:17:20 INFO MemoryStore: MemoryStore started with capacity 323.9 MB.
>> > 13/11/19 23:17:20 INFO ConnectionManager: Bound socket to port 11240 with id = ConnectionManagerId(spark-shell-01,11240)
>> > 13/11/19 23:17:20 INFO BlockManagerMaster: Trying to register BlockManager
>> > 13/11/19 23:17:20 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager spark-shell-01:11240 with 323.9 MB RAM
>> > 13/11/19 23:25:17 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager dn-02:50623 with 1943.0 MB RAM
>> > 13/11/19 23:25:17 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager dn-01:61960 with 1943.0 MB RAM
>> > 13/11/19 23:25:18 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager dn-03:45775 with 1943.0 MB RAM
>> >
>> > I've included memory store output for more information:
>> >
>> > 13/11/19 23:40:38 INFO MemoryStore: ensureFreeSpace(113598) called with curMem=0, maxMem=339585269
>> > 13/11/19 23:40:38 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 110.9 KB, free 323.7 MB)
>> > 13/11/19 23:40:38 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=113598, maxMem=339585269
>> > 13/11/19 23:40:38 INFO MemoryStore: Block broadcast_1 stored as values to memory (estimated size 111.0 KB, free 323.6 MB)
>> > 13/11/19 23:40:38 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=227244, maxMem=339585269
>> > 13/11/19 23:40:38 INFO MemoryStore: Block broadcast_2 stored as values to memory (estimated size 111.0 KB, free 323.5 MB)
>> > 13/11/19 23:40:38 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=340890, maxMem=339585269
>> > 13/11/19 23:40:38 INFO MemoryStore: Block broadcast_3 stored as values to memory (estimated size 111.0 KB, free 323.4 MB)
>> > 13/11/19 23:40:38 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=454536, maxMem=339585269
>> > 13/11/19 23:40:38 INFO MemoryStore: Block broadcast_4 stored as values to memory (estimated size 111.0 KB, free 323.3 MB)
>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=568182, maxMem=339585269
>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_5 stored as values to memory (estimated size 111.0 KB, free 323.2 MB)
>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=681828, maxMem=339585269
>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_6 stored as values to memory (estimated size 111.0 KB, free 323.1 MB)
>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=795474, maxMem=339585269
>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_7 stored as values to memory (estimated size 111.0 KB, free 323.0 MB)
>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=909120, maxMem=339585269
>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_8 stored as values to memory (estimated size 111.0 KB, free 322.9 MB)
>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=1022766, maxMem=339585269
>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_9 stored as values to memory (estimated size 111.0 KB, free 322.8 MB)
>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=1136412, maxMem=339585269
>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_10 stored as values to memory (estimated size 111.0 KB, free 322.7 MB)
>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=1250058, maxMem=339585269
>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_11 stored as values to memory (estimated size 111.0 KB, free 322.6 MB)
>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=1363704, maxMem=339585269
>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_12 stored as values to memory (estimated size 111.0 KB, free 322.4 MB)
>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=1477350, maxMem=339585269
>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_13 stored as values to memory (estimated size 111.0 KB, free 322.3 MB)
>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=1590996, maxMem=339585269
>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_14 stored as values to memory (estimated size 111.0 KB, free 322.2 MB)
>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=1704642, maxMem=339585269
>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_15 stored as values to memory (estimated size 111.0 KB, free 322.1 MB)
>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=1818288, maxMem=339585269
>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_16 stored as values to memory (estimated size 111.0 KB, free 322.0 MB)
>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=1931934, maxMem=339585269
>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_17 stored as values to memory (estimated size 111.0 KB, free 321.9 MB)
>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=2045580, maxMem=339585269
>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_18 stored as values to memory (estimated size 111.0 KB, free 321.8 MB)
>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=2159226, maxMem=339585269
>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_19 stored as values to memory (estimated size 111.0 KB, free 321.7 MB)
>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=2272872, maxMem=339585269
>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_20 stored as values to memory (estimated size 111.0 KB, free 321.6 MB)
>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=2386518, maxMem=339585269
>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_21 stored as values to memory (estimated size 111.0 KB, free 321.5 MB)
>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=2500164, maxMem=339585269
>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_22 stored as values to memory (estimated size 111.0 KB, free 321.4 MB)
>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=2613810, maxMem=339585269
>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_23 stored as values to memory (estimated size 111.0 KB, free 321.3 MB)
>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=2727456, maxMem=339585269
>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_24 stored as values to memory (estimated size 111.0 KB, free 321.1 MB)
>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=2841102, maxMem=339585269
>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_25 stored as values to memory (estimated size 111.0 KB, free 321.0 MB)
>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=2954748, maxMem=339585269
>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_26 stored as values to memory (estimated size 111.0 KB, free 320.9 MB)
>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=3068394, maxMem=339585269
>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_27 stored as values to memory (estimated size 111.0 KB, free 320.8 MB)
>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=3182040, maxMem=339585269
>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_28 stored as values to memory (estimated size 111.0 KB, free 320.7 MB)
>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=3295686, maxMem=339585269
>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_29 stored as values to memory (estimated size 111.0 KB, free 320.6 MB)
>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=3409332, maxMem=339585269
>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_30 stored as values to memory (estimated size 111.0 KB, free 320.5 MB)
>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=3522978, maxMem=339585269
>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_31 stored as values to memory (estimated size 111.0 KB, free 320.4 MB)
>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=3636624, maxMem=339585269
>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_32 stored as values to memory (estimated size 111.0 KB, free 320.3 MB)
>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=3750270, maxMem=339585269
>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_33 stored as values to memory (estimated size 111.0 KB, free 320.2 MB)
>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=3863916, maxMem=339585269
>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_34 stored as values to memory (estimated size 111.0 KB, free 320.1 MB)
>> > 13/11/19 23:40:40 INFO MemoryStore..
>> >
>> > Thanks,
>> >
>> > Gary
>> >
>> > On Tue, Nov 19, 2013 at 6:22 PM, Gary Malouf wrote:
>> >>
>> >> We have a 4 node Spark cluster with 3 gigs of ram available per executor
>> >> (via the spark.executor.memory setting). When we run a Spark job, we see
>> >> the following output:
>> >>
>> >> Using Scala version 2.9.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_21)
>> >> Initializing interpreter...
>> >> Creating SparkContext...
>> >> 13/11/19 23:17:20 INFO Slf4jEventHandler: Slf4jEventHandler started
>> >> 13/11/19 23:17:20 INFO SparkEnv: Registering BlockManagerMaster
>> >> 13/11/19 23:17:20 INFO DiskBlockManager: Created local directory at /opt/spark/tmp/spark-local-20131119231720-a023
>> >> 13/11/19 23:17:20 INFO MemoryStore: MemoryStore started with capacity 323.9 MB.
>> >> 13/11/19 23:17:20 INFO ConnectionManager: Bound socket to port 11240 with id = ConnectionManagerId(spark-shell-01,11240)
>> >> 13/11/19 23:17:20 INFO BlockManagerMaster: Trying to register BlockManager
>> >> 13/11/19 23:17:20 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager spark-shell-01:11240 with 323.9 MB RAM
>> >>
>> >> Is this right? I feel like much more RAM should be available.
