Hi Josh,

We are running our job within the spark shell, so I think spark-env.sh is being sourced. We are on Mesos 0.13 and a Spark 0.9 SNAPSHOT taken around 9 AM Eastern this morning.
Any other possible culprits?

On Tue, Nov 19, 2013 at 7:30 PM, Josh Rosen <[email protected]> wrote:

> Is spark-env.sh being sourced prior to running your job? The spark-shell
> script handles this automatically, but you may need to `source spark-env.sh`
> in the shell that runs your driver program in order for these
> environment variables to be set.
>
>
> On Tue, Nov 19, 2013 at 4:25 PM, Gary Malouf <[email protected]> wrote:
>
>> In our spark-env.sh, we have:
>>
>> export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
>>
>> export ADD_JARS=/opt/spark/mx-lib/verrazano_2.9.3-0.1-SNAPSHOT-assembly.jar
>>
>>
>> if [ -z "$SPARK_JAVA_OPTS" ] ; then
>>
>> SPARK_JAVA_OPTS="-Xss20m -Dspark.local.dir=/opt/spark/tmp
>> -Dspark.executor.memory=3g
>> -Dspark.serializer=org.apache.spark.serializer.KryoSerializer
>> -Dspark.kryo.registrator=com.mediacrossing.verrazano.kryo.MxDataRegistrator"
>>
>> fi
>>
>>
>> # This is a workaround for
>> https://spark-project.atlassian.net/browse/SPARK-896
>>
>> if [ -z "$SPARK_CLASSPATH" ] ; then
>>
>> SPARK_CLASSPATH=$ADD_JARS
>>
>> fi
>>
>> This is set on both the shell and all of the slaves.
>>
>>
>> On Tue, Nov 19, 2013 at 7:09 PM, Ewen Cheslack-Postava
>> <[email protected]> wrote:
>>
>>> This line
>>>
>>> > 13/11/19 23:17:20 INFO MemoryStore: MemoryStore started with capacity 323.9 MB
>>>
>>> looks like what you'd get if you haven't set spark.executor.memory (or
>>> SPARK_MEM). Without setting it you'll get the default to 512m per
>>> executor and .66 of that for the cache.
>>>
>>> -Ewen
>>> -----
>>> Ewen Cheslack-Postava
>>> StraightUp | http://readstraightup.com
>>> [email protected]
>>> (201) 286-7785
>>>
>>>
>>> On Tue, Nov 19, 2013 at 3:54 PM, Gary Malouf <[email protected]> wrote:
>>> > To explain more, we upgraded from 0.7.3 to 0.9 incubating snapshot today and
>>> > are getting out of memory errors very quickly even though our cluster has
>>> > plenty of RAM and the data is relatively small:
>>> >
>>> > Using Scala version 2.9.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_21)
>>> > Initializing interpreter...
>>> > Creating SparkContext...
>>> > 13/11/19 23:17:20 INFO Slf4jEventHandler: Slf4jEventHandler started
>>> > 13/11/19 23:17:20 INFO SparkEnv: Registering BlockManagerMaster
>>> > 13/11/19 23:17:20 INFO DiskBlockManager: Created local directory at /opt/spark/tmp/spark-local-20131119231720-a023
>>> > 13/11/19 23:17:20 INFO MemoryStore: MemoryStore started with capacity 323.9 MB.
>>> > 13/11/19 23:17:20 INFO ConnectionManager: Bound socket to port 11240 with id = ConnectionManagerId(spark-shell-01,11240)
>>> > 13/11/19 23:17:20 INFO BlockManagerMaster: Trying to register BlockManager
>>> > 13/11/19 23:17:20 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager spark-shell-01:11240 with 323.9 MB RAM
>>> > 13/11/19 23:25:17 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager dn-02:50623 with 1943.0 MB RAM
>>> > 13/11/19 23:25:17 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager dn-01:61960 with 1943.0 MB RAM
>>> > 13/11/19 23:25:18 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager dn-03:45775 with 1943.0 MB RAM
>>> >
>>> > I've included memory store output for more information:
>>> >
>>> >
>>> > 13/11/19 23:40:38 INFO MemoryStore: ensureFreeSpace(113598) called with curMem=0, maxMem=339585269
>>> > 13/11/19 23:40:38 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 110.9 KB, free 323.7 MB)
>>> > 13/11/19 23:40:38 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=113598, maxMem=339585269
>>> > 13/11/19 23:40:38 INFO MemoryStore: Block broadcast_1 stored as values to memory (estimated size 111.0 KB, free 323.6 MB)
>>> > 13/11/19 23:40:38 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=227244, maxMem=339585269
>>> > 13/11/19 23:40:38 INFO MemoryStore: Block broadcast_2 stored as values to memory (estimated size 111.0 KB, free 323.5 MB)
>>> > 13/11/19 23:40:38 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=340890, maxMem=339585269
>>> > 13/11/19 23:40:38 INFO MemoryStore: Block broadcast_3 stored as values to memory (estimated size 111.0 KB, free 323.4 MB)
>>> > 13/11/19 23:40:38 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=454536, maxMem=339585269
>>> > 13/11/19 23:40:38 INFO MemoryStore: Block broadcast_4 stored as values to memory (estimated size 111.0 KB, free 323.3 MB)
>>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=568182, maxMem=339585269
>>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_5 stored as values to memory (estimated size 111.0 KB, free 323.2 MB)
>>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=681828, maxMem=339585269
>>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_6 stored as values to memory (estimated size 111.0 KB, free 323.1 MB)
>>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=795474, maxMem=339585269
>>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_7 stored as values to memory (estimated size 111.0 KB, free 323.0 MB)
>>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=909120, maxMem=339585269
>>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_8 stored as values to memory (estimated size 111.0 KB, free 322.9 MB)
>>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=1022766, maxMem=339585269
>>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_9 stored as values to memory (estimated size 111.0 KB, free 322.8 MB)
>>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=1136412, maxMem=339585269
>>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_10 stored as values to memory (estimated size 111.0 KB, free 322.7 MB)
>>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=1250058, maxMem=339585269
>>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_11 stored as values to memory (estimated size 111.0 KB, free 322.6 MB)
>>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=1363704, maxMem=339585269
>>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_12 stored as values to memory (estimated size 111.0 KB, free 322.4 MB)
>>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=1477350, maxMem=339585269
>>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_13 stored as values to memory (estimated size 111.0 KB, free 322.3 MB)
>>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=1590996, maxMem=339585269
>>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_14 stored as values to memory (estimated size 111.0 KB, free 322.2 MB)
>>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=1704642, maxMem=339585269
>>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_15 stored as values to memory (estimated size 111.0 KB, free 322.1 MB)
>>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=1818288, maxMem=339585269
>>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_16 stored as values to memory (estimated size 111.0 KB, free 322.0 MB)
>>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=1931934, maxMem=339585269
>>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_17 stored as values to memory (estimated size 111.0 KB, free 321.9 MB)
>>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=2045580, maxMem=339585269
>>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_18 stored as values to memory (estimated size 111.0 KB, free 321.8 MB)
>>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=2159226, maxMem=339585269
>>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_19 stored as values to memory (estimated size 111.0 KB, free 321.7 MB)
>>> > 13/11/19 23:40:39 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=2272872, maxMem=339585269
>>> > 13/11/19 23:40:39 INFO MemoryStore: Block broadcast_20 stored as values to memory (estimated size 111.0 KB, free 321.6 MB)
>>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=2386518, maxMem=339585269
>>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_21 stored as values to memory (estimated size 111.0 KB, free 321.5 MB)
>>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=2500164, maxMem=339585269
>>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_22 stored as values to memory (estimated size 111.0 KB, free 321.4 MB)
>>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=2613810, maxMem=339585269
>>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_23 stored as values to memory (estimated size 111.0 KB, free 321.3 MB)
>>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=2727456, maxMem=339585269
>>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_24 stored as values to memory (estimated size 111.0 KB, free 321.1 MB)
>>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=2841102, maxMem=339585269
>>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_25 stored as values to memory (estimated size 111.0 KB, free 321.0 MB)
>>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=2954748, maxMem=339585269
>>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_26 stored as values to memory (estimated size 111.0 KB, free 320.9 MB)
>>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=3068394, maxMem=339585269
>>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_27 stored as values to memory (estimated size 111.0 KB, free 320.8 MB)
>>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=3182040, maxMem=339585269
>>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_28 stored as values to memory (estimated size 111.0 KB, free 320.7 MB)
>>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=3295686, maxMem=339585269
>>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_29 stored as values to memory (estimated size 111.0 KB, free 320.6 MB)
>>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=3409332, maxMem=339585269
>>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_30 stored as values to memory (estimated size 111.0 KB, free 320.5 MB)
>>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=3522978, maxMem=339585269
>>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_31 stored as values to memory (estimated size 111.0 KB, free 320.4 MB)
>>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=3636624, maxMem=339585269
>>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_32 stored as values to memory (estimated size 111.0 KB, free 320.3 MB)
>>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=3750270, maxMem=339585269
>>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_33 stored as values to memory (estimated size 111.0 KB, free 320.2 MB)
>>> > 13/11/19 23:40:40 INFO MemoryStore: ensureFreeSpace(113646) called with curMem=3863916, maxMem=339585269
>>> > 13/11/19 23:40:40 INFO MemoryStore: Block broadcast_34 stored as values to memory (estimated size 111.0 KB, free 320.1 MB)
>>> > 13/11/19 23:40:40 INFO MemoryStore..
>>> >
>>> >
>>> >
>>> > Thanks,
>>> >
>>> > Gary
>>> >
>>> >
>>> > On Tue, Nov 19, 2013 at 6:22 PM, Gary Malouf <[email protected]> wrote:
>>> >>
>>> >> We have a 4 node Spark cluster with 3 gigs of ram available per executor
>>> >> (via the spark.executor.memory setting). When we run a Spark job, we see
>>> >> the following output:
>>> >>
>>> >> Using Scala version 2.9.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_21)
>>> >> Initializing interpreter...
>>> >> Creating SparkContext...
>>> >> 13/11/19 23:17:20 INFO Slf4jEventHandler: Slf4jEventHandler started
>>> >> 13/11/19 23:17:20 INFO SparkEnv: Registering BlockManagerMaster
>>> >> 13/11/19 23:17:20 INFO DiskBlockManager: Created local directory at /opt/spark/tmp/spark-local-20131119231720-a023
>>> >> 13/11/19 23:17:20 INFO MemoryStore: MemoryStore started with capacity 323.9 MB.
>>> >> 13/11/19 23:17:20 INFO ConnectionManager: Bound socket to port 11240 with id = ConnectionManagerId(spark-shell-01,11240)
>>> >> 13/11/19 23:17:20 INFO BlockManagerMaster: Trying to register BlockManager
>>> >> 13/11/19 23:17:20 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager spark-shell-01:11240 with 323.9 MB RAM
>>> >>
>>> >> Is this right? I feel like much more RAM should be available.
>>> >
>>> >
>>> >>
>>
>
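
For reference, the figures in the quoted logs can be sanity-checked with a little arithmetic. The sketch below assumes the 0.66 cache fraction Ewen mentions applies to every JVM in the logs; the helper names `bytes_to_mb` and `implied_heap_mb` are made up for this example and are not part of Spark. It converts the logged `maxMem` to MB and divides each reported MemoryStore capacity by the fraction to estimate the heap that JVM had available.

```shell
#!/bin/sh
# Hypothetical helpers for checking the numbers in the logs above.
# Assumption: MemoryStore capacity = 0.66 * JVM heap, per Ewen's note.

# Convert a byte count (e.g. the logged maxMem) to MB with one decimal.
bytes_to_mb() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024 * 1024) }'
}

# Estimate the heap implied by a MemoryStore capacity.
# $1: capacity in MB, $2: cache fraction (defaults to the assumed 0.66).
implied_heap_mb() {
    awk -v cap="$1" -v frac="${2:-0.66}" 'BEGIN { printf "%.1f\n", cap / frac }'
}

bytes_to_mb 339585269     # maxMem in the log        -> 323.9
implied_heap_mb 323.9     # spark-shell-01 (driver)  -> 490.8
implied_heap_mb 1943.0    # dn-01, dn-02, dn-03      -> 2943.9
```

Under that assumption, the 323.9 MB store registered by spark-shell-01 corresponds to a heap of roughly 491 MB, which is about what a JVM started with a 512m limit usually reports, while the 1943.0 MB stores on dn-01 through dn-03 correspond to roughly 2944 MB, close to the 3g setting. That would suggest the 3g value is reaching the executors and that the small MemoryStore belongs to the shell's own JVM, though the logs alone do not confirm it.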
