Ehhhh... it's hard to say why 9 GB is not enough, but your file is 7 GB, and the Java string objects built from each line of that file need considerably more memory than the raw bytes. I'd try keeping the processed data on HDFS instead of pulling everything into memory.
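For example, here is a minimal sketch against the 0.8 shell (the HDFS paths are placeholders, substitute your own): your log shows the job failing while serializing results for the "collect at <console>:17", and collect() pulls every (word, count) pair back into the driver's heap. Writing the result to HDFS with saveAsTextFile avoids that:

    // word count with the result written back to HDFS instead of
    // collected to the driver; the paths below are placeholders
    val lines  = sc.textFile("hdfs://namenode:8020/path/to/input")
    val counts = lines.flatMap(_.split(" "))
                      .map(word => (word, 1))
                      .reduceByKey(_ + _)

    // collect() is where your stack trace dies (serializing results
    // into the local JVM); saveAsTextFile keeps the output distributed
    counts.saveAsTextFile("hdfs://namenode:8020/path/to/wordcount-output")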
2013/12/17 [email protected] <[email protected]> > HI, > I have set my config with : > export SPARK_WORKER_MEMORY=1024m > export SPARK_DAEMON_JAVA_OPTS=9000m > Why the memory is still not enough ? > > Thanks > > ------------------------------ > [email protected] > > *From:* Jie Deng <[email protected]> > *Date:* 2013-12-17 19:44 > *To:* user <[email protected]> > *Subject:* Re: OOM, help > Hi,Leo, > > I think java.lang.OutOfMemoryError: Java heap space is caused by java > memory problem, no connection with spark. > Just try -Xmx: more memory when start jvm > > > 2013/12/17 [email protected] <[email protected]> > >> hello everyone, >> I have a problem when I run the wordcount example. I read data from hdfs >> , its almost 7G. >> I haven't seen the info from the web ui or sparkhome/work . This is the >> console info : >> ..... >> 13/12/16 19:48:02 INFO LocalTaskSetManager: Size of task 52 is 1834 >> bytes >> 13/12/16 19:48:02 INFO LocalScheduler: Running 52 >> 13/12/16 19:48:02 INFO BlockFetcherIterator$BasicBlockFetcherIterator: >> Getting 52 non-zero-bytes blocks out of 52 blocks >> 13/12/16 19:48:02 INFO BlockFetcherIterator$BasicBlockFetcherIterator: >> Started 0 remote gets in 7 ms >> 13/12/16 19:48:09 INFO LocalTaskSetManager: Loss was due to >> java.lang.OutOfMemoryError >> java.lang.OutOfMemoryError: Java heap space >> at java.util.Arrays.copyOf(Arrays.java:2271) >> at >> java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113) >> at >> java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93) >> at >> java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140) >> at >> java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1857) >> at >> java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1766) >> at >> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1185) >> at >> java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:346) >> at >> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:27) >> at >> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:47) >> at >> org.apache.spark.scheduler.local.LocalScheduler.runTask(LocalScheduler.scala:204) >> at >> org.apache.spark.scheduler.local.LocalActor$$anonfun$launchTask$1$$anon$1.run(LocalScheduler.scala:68) >> at >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) >> at >> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) >> at java.util.concurrent.FutureTask.run(FutureTask.java:166) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) >> at java.lang.Thread.run(Thread.java:722) >> 13/12/16 19:48:09 INFO LocalScheduler: Remove TaskSet 0.0 from pool >> 13/12/16 19:48:09 INFO DAGScheduler: Failed to run collect at <console>:17 >> org.apache.spark.SparkException: Job failed: Task 0.0:0 failed more than >> 4 times; aborting job java.lang.OutOfMemoryError: Java heap space >> at >> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:760) >> at >> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:758) >> at >> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60) >> at >> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) >> at >> 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:758) >> at >> org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:379) >> at org.apache.spark.scheduler.DAGScheduler.org >> $apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:441) >> at >> org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:149) >> >> this is my spark-env.sh : >> >> export >> SPARK_HOME=/home/lh1/spark_hadoopapp/spark-0.8.0-hadoop2.0.0-cdh4.2.1 >> export JAVA_HOME=/home/lh1/app/jdk1.7.0 >> export SCALA_HOME=/home/lh1/sparkapp/scala-2.9.3 >> export SPARK_WORKER_CORES=2 >> export SPARK_WORKER_MEMORY=1024m >> export SPARK_WORKER_INSTANCES=2 >> export SPARK_DAEMON_JAVA_OPTS=9000m >> >> I just started to use Spark , so can you give me some suggestions ? >> >> Thanks . >> >> Leo >> ------------------------------ >> >> ------------------------------ >> [email protected] >> > >
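One more note on the spark-env.sh quoted above: SPARK_DAEMON_JAVA_OPTS takes actual JVM flags (e.g. -Xmx9000m) for the standalone master/worker daemons, not a bare size, so "9000m" on its own does not give your job 9 GB of anything. And since the trace shows LocalScheduler, the tasks are running inside the shell's own JVM; if I remember the 0.8 scripts right, SPARK_MEM is what sizes that heap. A sketch of what I would try (the sizes are only placeholders, adjust to your machine):

    # spark-env.sh -- sketch only, sizes are placeholders
    export SPARK_WORKER_CORES=2
    # memory the workers may hand to executors (standalone mode):
    export SPARK_WORKER_MEMORY=8g
    # heap used by the shell/executor JVMs in the 0.8-era run scripts:
    export SPARK_MEM=8g
    # daemon opts are real JVM flags and only affect the master/worker daemons:
    export SPARK_DAEMON_JAVA_OPTS="-Xmx512m"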
