Hi guys,
I'm trying to run a basic version of k-means (not the MLlib version) on
250 GB of data across 8 machines (each with 8 cores and 60 GB of RAM). I've
tried many configurations but keep getting an OutOfMemoryError (full stack
trace at the bottom). I've tried the following settings together with
persist(MEMORY_AND_DISK_SER) and Kryo serialization:
System.setProperty("spark.executor.memory", "55g")
System.setProperty("spark.storage.memoryFraction", ".5")
System.setProperty("spark.default.parallelism", "5000")
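For context, here is a minimal sketch of the kind of setup described above (the master URL, input path, and parsing are illustrative placeholders, not the actual job):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

object KMeansJob {
  def main(args: Array[String]) {
    // Properties must be set before the SparkContext is created
    System.setProperty("spark.executor.memory", "55g")
    System.setProperty("spark.storage.memoryFraction", ".5")
    System.setProperty("spark.default.parallelism", "5000")
    System.setProperty("spark.serializer",
      "org.apache.spark.serializer.KryoSerializer")

    // Placeholder master URL and app name
    val sc = new SparkContext("spark://master:7077", "kmeans")

    // Placeholder input: parse each line into a dense Array[Double],
    // then cache serialized, spilling to disk when memory runs out
    val points = sc.textFile("hdfs://master:9000/points")
      .map(_.split(' ').map(_.toDouble))
      .persist(StorageLevel.MEMORY_AND_DISK_SER)

    // ... k-means iterations over `points` go here ...
  }
}
```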
OutOfMemoryError:
14/02/18 00:34:07 WARN cluster.ClusterTaskSetManager: Loss was due to java.lang.OutOfMemoryError
java.lang.OutOfMemoryError: Java heap space
        at it.unimi.dsi.fastutil.bytes.ByteArrays.grow(ByteArrays.java:170)
        at it.unimi.dsi.fastutil.io.FastByteArrayOutputStream.write(FastByteArrayOutputStream.java:97)
        at it.unimi.dsi.fastutil.io.FastBufferedOutputStream.dumpBuffer(FastBufferedOutputStream.java:120)
        at it.unimi.dsi.fastutil.io.FastBufferedOutputStream.write(FastBufferedOutputStream.java:150)
        at com.esotericsoftware.kryo.io.Output.flush(Output.java:155)
        at com.esotericsoftware.kryo.io.Output.require(Output.java:135)
        at com.esotericsoftware.kryo.io.Output.writeAscii_slow(Output.java:446)
        at com.esotericsoftware.kryo.io.Output.writeString(Output.java:306)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.writeName(DefaultClassResolver.java:105)
        at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:81)
        at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:472)
        at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:565)
        at org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:88)
        at org.apache.spark.serializer.SerializationStream$class.writeAll(Serializer.scala:80)
        at org.apache.spark.serializer.KryoSerializationStream.writeAll(KryoSerializer.scala:84)
        at org.apache.spark.storage.BlockManager.dataSerializeStream(BlockManager.scala:815)
        at org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:824)
        at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:78)
        at org.apache.spark.storage.BlockManager.liftedTree1$1(BlockManager.scala:552)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:546)
        at org.apache.spark.storage.BlockManager.put(BlockManager.scala:477)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:76)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:224)
        at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:29)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:159)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:100)
        at org.apache.spark.scheduler.Task.run(Task.scala:53)
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemoryError-with-basic-kmeans-tp1651.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.