Looks like it could be Kryo related? I'm only guessing here, but you can configure Kryo's buffers separately; see: spark.kryoserializer.buffer.mb
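For example, something like this alongside your other properties (a sketch only; the property name and the KryoSerializer class path are from the Spark 0.9-era docs, and 64 MB is an arbitrary value you'd tune to your record sizes):

```scala
// Use Kryo for serialization (same as your current setup, presumably).
System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
// Raise Kryo's per-object serialization buffer; the value is in MB.
// The default is small, and large records can blow past it during writes.
System.setProperty("spark.kryoserializer.buffer.mb", "64")
```

These need to be set before the SparkContext is created, or they won't take effect.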
On Mon, Feb 17, 2014 at 7:49 PM, agg <[email protected]> wrote:
> Hi guys,
>
> I'm trying to run a basic version of kmeans (not the mllib version) on
> 250gb of data on 8 machines (each with 8 cores and 60gb of ram). I've
> tried many configurations, but keep getting an OutOfMemory error (at the
> bottom). I've tried the following settings with
> persist(MEMORY_AND_DISK_SER) and Kryo serialization:
>
> System.setProperty("spark.executor.memory", "55g")
> System.setProperty("spark.storage.memoryFraction", ".5")
> System.setProperty("spark.default.parallelism", "5000")
>
> OutOfMemory error:
>
> 14/02/18 00:34:07 WARN cluster.ClusterTaskSetManager: Loss was due to java.lang.OutOfMemoryError
> java.lang.OutOfMemoryError: Java heap space
>     at it.unimi.dsi.fastutil.bytes.ByteArrays.grow(ByteArrays.java:170)
>     at it.unimi.dsi.fastutil.io.FastByteArrayOutputStream.write(FastByteArrayOutputStream.java:97)
>     at it.unimi.dsi.fastutil.io.FastBufferedOutputStream.dumpBuffer(FastBufferedOutputStream.java:120)
>     at it.unimi.dsi.fastutil.io.FastBufferedOutputStream.write(FastBufferedOutputStream.java:150)
>     at com.esotericsoftware.kryo.io.Output.flush(Output.java:155)
>     at com.esotericsoftware.kryo.io.Output.require(Output.java:135)
>     at com.esotericsoftware.kryo.io.Output.writeAscii_slow(Output.java:446)
>     at com.esotericsoftware.kryo.io.Output.writeString(Output.java:306)
>     at com.esotericsoftware.kryo.util.DefaultClassResolver.writeName(DefaultClassResolver.java:105)
>     at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:81)
>     at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:472)
>     at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:565)
>     at org.apache.spark.serializer.KryoSerializationStream.writeObject(KryoSerializer.scala:88)
>     at org.apache.spark.serializer.SerializationStream$class.writeAll(Serializer.scala:80)
>     at org.apache.spark.serializer.KryoSerializationStream.writeAll(KryoSerializer.scala:84)
>     at org.apache.spark.storage.BlockManager.dataSerializeStream(BlockManager.scala:815)
>     at org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:824)
>     at org.apache.spark.storage.MemoryStore.putValues(MemoryStore.scala:78)
>     at org.apache.spark.storage.BlockManager.liftedTree1$1(BlockManager.scala:552)
>     at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:546)
>     at org.apache.spark.storage.BlockManager.put(BlockManager.scala:477)
>     at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:76)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:224)
>     at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:29)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:237)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:226)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:159)
>     at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:100)
>     at org.apache.spark.scheduler.Task.run(Task.scala:53)
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/OutOfMemoryError-with-basic-kmeans-tp1651.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
