Please retry with --driver-memory 10g. The default is 256m.

-Xiangrui

On Thu, Oct 9, 2014 at 2:33 AM, Clive Cox <clive....@rummble.com> wrote:
> Hi,
>
> I'm trying out the DIMSUM item similarity from github master commit
> 69c3f441a9b6e942d6c08afecd59a0349d61cc7b. My matrix is:
> Num items: 8860
> Number of users: 5138702
> Implicit 1.0 values
> Running item similarity with threshold: 0.5
>
> I have a 2-slave Spark cluster on EC2 with m3.xlarge instances (13G each).
>
> I'm running out of heap space:
>
> Exception in thread "handle-read-write-executor-1"
> java.lang.OutOfMemoryError: Java heap space
>     at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>     at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
>     at org.apache.spark.network.nio.Message$.create(Message.scala:90)
>
> while Spark is doing:
>
> org.apache.spark.rdd.RDD.reduce(RDD.scala:865)
> org.apache.spark.mllib.rdd.RDDFunctions.treeAggregate(RDDFunctions.scala:111)
> org.apache.spark.mllib.linalg.distributed.RowMatrix.computeColumnSummaryStatistics(RowMatrix.scala:379)
> org.apache.spark.mllib.linalg.distributed.RowMatrix.columnSimilarities(RowMatrix.scala:483)
>
> The Spark UI said the shuffle read on this task at that point had used
> 162.6 MB.
>
> I run spark-submit from the master like below:
>
> ./spark/bin/spark-submit --executor-memory 13G .... --master spark://ec2....
>
> Just wanted to check whether this is expected, as the matrix doesn't seem
> excessively big. Is there some memory setting I am missing?
>
> Thanks,
>
> Clive
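For reference, the adjusted spark-submit invocation would look something like the sketch below. Only --driver-memory 10g and --executor-memory 13G come from this thread; the master URL, application class, and jar name are placeholders standing in for the parts elided ("....") in Clive's original command:

```shell
# Sketch of the suggested command. --driver-memory raises the driver heap
# above the 256m default, which matters here because treeAggregate in
# computeColumnSummaryStatistics reduces results back to the driver.
# The class and jar names below are hypothetical placeholders.
./spark/bin/spark-submit \
  --master spark://ec2-master-host:7077 \
  --driver-memory 10g \
  --executor-memory 13G \
  --class com.example.ItemSimilarityJob \
  item-similarity-assembly.jar
```

Equivalently, the driver heap can be set via the spark.driver.memory property in spark-defaults.conf instead of on the command line.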
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org