Please retry with --driver-memory 10g. The default is 256m.

-Xiangrui

On Thu, Oct 9, 2014 at 2:33 AM, Clive Cox <clive....@rummble.com> wrote:
> Hi,
>
> I'm trying out the DIMSUM item similarity from github master commit
> 69c3f441a9b6e942d6c08afecd59a0349d61cc7b. My matrix is:
> Num items: 8860
> Number of users: 5138702
> Implicit 1.0 values
> Running item similarity with threshold: 0.5
>
> I have a 2-slave Spark cluster on EC2 with m3.xlarge instances (13G each).
>
> I'm running out of heap space:
>
> Exception in thread "handle-read-write-executor-1"
> java.lang.OutOfMemoryError: Java heap space
>     at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>     at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
>     at org.apache.spark.network.nio.Message$.create(Message.scala:90)
>
> while Spark is doing:
>
> org.apache.spark.rdd.RDD.reduce(RDD.scala:865)
> org.apache.spark.mllib.rdd.RDDFunctions.treeAggregate(RDDFunctions.scala:111)
> org.apache.spark.mllib.linalg.distributed.RowMatrix.computeColumnSummaryStatistics(RowMatrix.scala:379)
> org.apache.spark.mllib.linalg.distributed.RowMatrix.columnSimilarities(RowMatrix.scala:483)
>
> The Spark UI said the shuffle read on this task at that point had used
> 162.6 MB.
>
> I run spark-submit from the master like below:
>
> ./spark/bin/spark-submit --executor-memory 13G .... --master spark://ec2....
>
> Just wanted to check whether this is expected, as the matrix doesn't seem
> excessively big. Is there some memory setting I am missing?
>
> Thanks,
>
> Clive
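For reference, the adjusted spark-submit invocation would look something like the sketch below. Only --driver-memory 10g and --executor-memory 13G come from this thread; the master URL, application class, and jar name are placeholders standing in for the parts elided ("....") in Clive's original command:

```shell
# Sketch of the suggested command. --driver-memory raises the driver heap
# above the 256m default, which matters here because treeAggregate in
# computeColumnSummaryStatistics reduces results back to the driver.
# The class and jar names below are hypothetical placeholders.
./spark/bin/spark-submit \
  --master spark://ec2-master-host:7077 \
  --driver-memory 10g \
  --executor-memory 13G \
  --class com.example.ItemSimilarityJob \
  item-similarity-assembly.jar
```

Equivalently, the driver heap can be set via the spark.driver.memory property in spark-defaults.conf instead of on the command line.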
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org