Re: spark-rowsimilarity java.lang.OutOfMemoryError: Java heap space

2015-05-19 Thread Pat Ferrel
The way the code works is: 1) create a BiMap for every ID space in the client code (users and items). This is non-distributed code, typically run on the machine you launch from, although in yarn-cluster mode the actual machine may be different. In any case the heap used is associated with the driv

Re: spark-rowsimilarity java.lang.OutOfMemoryError: Java heap space

2015-05-18 Thread Xavier Rampino
I just did that but ran into the same problem; I feel like -sem doesn't work with my setup. For instance, I have: 15/05/18 13:44:39 INFO BlockManagerInfo: Removed broadcast_13_piece0 on localhost:60596 in memory (size: 2.7 KB, free: *1761.1 MB*) (Maybe it's not related, though.) On Wed, May 13,

Re: spark-rowsimilarity java.lang.OutOfMemoryError: Java heap space

2015-05-13 Thread Pat Ferrel
There is a bug in Mahout 0.10.0 that you can fix if you are able to build from source. Get the source tar for 0.10.0, not the current master. Go to https://github.com/apache/mahout/blob/mahout-0.10.x/spark/src/main/scala/org/apache/mahout/drivers/TextDelimitedReaderWriter.scala#L157 and remove the