Thank you Pat, you were right: when I ran Mahout 0.10 with Spark 1.3.1, I
didn't get this error.
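In case it helps anyone else, this is roughly how I am launching it now,
with the driver heap raised as you suggested (filein.txt and out are just
the placeholder paths from my earlier mail):

  # bump the driver/client heap before launching the Mahout driver;
  # depending on how the bin/mahout script parses this, the value may
  # need to be in megabytes (e.g. 6144) rather than "6g"
  export MAHOUT_HEAPSIZE=6g
  bin/mahout spark-itemsimilarity \
    --master spark://node1:7077 \
    --input filein.txt --output out \
    --sparkExecutorMem 6g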
I'm now trying to run Mahout with Spark on inputs of 20M, 50M, 1G and 10G.
Does anybody have an idea of how many machines with 6G of RAM I should
configure for Spark to be able to run this experiment? So far I have
configured 3 machines, but I think that will not be enough.

On Tue, Jul 21, 2015 at 1:58 PM, Pat Ferrel <[email protected]> wrote:

> That should be plenty of memory on your executors, but is that where you
> are running low? This may be a low heap on your driver/client code.
>
> Increase driver memory by setting MAHOUT_HEAPSIZE=6g or some such when
> launching the driver. I think the default is 4g. If you are using Yarn
> the answer is more complicated.
>
> The code creates BiMaps for your user and item ids, which will grow with
> the size of your total string storage needs; are your ids very long? With
> the default 4g of driver memory and the latest released 0.10.1 (be sure
> to upgrade!) or master-0.11.0-snapshot code I wouldn't expect to have
> this problem.
>
> The current master mahout-0.11.0-snapshot has better partitioning, as
> Dmitriy mentions, but it is built for Spark 1.3.1, so I'm not sure
> whether it is backward compatible. Some things won't work, but
> spark-itemsimilarity may be ok. Somehow I doubt you are running into a
> partitioning problem.
>
> On Jul 20, 2015, at 2:04 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
> Assuming task memory x number of cores does not exceed ~5g, and the block
> cache manager ratio does not have some really weird setting, the next
> best thing to look at is the initial task split size. I don't think the
> driver in the release you are looking at manages initial off-DFS splits
> satisfactorily (that is, in any way at all). Basically, you may want
> smaller splits, and more tasks than what DFS gives you from the
> beginning. These apps tend to run a bit better when splits do not exceed
> 100...500k non-zero elements.
>
> I think Pat has done some stop-gap measure on current master for that
> (which I don't believe is a truly optimal thing to do, though).
>
> On Mon, Jul 20, 2015 at 1:40 PM, Rodolfo Viana <
> [email protected]> wrote:
>
> > I'm trying to run Mahout 0.10 with Spark 1.1.1.
> > I have input files of 8k, 10M, 20M and 25M.
> >
> > So far I have run with the following configurations:
> >
> > 8k with 1, 2, 3 slaves
> > 10M with 1, 2, 3 slaves
> > 20M with 1, 2, 3 slaves
> >
> > But when I try to run
> >
> > bin/mahout spark-itemsimilarity --master spark://node1:7077 --input
> > filein.txt --output out --sparkExecutorMem 6g
> >
> > with 25M, I got this error:
> >
> > java.lang.OutOfMemoryError: Java heap space
> >
> > or
> >
> > java.lang.OutOfMemoryError: GC overhead limit exceeded
> >
> > Is that normal? When I ran 20M I didn't get any errors, and now the
> > input is only 5M larger.
> >
> > Any ideas why this is happening?
> >
> > --
> > Rodolfo de Lima Viana
> > Undergraduate in Computer Science at UFCG

--
Rodolfo de Lima Viana
Undergraduate in Computer Science at UFCG
