Hi again, I think I found some problems in my setup and will rerun the experiments soon. When using 32 or 64 machines, it seems that not enough mappers/reducers are allocated. Regarding the patch, I still need it: I ran all experiments with D=20, and at D=30 and above I get memory errors.
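One place to check for under-allocated tasks (property names as in Hadoop 0.20.x; the values below are placeholders that should be tuned to the EC2 instance type) is the per-TaskTracker slot count in mapred-site.xml:

```xml
<!-- mapred-site.xml: task slots per TaskTracker (placeholder values) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>8</value>
</property>
```

The number of reducers can also be requested per job, e.g. with `-Dmapred.reduce.tasks=N` on the command line; mapper counts are mostly driven by the number of input splits.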
Thanks!

On Sun, Mar 6, 2011 at 4:02 PM, Sebastian Schelter <[email protected]> wrote:

> Hi Danny,
>
> thanks for the nice writeup! I'm a little bit disappointed about the
> performance though...
>
> Seems you got around those memory problems from last week without my
> patch, which is good, since I unfortunately didn't have the time to
> finish that one yet.
>
> On 05.03.2011 01:33, Danny Bickson wrote:
>
>> Hi Sebastian,
>> As promised, you can find some results from testing your ALS code on 64
>> high-performance Amazon EC2 machines (with up to 1,024 cores):
>>
>> http://bickson.blogspot.com/2011/03/tunning-hadoop-configuration-for-high.html
>>
>> I would love to get any feedback you or others may have about the setup
>> of this experiment.
>>
>> Best,
>>
>> Danny Bickson
>>
>> On Wed, Feb 23, 2011 at 4:41 PM, Sebastian Schelter <[email protected]> wrote:
>>
>> Hi Danny,
>>
>> please send all mails to [email protected] instead of sending them
>> directly to me; there are a lot of smart people on that list who might
>> join in with advice.
>>
>> I'm very excited that you are testing this code so intensively, and I'm
>> positively surprised to see it give good results. Thank you for the
>> effort you put into this!
>>
>> The exception seems to occur when ALSEvaluator is run. The code uses a
>> quick-and-dirty approach to compute the error of the model: it simply
>> loads the user and item feature matrices completely into memory. With
>> an increasing number of features, memory consumption becomes too large.
>>
>> The evaluator step needs to be changed so that each (user, item) pair
>> for which a rating shall be predicted is joined with the corresponding
>> user and item feature vectors: all three are mapped to the same key and
>> go to the same reducer, which can then compute the error.
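The join-based evaluation described above can be sketched on a single machine as follows (hypothetical class and method names, not the patch itself; the actual MapReduce version would emit each probe's user vector, item vector, and rating under a shared key so that one reducer sees all three). It assumes the standard ALS prediction rule: the estimated rating is the dot product of the user and item feature vectors.

```java
import java.util.Map;

// Hypothetical single-machine sketch of the joined evaluation: instead of
// loading the full feature matrices, each probe (user, item, rating) is
// matched with only the two feature vectors it needs, and squared errors
// are aggregated incrementally.
public class JoinedRmse {

  // ALS prediction: dot product of user and item feature vectors
  static double predict(double[] userFeatures, double[] itemFeatures) {
    double dot = 0.0;
    for (int k = 0; k < userFeatures.length; k++) {
      dot += userFeatures[k] * itemFeatures[k];
    }
    return dot;
  }

  // RMSE over probe pairs probes[n] = {userID, itemID} with ratings[n]
  static double rmse(Map<Integer, double[]> userFeatures,
                     Map<Integer, double[]> itemFeatures,
                     int[][] probes, double[] ratings) {
    double sumSquaredError = 0.0;
    for (int n = 0; n < probes.length; n++) {
      double[] u = userFeatures.get(probes[n][0]); // the "join": fetch only
      double[] m = itemFeatures.get(probes[n][1]); // the two needed vectors
      double error = ratings[n] - predict(u, m);
      sumSquaredError += error * error;
    }
    return Math.sqrt(sumSquaredError / probes.length);
  }
}
```

In the distributed version the two `get` calls become a reduce-side join, so no node ever holds a whole feature matrix.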
>>
>> I already started implementing something like this, but unfortunately I
>> don't have a lot of time these days. I could update the patch during
>> the next week if that's OK for you.
>>
>> --sebastian
>>
>> On 23.02.2011 21:57, Danny Bickson wrote:
>>
>> Another exception I am getting:
>>
>> 11/02/23 20:45:34 INFO common.AbstractJob: Command line arguments:
>> {--endPhase=2147483647, --itemFeatures=/tmp/als/out/M/,
>> --probes=/user/ubuntu/myout/probeSet/, --startPhase=0, --tempDir=temp,
>> --userFeatures=/tmp/als/out/U/}
>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>     at org.apache.mahout.math.map.OpenIntDoubleHashMap.rehash(OpenIntDoubleHashMap.java:433)
>>     at org.apache.mahout.math.map.OpenIntDoubleHashMap.put(OpenIntDoubleHashMap.java:387)
>>     at org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:134)
>>     at org.apache.mahout.math.VectorWritable.readFields(VectorWritable.java:113)
>>     at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1751)
>>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
>>     at org.apache.mahout.utils.eval.ALSEvaluator.readMatrix(ALSEvaluator.java:113)
>>     at org.apache.mahout.utils.eval.ALSEvaluator.run(ALSEvaluator.java:71)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>     at org.apache.mahout.utils.eval.ALSEvaluator.main(ALSEvaluator.java:52)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:616)
>>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:174)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:616)
>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>
>> THANKS!
>>
>> ---------- Forwarded message ----------
>> From: Danny Bickson <[email protected]>
>> Date: Wed, Feb 23, 2011 at 3:05 PM
>> Subject: Another mahout ALS question
>> To: [email protected]
>>
>> Hi!
>> I successfully ran 10 iterations of your ALS code with D=20 and
>> lambda=0.065, and I get a very impressive RMSE of 0.93.
>> However, when I try to increase D, I get various out-of-memory errors,
>> even with a small Netflix subsample of 3M values.
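The stack traces make the scaling problem concrete: `ALSEvaluator.readMatrix` loads one sparse vector per user and per item, so the number of stored entries grows linearly with D. A very rough back-of-envelope, with an assumed (not measured) per-entry cost covering the int key, double value, state byte, and hash-table slack of the `OpenIntDoubleHashMap`-backed vectors:

```java
// Very rough heap estimate for holding both feature matrices in memory,
// as the evaluator currently does. BYTES_PER_ENTRY is a guessed constant;
// the true figure depends on the map's load factor, rehash doubling, and
// per-vector object overhead, and can be considerably higher.
public class FeatureMatrixHeap {

  static final long BYTES_PER_ENTRY = 32; // assumed, not measured

  static long estimateBytes(long numUsers, long numItems, int numFeatures) {
    long entries = (numUsers + numItems) * numFeatures;
    return entries * BYTES_PER_ENTRY;
  }
}
```

At Netflix scale (roughly 480,000 users and 17,770 items) with D=30, this already suggests on the order of 450 MB under these assumptions, before any per-object overhead, which helps explain why increasing D tips the evaluator over.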
>>
>> One of the errors I am getting is in the evaluateALS step:
>>
>> 11/02/23 19:04:11 WARN driver.MahoutDriver: No evaluateALS.props found
>> on classpath, will use command-line arguments only
>> 11/02/23 19:04:12 INFO common.AbstractJob: Command line arguments:
>> {--endPhase=2147483647, --itemFeatures=/tmp/als/out/M/,
>> --probes=/user/ubuntu/myout/probeSet/, --startPhase=0, --tempDir=temp,
>> --userFeatures=/tmp/als/out/U/}
>> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>>     at org.apache.mahout.math.map.OpenIntDoubleHashMap.rehash(OpenIntDoubleHashMap.java:433)
>>     at org.apache.mahout.math.map.OpenIntDoubleHashMap.put(OpenIntDoubleHashMap.java:387)
>>     at org.apache.mahout.math.RandomAccessSparseVector.setQuick(RandomAccessSparseVector.java:134)
>>     at org.apache.mahout.math.VectorWritable.readFields(VectorWritable.java:113)
>>     at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1751)
>>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1879)
>>     at org.apache.mahout.utils.eval.ALSEvaluator.readMatrix(ALSEvaluator.java:113)
>>     at org.apache.mahout.utils.eval.ALSEvaluator.run(ALSEvaluator.java:71)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>     at org.apache.mahout.utils.eval.ALSEvaluator.main(ALSEvaluator.java:52)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:616)
>>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:174)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:616)
>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>
>> There is no related exception in the Hadoop logs.
>>
>> I am running with Java child opts of -Xmx2048M.
>>
>> Do you have any tips for me? Do you want me to post this to the
>> MAHOUT-542 issue?
>>
>> thanks,
>>
>> DB
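Until the evaluator is rewritten, the only workaround is more heap. Note that both traces above are thrown in the main thread via RunJar/MahoutDriver, i.e. in the local client JVM rather than in a map or reduce task (consistent with there being no exception in the Hadoop logs), so `mapred.child.java.opts` does not reach it; raising the client heap, e.g. via `HADOOP_HEAPSIZE` in hadoop-env.sh, may be what is actually needed. A sketch of both knobs (Hadoop 0.20.x property name; values are placeholders that must fit in the machine's physical RAM):

```xml
<!-- mapred-site.xml: heap for map/reduce task JVMs (placeholder value) -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx3072m</value>
</property>
```

and, for the client-side driver, something like `export HADOOP_HEAPSIZE=3072` (in MB) before launching the job.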
