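Hi Adam,

The BreimanExample is just meant as a test and an example; it doesn't even use MapReduce. It loads the whole dataset in the client JVM (that is the DataLoader.loadData call in your stack trace), so mapred.child.java.opts, which only applies to the map/reduce task JVMs, won't change its heap. Take a look at the following instead: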
https://cwiki.apache.org/MAHOUT/partial-implementation.html

On Fri, Dec 21, 2012 at 2:59 AM, Marty Kube <[email protected]> wrote:

> Hi Adam,
>
> This is an interesting problem. Increasing the heap size is not
> necessarily going to solve the issue. The error you have:
>
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> is due to too much CPU time spent in GC, as opposed to not enough heap
> allocation. Decreasing your heap allocation may in fact help, as GC is more
> efficient on a smaller heap. You may have to consider GC tuning.
>
> On 12/20/2012 08:32 PM, Adam Baron wrote:
>
>> I'm trying to run org.apache.mahout.classifier.df.BreimanExample on a
>> custom set of data that is ~4 GB, with 500 numerical columns, 1
>> categorical column with two possible label values, and ~4 million rows. I
>> already ran org.apache.mahout.classifier.df.tools.Describe to generate
>> the dataset *.info file. However, despite bumping my
>> mapred.child.java.opts up to -Xmx12288m, I still get the memory error
>> below:
>>
>> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>>     at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1222)
>>     at java.lang.Double.parseDouble(Double.java:510)
>>     at org.apache.mahout.classifier.df.data.DataConverter.convert(DataConverter.java:64)
>>     at org.apache.mahout.classifier.df.data.DataLoader.loadData(DataLoader.java:130)
>>     at org.apache.mahout.classifier.df.BreimanExample.run(BreimanExample.java:187)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>     at org.apache.mahout.classifier.df.BreimanExample.main(BreimanExample.java:125)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>>
>> I'm running on a pretty significant Hadoop cluster that has no problem
>> running other sizable Mahout jobs, such as K-Means clustering on 100s of GB
>> of n-gram TF-IDF files, so I'm thinking this is more of a configuration/code
>> issue than a hardware issue. The small glass.data example from the website
>> (https://cwiki.apache.org/MAHOUT/breiman-example.html) worked flawlessly.
>>
>> I realize that if I decide to pursue Random Forest classification further,
>> I'll need to write my own code to classify through a DecisionForest going
>> forward (after training), since the BreimanExample is an example, not a
>> tool. However, for this initial foray I merely want to see what test error
>> numbers my custom data set would yield, preferably without writing any
>> custom code.
>>
>> Any suggestions?
>>
>> Thanks,
>> Adam
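A back-of-envelope check of why loading this data set into a single JVM struggles: Adam describes ~4 million rows with 500 numerical columns plus 1 categorical label, and the stack trace shows DataLoader.loadData parsing every value with Double.parseDouble in the "main" thread. The sketch below is a rough estimate, assuming each attribute value (label included) ends up stored as an 8-byte double; it ignores per-object overhead and the temporary garbage created while parsing, so the real footprint would be larger. The class and method names are hypothetical. It also prints the maximum heap of the JVM it runs in, which is the heap that matters for code running in "main" rather than inside map/reduce tasks.

```java
// Hypothetical helper: rough estimate of the raw attribute storage for an in-memory data set.
public class HeapEstimate {

    // Assumption: every attribute value is held as one 8-byte double.
    // Object headers, label bookkeeping and parsing garbage are NOT counted.
    static long rawBytes(long rows, int columns) {
        return rows * columns * Double.BYTES;
    }

    public static void main(String[] args) {
        long rows = 4_000_000L;   // "~4 million rows"
        int columns = 500 + 1;    // 500 numerical columns + 1 categorical label
        long bytes = rawBytes(rows, columns);
        double gib = bytes / (1024.0 * 1024 * 1024);
        System.out.printf("raw values: %d bytes (~%.1f GiB)%n", bytes, gib);

        // Max heap of THIS JVM, i.e. the process that would run the loading code.
        double maxHeapGib = Runtime.getRuntime().maxMemory() / (1024.0 * 1024 * 1024);
        System.out.printf("max heap of this JVM: ~%.1f GiB%n", maxHeapGib);
    }
}
```

Under that assumption the raw values alone come to 16,032,000,000 bytes (about 14.9 GiB), which is already more than the 12 GiB that -Xmx12288m would provide, even before object overhead is counted.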
