Hi Adam,

The BreimanExample is only meant as a test and an example; it doesn't even
use MapReduce. Take a look at the following instead:

https://cwiki.apache.org/MAHOUT/partial-implementation.html
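
For a rough sense of why loading everything in one JVM fails, here is a back-of-the-envelope sketch. It assumes each numeric value ends up held as an 8-byte double and ignores all per-object overhead, so it is only a lower bound; the class and method names are made up for illustration and are not part of Mahout. With the figures from the original post (about 4 million rows, 500 numeric columns), the raw values alone come to 16,000,000,000 bytes, more than the 12,884,901,888 bytes that -Xmx12288m allows. Also note that mapred.child.java.opts sets the heap of MapReduce task JVMs, while the stack trace is from the "main" thread of the client JVM.

```java
// Hypothetical helper, not part of Mahout: estimates the minimum heap needed
// to hold a dense numeric table in memory.
class DatasetMemoryEstimate {

    // Lower bound in bytes: rows * columns * 8 bytes per double.
    // Object headers, row wrappers, and parsing garbage are ignored.
    static long denseDoubleBytes(long rows, long numericColumns) {
        return Math.multiplyExact(
                Math.multiplyExact(rows, numericColumns), (long) Double.BYTES);
    }

    public static void main(String[] args) {
        long rows = 4_000_000L;          // "~4 million rows" from the post
        long numericColumns = 500L;      // "500 Numerical Columns" from the post
        long needed = denseDoubleBytes(rows, numericColumns);

        long requested = 12288L * 1024 * 1024;            // -Xmx12288m
        long actualMax = Runtime.getRuntime().maxMemory(); // heap limit of THIS JVM

        System.out.printf("raw values need at least: %,d bytes%n", needed);
        System.out.printf("-Xmx12288m would allow:   %,d bytes%n", requested);
        System.out.printf("this JVM's max heap:      %,d bytes%n", actualMax);
        System.out.println("fits in requested heap:   " + (needed <= requested));
    }
}
```

Running the same Runtime.getRuntime().maxMemory() check inside the client JVM is a quick way to see which heap setting actually took effect there.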




On Fri, Dec 21, 2012 at 2:59 AM, Marty Kube <
[email protected]> wrote:

> Hi Adam,
>
> This is an interesting problem.  Increasing the heap size is not
> necessarily going to solve the issue.  The error you have:
>
>
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>
> is due to too much CPU time being spent in GC, as opposed to not enough heap
> allocation.  Decreasing your heap allocation may in fact help, as GC is more
> efficient on a smaller heap.  You may have to consider GC tuning.
>
>
>
> On 12/20/2012 08:32 PM, Adam Baron wrote:
>
>> I'm trying to run the org.apache.mahout.classifier.df.BreimanExample on a
>> custom set of data that is ~4GB which has 500 Numerical Columns, 1
>> Categorical Column with two possible label values and ~4 million rows.  I
>> already ran the org.apache.mahout.classifier.df.tools.Describe to generate
>> the dataset *.info file.  However, despite bumping
>> my mapred.child.java.opts up to -Xmx12288m, I still get this memory error
>> below:
>>
>> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
>>          at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1222)
>>          at java.lang.Double.parseDouble(Double.java:510)
>>          at org.apache.mahout.classifier.df.data.DataConverter.convert(DataConverter.java:64)
>>          at org.apache.mahout.classifier.df.data.DataLoader.loadData(DataLoader.java:130)
>>          at org.apache.mahout.classifier.df.BreimanExample.run(BreimanExample.java:187)
>>          at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>          at org.apache.mahout.classifier.df.BreimanExample.main(BreimanExample.java:125)
>>          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>          at java.lang.reflect.Method.invoke(Method.java:597)
>>          at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>>
>> I'm running on a pretty significant Hadoop cluster which has no problem
>> running other sizable Mahout jobs such as K-Means Clustering on 100s GB
>> n-gram TF/IDF files, so I'm thinking this is more of a configuration/code
>> issue than a hardware issue.  The small glass.data example from the website
>> (https://cwiki.apache.org/MAHOUT/breiman-example.html) worked flawlessly.
>>
>> I realize that if I decide to pursue Random Forest classification further,
>> I'll need to write my own code to classify through a DecisionForest on a
>> go-forward basis (after the training set) since the BreimanExample is an
>> example, not a tool.  However, for this initial foray I merely want to see
>> what type of Test Error numbers my custom set of data would yield,
>> preferably without writing any custom code.
>>
>> Any suggestions?
>>
>> Thanks,
>>            Adam
>>
>>
>
