/usr/local/hadoop/hadoop-1.0.3/bin/hadoop jar /usr/local/mahout/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=2167537 -oob -d /securityWaitTime/wt_top_airports_2007-2012_learn.data -ds /securityWaitTime/wt_top_airports_2007-2012.info -sl 5 -p -t 100
/usr/local/hadoop/hadoop-1.0.3/bin/hadoop jar /usr/local/mahout/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar org.apache.mahout.classifier.df.mapreduce.TestForest -i /securityWaitTime/wt_top_airports_2007-2012_test.data -ds /securityWaitTime/wt_top_airports_2007-2012.info -m ob -a -mr -o predictions

On Sat, Sep 8, 2012 at 5:05 AM, deneche abdelhakim <[email protected]> wrote:

> Could you copy/paste the exact commands you used to run the training and
> the testing?
>
> On Fri, Sep 7, 2012 at 11:10 PM, Nick Jordan <[email protected]> wrote:
>
>> Any thoughts here?
>>
>> On Thu, Sep 6, 2012 at 7:00 AM, Nick Jordan <[email protected]> wrote:
>> > Same problem with the sequential classifier. My guess is that this
>> > "corruption" is happening because of that particular setting, as it is
>> > the only thing I'm changing, but I have no idea how to investigate
>> > further.
>> >
>> > Nick
>> >
>> > On Thu, Sep 6, 2012 at 2:22 AM, Abdelhakim Deneche <[email protected]> wrote:
>> >> Hi Nick,
>> >>
>> >> This is not a memory problem: the classifier tries to load the
>> >> trained forest but is getting some unexpected values. This problem
>> >> has never occurred before! Could the forest files be corrupted?
>> >>
>> >> Try training the forest once again, and this time use the sequential
>> >> classifier (don't use the -mr parameter) and see if the problem
>> >> still occurs.
>> >>
>> >> On Sep 5, 2012, at 23:00, Nick Jordan <[email protected]> wrote:
>> >>
>> >>> Hello All,
>> >>>
>> >>> I'm playing around with decision forests using the partial
>> >>> implementation and my own data set. I am getting an error from
>> >>> TestForest, but only for certain forests that I'm building with
>> >>> BuildForest. Using the same descriptor and the same build and test
>> >>> data sets, I get no error if I set mapred.max.split.size=1890528,
>> >>> which is roughly 1/100th the size of the build data set: I can
>> >>> build the forest, test the remaining data, and get the results with
>> >>> no problem. When I change the split size to 18905280, everything
>> >>> still appears to work fine when building the forest, but when I
>> >>> then try to test the remaining data I get the error below.
>> >>>
>> >>> I've dug around the code a little, but nothing stood out as to why
>> >>> the array would go out of bounds at that specific value. One
>> >>> solution is obviously not to create partitions that large, but if
>> >>> the problem were me running out of memory I would have expected an
>> >>> OutOfMemoryError, not an index past the bounds of an array. I'd
>> >>> prefer larger partitions, and thus fewer of them, and I can move
>> >>> this job to something like EMR, which should give me more memory,
>> >>> but I want to understand the nature of the error.
>> >>>
>> >>> For what it's worth, I'm running this on hadoop-1.0.3 and
>> >>> mahout-0.8-SNAPSHOT.
>> >>>
>> >>> Thanks.
>> >>>
>> >>> --
>> >>>
>> >>> 12/09/05 17:52:09 INFO mapred.JobClient: Task Id :
>> >>> attempt_201209031756_0008_m_000000_0, Status : FAILED
>> >>> java.lang.ArrayIndexOutOfBoundsException: 946827879
>> >>>     at org.apache.mahout.classifier.df.node.Node.read(Node.java:58)
>> >>>     at org.apache.mahout.classifier.df.DecisionForest.readFields(DecisionForest.java:197)
>> >>>     at org.apache.mahout.classifier.df.DecisionForest.read(DecisionForest.java:203)
>> >>>     at org.apache.mahout.classifier.df.DecisionForest.load(DecisionForest.java:225)
>> >>>     at org.apache.mahout.classifier.df.mapreduce.Classifier$CMapper.setup(Classifier.java:212)
>> >>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>> >>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>> >>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>> >>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>> >>>     at java.security.AccessController.doPrivileged(Native Method)
>> >>>     at javax.security.auth.Subject.doAs(Subject.java:416)
>> >>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>> >>>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
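
A note on the trace: in the 0.8-SNAPSHOT sources, Node.read() starts by
reading an int "node type" tag from the stream and indexing an enum array
with it (paraphrasing from memory; the exact code may differ), so a
corrupted or misaligned forest file surfaces as exactly this
ArrayIndexOutOfBoundsException, with the garbage tag as the reported index.
Notably, 946827879 is 0x386F7267, the ASCII bytes "8org", which is
consistent with the reader being positioned over text (e.g. part of a class
name) rather than over serialized node data. Below is a minimal,
self-contained sketch of the mechanism, not Mahout's actual code; the
NodeType enum is a hypothetical stand-in for Mahout's Node.Type:

// Sketch only: shows how deserializing an int type tag from a misaligned
// stream produces an ArrayIndexOutOfBoundsException whose index is the
// garbage value that was read.
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class ForestReadSketch {

  // Hypothetical stand-in for org.apache.mahout.classifier.df.node.Node.Type
  enum NodeType { LEAF, NUMERICAL, CATEGORICAL }

  static NodeType readNodeType(DataInputStream in) throws IOException {
    int tag = in.readInt();        // next 4 bytes, interpreted as the type tag
    return NodeType.values()[tag]; // garbage tag -> ArrayIndexOutOfBoundsException
  }

  public static void main(String[] args) throws IOException {
    // 0x38 0x6F 0x72 0x67 is ASCII "8org" and reads back as the int
    // 946827879 -- the exact index reported in the stack trace above.
    byte[] corrupted = {0x38, 0x6F, 0x72, 0x67};
    readNodeType(new DataInputStream(new ByteArrayInputStream(corrupted)));
  }
}

If that is what is happening here, the interesting question becomes why the
18905280 split size yields a forest file whose layout the reader mis-tracks,
e.g. a partially or inconsistently written file, rather than a memory issue.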
