Actually, those are the commands that work. If I run the first command with -Dmapred.max.split.size=21675370 (note the split size being 10x larger), that is when I get the failure running the TestForest job.
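
One observation that may or may not help: the out-of-bounds index 946827879 is 0x386F7267, which is the ASCII bytes "8org" read as a big-endian int, so it looks more like the reader is picking up bytes from the wrong position in the serialized forest than a legitimate node value. That would also explain why this shows up as an ArrayIndexOutOfBoundsException rather than an OutOfMemoryError. Below is a tiny sketch I put together to illustrate the failure mode; it is NOT the actual Mahout Node.read code, and the class and enum names are made up for the example:

import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;

// Illustrative only -- not the Mahout implementation. It shows how a reader
// that trusts an int from the stream as an array index fails with an
// ArrayIndexOutOfBoundsException when the bytes at that position are not
// what it expects.
public class MisalignedReadSketch {

  // Hypothetical stand-in for a serialized node-type tag.
  enum NodeType { LEAF, NUMERICAL, CATEGORICAL }

  static NodeType readNodeType(DataInput in) throws IOException {
    int tag = in.readInt();          // expected to be 0, 1 or 2
    return NodeType.values()[tag];   // blows up if the stream is misaligned
  }

  public static void main(String[] args) throws IOException {
    // 0x38 0x6F 0x72 0x67 is the text "8org"; read as a big-endian int it is
    // 946827879, the exact index reported in the stack trace quoted below.
    byte[] misreadBytes = {0x38, 0x6F, 0x72, 0x67};
    DataInput in = new DataInputStream(new ByteArrayInputStream(misreadBytes));
    readNodeType(in);                // java.lang.ArrayIndexOutOfBoundsException: 946827879
  }
}
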
On Sat, Sep 8, 2012 at 1:53 PM, Nick Jordan <[email protected]> wrote:
> /usr/local/hadoop/hadoop-1.0.3/bin/hadoop jar /usr/local/mahout/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=2167537 -oob -d /securityWaitTime/wt_top_airports_2007-2012_learn.data -ds /securityWaitTime/wt_top_airports_2007-2012.info -sl 5 -p -t 100
>
> /usr/local/hadoop/hadoop-1.0.3/bin/hadoop jar /usr/local/mahout/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar org.apache.mahout.classifier.df.mapreduce.TestForest -i /securityWaitTime/wt_top_airports_2007-2012_test.data -ds /securityWaitTime/wt_top_airports_2007-2012.info -m ob -a -mr -o predictions
>
> On Sat, Sep 8, 2012 at 5:05 AM, deneche abdelhakim <[email protected]> wrote:
>> Could you copy/paste the exact commands you used to run the training and the testing?
>>
>> On Fri, Sep 7, 2012 at 11:10 PM, Nick Jordan <[email protected]> wrote:
>>
>>> Any thoughts here?
>>>
>>> On Thu, Sep 6, 2012 at 7:00 AM, Nick Jordan <[email protected]> wrote:
>>> > Same problem with the sequential classifier. My guess is that this "corruption" is happening because of that particular setting, since it is the only thing I'm changing, but I have no idea how to investigate further.
>>> >
>>> > Nick
>>> >
>>> > On Thu, Sep 6, 2012 at 2:22 AM, Abdelhakim Deneche <[email protected]> wrote:
>>> >> Hi Nick,
>>> >>
>>> >> This is not a memory problem; the classifier tries to load the trained forest but is getting some unexpected values. This problem has never occurred before! Could the forest files be corrupted?
>>> >>
>>> >> Try training the forest once again, and this time use the sequential classifier (don't use the -mr parameter) and see if the problem still occurs.
>>> >>
>>> >> On 5 sept. 2012, at 23:00, Nick Jordan <[email protected]> wrote:
>>> >>
>>> >>> Hello All,
>>> >>>
>>> >>> I'm playing around with decision forests using the partial implementation and my own data set. I am getting an error with TestForest, but only for certain forests that I'm building with BuildForest. Using the same descriptor and the same build and test data sets, I get no error if I set mapred.max.split.size=1890528, which is roughly 1/100th the size of the build data set. I can build the forest, test the remaining data, and get the results with no problem. When I change the split size to 18905280, everything still appears to work fine when building the forest, but when I then try to test the remaining data I get the error below.
>>> >>>
>>> >>> I've dug around the code a little, but nothing stood out as to why the array would go out of bounds at that specific value. One solution is obviously not to create partitions that large, but if the problem were that I was running out of memory, I would have expected an out-of-memory error rather than an index past the bounds of an array. I'd obviously prefer larger partitions, and thus fewer of them, and I can move this job to something like EMR, which should give me more memory, but I want to understand the nature of the error.
>>> >>>
>>> >>> For what it is worth, I'm running this on hadoop-1.0.3 and mahout-0.8-SNAPSHOT.
>>> >>>
>>> >>> Thanks.
>>> >>>
>>> >>> --
>>> >>>
>>> >>> 12/09/05 17:52:09 INFO mapred.JobClient: Task Id : attempt_201209031756_0008_m_000000_0, Status : FAILED
>>> >>> java.lang.ArrayIndexOutOfBoundsException: 946827879
>>> >>>     at org.apache.mahout.classifier.df.node.Node.read(Node.java:58)
>>> >>>     at org.apache.mahout.classifier.df.DecisionForest.readFields(DecisionForest.java:197)
>>> >>>     at org.apache.mahout.classifier.df.DecisionForest.read(DecisionForest.java:203)
>>> >>>     at org.apache.mahout.classifier.df.DecisionForest.load(DecisionForest.java:225)
>>> >>>     at org.apache.mahout.classifier.df.mapreduce.Classifier$CMapper.setup(Classifier.java:212)
>>> >>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>> >>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>>> >>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>>> >>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>> >>>     at java.security.AccessController.doPrivileged(Native Method)
>>> >>>     at javax.security.auth.Subject.doAs(Subject.java:416)
>>> >>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>>> >>>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
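
Also, some back-of-the-envelope arithmetic on the split sizes quoted above, in case it narrows things down. This assumes the learn file really is about 100 x 1890528 bytes (as described in the first message) and that, if I understand the partial implementation correctly, the -t 100 trees are spread across the input partitions; the class name and exact numbers are mine, not from Mahout:

// Rough sketch only; dataBytes is an assumption based on "roughly 1/100th
// the size of the build data set" in the first message.
public class SplitSizeSketch {
  public static void main(String[] args) {
    long dataBytes = 100L * 1890528L;   // ~189 MB, assumed
    int numTrees = 100;                 // -t 100

    for (long maxSplit : new long[] {1890528L, 18905280L}) {
      long partitions = (dataBytes + maxSplit - 1) / maxSplit;           // ceil(data / split)
      long treesPerPartition = (numTrees + partitions - 1) / partitions; // ceil(trees / partitions)
      System.out.printf("split=%d -> ~%d partitions, ~%d trees per partition%n",
          maxSplit, partitions, treesPerPartition);
    }
  }
}

So, as far as I can tell, the only thing the 10x larger split changes is that each mapper builds and serializes about 10 trees instead of 1, which at least points at the per-partition forest output as the place to look.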
