Lorenz Knies <lorenz.knies <at> metrigo.de> writes:

>
> i think the oob option is gone
>
> On Jan 21, 2013, at 1:52 PM, Stuti Awasthi <stutiawasthi <at> hcl.com> wrote:
>
> > Hi,
> >
> > I have downloaded Mahout and tried to execute Partial Implementation. When I try to run I am getting the parsing error:
> >
> > $HADOOP_HOME/hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.7-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -oob -d /testdata/KDDTrain+.arff -ds /testdata/KDDTrain+.info -sl 5 -p -t 100 -o /testdata/nsl-forest
> >
> > 13/01/21 18:16:24 ERROR mapreduce.BuildForest: Exception
> > org.apache.commons.cli2.OptionException: Unexpected /testdata/nsl-forest while processing Options
> >     at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:99)
> >     at org.apache.mahout.classifier.df.mapreduce.BuildForest.run(BuildForest.java:139)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >     at org.apache.mahout.classifier.df.mapreduce.BuildForest.main(BuildForest.java:253)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >     at java.lang.reflect.Method.invoke(Method.java:616)
> >     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> > Usage:
> >  [--data <path> --dataset <dataset> --selection <m> --no-complete --minsplit
> > <minsplit> --minprop <minprop> --seed <seed> --partial --nbtrees <nbtrees>
> > --output <path> --help]
> > Options
> >   --data (-d) path              Data path
> >   --dataset (-ds) dataset       Dataset path
> >   --selection (-sl) m           Optional, Number of variables to select randomly
> >                                 at each tree-node.
> >                                 For classification problem, the default is
> >                                 square root of the number of explanatory
> >                                 variables.
> >                                 For regression problem, the default is 1/3 of
> >                                 the number of explanatory variables.
> >   --no-complete (-nc)           Optional, The tree is not complemented
> >   --minsplit (-ms) minsplit     Optional, The tree-node is not divided, if the
> >                                 branching data size is smaller than this value.
> >                                 The default is 2.
> >   --minprop (-mp) minprop       Optional, The tree-node is not divided, if the
> >                                 proportion of the variance of branching data is
> >                                 smaller than this value.
> >                                 In the case of a regression problem, this value
> >                                 is used. The default is 1/1000(0.001).
> >   --seed (-sd) seed             Optional, seed value used to initialise the
> >                                 Random number generator
> >   --partial (-p)                Optional, use the Partial Data implementation
> >   --nbtrees (-t) nbtrees        Number of trees to grow
> >   --output (-o) path            Output path, will contain the Decision Forest
> >   --help (-h)                   Print out help
> >
> > If I try to run with Mahout-0.5, its working fine and generating /testdata/nsl-forest/forest.seq in hdfs.
> > Is this a bug in Mahout-0.7 or am I doing something wrong.
> >
> > Please Suggest
> >
> > Thanks
> > Stuti Awasthi
> >

So how can I calculate the OOB error rate in Mahout 0.9? Does the code support it? If so, which arguments do I need to pass, or what changes do I need to make? Also, which version of Mahout supports the -oob option?

Following is the trace of my execution of the example:

hduser@ubuntu:/home/prasanna/Downloads/mahout-distribution-0.9$ hadoop jar mahout-examples-0.9-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=1874231 -d testdata/KDDTrain+.arff -ds testdata/KDDTrain+.info -sl 5 -p -t 10 -o nsl-forest
Warning: $HADOOP_HOME is deprecated.
14/07/10 02:17:43 INFO mapreduce.BuildForest: Partial Mapred implementation
14/07/10 02:17:43 INFO mapreduce.BuildForest: Building the forest...
14/07/10 02:17:44 INFO input.FileInputFormat: Total input paths to process : 1
14/07/10 02:17:44 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/07/10 02:17:44 WARN snappy.LoadSnappy: Snappy native library not loaded
14/07/10 02:17:44 INFO mapred.JobClient: Running job: job_201407092351_0003
14/07/10 02:17:45 INFO mapred.JobClient: map 0% reduce 0%
14/07/10 02:18:07 INFO mapred.JobClient: map 20% reduce 0%
14/07/10 02:18:19 INFO mapred.JobClient: map 30% reduce 0%
14/07/10 02:18:22 INFO mapred.JobClient: map 40% reduce 0%
14/07/10 02:18:31 INFO mapred.JobClient: map 60% reduce 0%
14/07/10 02:18:43 INFO mapred.JobClient: map 80% reduce 0%
14/07/10 02:18:55 INFO mapred.JobClient: map 90% reduce 0%
14/07/10 02:18:58 INFO mapred.JobClient: map 100% reduce 0%
14/07/10 02:19:03 INFO mapred.JobClient: Job complete: job_201407092351_0003
14/07/10 02:19:03 INFO mapred.JobClient: Counters: 20
14/07/10 02:19:03 INFO mapred.JobClient: Job Counters
14/07/10 02:19:03 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=129677
14/07/10 02:19:03 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/07/10 02:19:03 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/07/10 02:19:03 INFO mapred.JobClient: Launched map tasks=10
14/07/10 02:19:03 INFO mapred.JobClient: Data-local map tasks=10
14/07/10 02:19:03 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
14/07/10 02:19:03 INFO mapred.JobClient: File Output Format Counters
14/07/10 02:19:03 INFO mapred.JobClient: Bytes Written=75424
14/07/10 02:19:03 INFO mapred.JobClient: FileSystemCounters
14/07/10 02:19:03 INFO mapred.JobClient: FILE_BYTES_READ=28270
14/07/10 02:19:03 INFO mapred.JobClient: HDFS_BYTES_READ=18759170
14/07/10 02:19:03 INFO mapred.JobClient: FILE_BYTES_WRITTEN=227620
14/07/10 02:19:03 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=75424
14/07/10 02:19:03 INFO mapred.JobClient: File Input Format Counters
14/07/10 02:19:03 INFO mapred.JobClient: Bytes Read=18757940
14/07/10 02:19:03 INFO mapred.JobClient: Map-Reduce Framework
14/07/10 02:19:03 INFO mapred.JobClient: Map input records=125973
14/07/10 02:19:03 INFO mapred.JobClient: Physical memory (bytes) snapshot=775561216
14/07/10 02:19:03 INFO mapred.JobClient: Spilled Records=0
14/07/10 02:19:03 INFO mapred.JobClient: CPU time spent (ms)=20080
14/07/10 02:19:03 INFO mapred.JobClient: Total committed heap usage (bytes)=317194240
14/07/10 02:19:03 INFO mapred.JobClient: Virtual memory (bytes) snapshot=10654883840
14/07/10 02:19:03 INFO mapred.JobClient: Map output records=10
14/07/10 02:19:03 INFO mapred.JobClient: SPLIT_RAW_BYTES=1230
14/07/10 02:19:04 INFO common.HadoopUtil: Deleting hdfs://localhost:54310/user/hduser/nsl-fores
14/07/10 02:19:04 INFO mapreduce.BuildForest: Build Time: 0h 1m 20s 892
14/07/10 02:19:04 INFO mapreduce.BuildForest: Forest num Nodes: 4085
14/07/10 02:19:04 INFO mapreduce.BuildForest: Forest mean num Nodes: 408
14/07/10 02:19:04 INFO mapreduce.BuildForest: Forest mean max Depth: 12
14/07/10 02:19:04 INFO mapreduce.BuildForest: Storing the forest in: nsl-fores/forest.seq

It prints the statistics above but not the OOB error estimate mentioned in the example here:
https://mahout.apache.org/users/classification/partial-implementation.html

How can I get the OOB error estimate?
