Hello all:
I'm testing Mahout's Random Forest partial implementation on Hadoop
2.0.0-cdh4.1.1.
I'm trying to modify the algorithm: all I do is add more information to
the leaves of the tree. A leaf currently contains only the label, and I
want to add one more field (leafWeight):
@Override
public void readFields(DataInput in) throws IOException {
    label = in.readDouble();
    leafWeight = in.readDouble();   // added: deserialize the extra field
}

@Override
protected void writeNode(DataOutput out) throws IOException {
    out.writeDouble(label);
    out.writeDouble(leafWeight);    // added: serialize the extra field
}
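
Just to show that the write/read pair itself is symmetric, here is how I
checked the two-double layout in isolation, outside Mahout (a minimal
sketch; the class name and values are mine):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical round-trip check: write two doubles the same way
// writeNode() does, then read them back the way readFields() does.
public class LeafRoundTripCheck {
    public static void main(String[] args) throws IOException {
        double label = 1.0;        // dummy values
        double leafWeight = 0.75;

        // Serialize exactly as writeNode() does.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeDouble(label);
        out.writeDouble(leafWeight);
        out.close();

        // Deserialize exactly as readFields() does. An EOFException here
        // would mean the writer produced fewer bytes than the reader expects.
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println("label      = " + in.readDouble());
        System.out.println("leafWeight = " + in.readDouble());
        in.close();
    }
}

This reads back exactly what it writes, so the pair looks consistent on
its own.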
Yet when I run the job, I get the following error:
13/02/27 06:53:27 INFO mapreduce.BuildForest: Partial Mapred implementation
13/02/27 06:53:27 INFO mapreduce.BuildForest: Building the forest...
13/02/27 06:53:27 INFO mapreduce.BuildForest: Weights Estimation: IR
13/02/27 06:53:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/02/27 06:53:39 INFO input.FileInputFormat: Total input paths to process : 1
13/02/27 06:53:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/02/27 06:53:39 WARN snappy.LoadSnappy: Snappy native library not loaded
13/02/27 06:53:39 INFO mapred.JobClient: Running job: job_201302270205_0013
13/02/27 06:53:40 INFO mapred.JobClient: map 0% reduce 0%
13/02/27 06:54:18 INFO mapred.JobClient: map 20% reduce 0%
13/02/27 06:54:42 INFO mapred.JobClient: map 40% reduce 0%
13/02/27 06:55:03 INFO mapred.JobClient: map 60% reduce 0%
13/02/27 06:55:26 INFO mapred.JobClient: map 70% reduce 0%
13/02/27 06:55:27 INFO mapred.JobClient: map 80% reduce 0%
13/02/27 06:55:49 INFO mapred.JobClient: map 100% reduce 0%
13/02/27 06:56:04 INFO mapred.JobClient: Job complete: job_201302270205_0013
13/02/27 06:56:04 INFO mapred.JobClient: Counters: 24
13/02/27 06:56:04 INFO mapred.JobClient: File System Counters
13/02/27 06:56:04 INFO mapred.JobClient: FILE: Number of bytes read=0
13/02/27 06:56:04 INFO mapred.JobClient: FILE: Number of bytes written=1828230
13/02/27 06:56:04 INFO mapred.JobClient: FILE: Number of read operations=0
13/02/27 06:56:04 INFO mapred.JobClient: FILE: Number of large read operations=0
13/02/27 06:56:04 INFO mapred.JobClient: FILE: Number of write operations=0
13/02/27 06:56:04 INFO mapred.JobClient: HDFS: Number of bytes read=1381649
13/02/27 06:56:04 INFO mapred.JobClient: HDFS: Number of bytes written=1680
13/02/27 06:56:04 INFO mapred.JobClient: HDFS: Number of read operations=30
13/02/27 06:56:04 INFO mapred.JobClient: HDFS: Number of large read operations=0
13/02/27 06:56:04 INFO mapred.JobClient: HDFS: Number of write operations=10
13/02/27 06:56:04 INFO mapred.JobClient: Job Counters
13/02/27 06:56:04 INFO mapred.JobClient: Launched map tasks=10
13/02/27 06:56:04 INFO mapred.JobClient: Data-local map tasks=10
13/02/27 06:56:04 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=254707
13/02/27 06:56:04 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
13/02/27 06:56:04 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/02/27 06:56:04 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/02/27 06:56:04 INFO mapred.JobClient: Map-Reduce Framework
13/02/27 06:56:04 INFO mapred.JobClient: Map input records=20
13/02/27 06:56:04 INFO mapred.JobClient: Map output records=10
13/02/27 06:56:04 INFO mapred.JobClient: Input split bytes=1540
13/02/27 06:56:04 INFO mapred.JobClient: Spilled Records=0
13/02/27 06:56:04 INFO mapred.JobClient: CPU time spent (ms)=12070
13/02/27 06:56:04 INFO mapred.JobClient: Physical memory (bytes) snapshot=949579776
13/02/27 06:56:04 INFO mapred.JobClient: Virtual memory (bytes) snapshot=8412340224
13/02/27 06:56:04 INFO mapred.JobClient: Total committed heap usage (bytes)=478412800
Exception in thread "main" java.lang.IllegalStateException: java.io.EOFException
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:104)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:38)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.processOutput(PartialBuilder.java:129)
    at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.parseOutput(PartialBuilder.java:96)
    at org.apache.mahout.classifier.df.mapreduce.Builder.build(Builder.java:312)
    at org.apache.mahout.classifier.df.mapreduce.BuildForest.buildForest(BuildForest.java:246)
    at org.apache.mahout.classifier.df.mapreduce.BuildForest.run(BuildForest.java:200)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.mahout.classifier.df.mapreduce.BuildForest.main(BuildForest.java:270)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at java.io.DataInputStream.readLong(DataInputStream.java:399)
    at java.io.DataInputStream.readDouble(DataInputStream.java:451)
    at org.apache.mahout.classifier.df.node.Leaf.readFields(Leaf.java:136)
    at org.apache.mahout.classifier.df.node.Node.read(Node.java:85)
    at org.apache.mahout.classifier.df.mapreduce.MapredOutput.readFields(MapredOutput.java:64)
    at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2114)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2242)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:95)
    ... 10 more
What is the problem? Is it possible at all to write more information
into the leaves of the tree?
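
As far as I understand, the plain Hadoop Writable contract does allow
extra fields, as long as write() and readFields() consume exactly the
same bytes in the same order for every record in the file. The
EOFException inside Leaf.readFields() makes me suspect that some of the
job output was still written in the old one-double layout (for example,
by an old jar deployed on the cluster). Isolated into a hypothetical
stand-alone class (not Mahout's Leaf itself), the pattern I am relying
on is:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Hypothetical stand-in for the modified leaf: the same two-double
// layout, expressed as a plain Writable. Field order must be identical
// in write() and readFields().
public class WeightedLeafData implements Writable {
    private double label;
    private double leafWeight;

    public WeightedLeafData() {
        // no-arg constructor required for deserialization
    }

    public WeightedLeafData(double label, double leafWeight) {
        this.label = label;
        this.leafWeight = leafWeight;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeDouble(label);
        out.writeDouble(leafWeight);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        label = in.readDouble();
        leafWeight = in.readDouble();
    }
}

Is that a correct reading, or do I need to change something else for the
partial builder to read the new format?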
Thank you very much.
Best regards,
Sara