Hello all:
I'm testing Mahout's Random Forest partial implementation on Hadoop
2.0.0-cdh4.1.1.
I'm trying to modify the algorithm: all I do is add more information to
the leaves of the tree. A leaf currently contains only the label, and I
want to add one more field (leafWeight):
@Override
public void readFields(DataInput in) throws IOException {
    label = in.readDouble();
    leafWeight = in.readDouble();   // added: deserialize the extra field
}

@Override
protected void writeNode(DataOutput out) throws IOException {
    out.writeDouble(label);
    out.writeDouble(leafWeight);    // added: serialize the extra field
}
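
Just to show that the write/read pair itself is symmetric, here is how I
checked the two-double layout in isolation, outside Mahout (a minimal
sketch; the class name and values are mine):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical round-trip check: write two doubles the same way
// writeNode() does, then read them back the way readFields() does.
public class LeafRoundTripCheck {
    public static void main(String[] args) throws IOException {
        double label = 1.0;        // dummy values
        double leafWeight = 0.75;

        // Serialize exactly as writeNode() does.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeDouble(label);
        out.writeDouble(leafWeight);
        out.close();

        // Deserialize exactly as readFields() does. An EOFException here
        // would mean the writer produced fewer bytes than the reader expects.
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println("label      = " + in.readDouble());
        System.out.println("leafWeight = " + in.readDouble());
        in.close();
    }
}

This reads back exactly what it writes, so the pair looks consistent on
its own.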
Yet when I run the job, I get the following error:
13/02/27 06:53:27 INFO mapreduce.BuildForest: Partial Mapred implementation
13/02/27 06:53:27 INFO mapreduce.BuildForest: Building the forest...
13/02/27 06:53:27 INFO mapreduce.BuildForest: Weights Estimation: IR
13/02/27 06:53:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/02/27 06:53:39 INFO input.FileInputFormat: Total input paths to process : 1
13/02/27 06:53:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/02/27 06:53:39 WARN snappy.LoadSnappy: Snappy native library not loaded
13/02/27 06:53:39 INFO mapred.JobClient: Running job: job_201302270205_0013
13/02/27 06:53:40 INFO mapred.JobClient: map 0% reduce 0%
13/02/27 06:54:18 INFO mapred.JobClient: map 20% reduce 0%
13/02/27 06:54:42 INFO mapred.JobClient: map 40% reduce 0%
13/02/27 06:55:03 INFO mapred.JobClient: map 60% reduce 0%
13/02/27 06:55:26 INFO mapred.JobClient: map 70% reduce 0%
13/02/27 06:55:27 INFO mapred.JobClient: map 80% reduce 0%
13/02/27 06:55:49 INFO mapred.JobClient: map 100% reduce 0%
13/02/27 06:56:04 INFO mapred.JobClient: Job complete: job_201302270205_0013
13/02/27 06:56:04 INFO mapred.JobClient: Counters: 24
13/02/27 06:56:04 INFO mapred.JobClient: File System Counters
13/02/27 06:56:04 INFO mapred.JobClient: FILE: Number of bytes read=0
13/02/27 06:56:04 INFO mapred.JobClient: FILE: Number of bytes written=1828230
13/02/27 06:56:04 INFO mapred.JobClient: FILE: Number of read operations=0
13/02/27 06:56:04 INFO mapred.JobClient: FILE: Number of large read operations=0
13/02/27 06:56:04 INFO mapred.JobClient: FILE: Number of write operations=0
13/02/27 06:56:04 INFO mapred.JobClient: HDFS: Number of bytes read=1381649
13/02/27 06:56:04 INFO mapred.JobClient: HDFS: Number of bytes written=1680
13/02/27 06:56:04 INFO mapred.JobClient: HDFS: Number of read operations=30
13/02/27 06:56:04 INFO mapred.JobClient: HDFS: Number of large read operations=0
13/02/27 06:56:04 INFO mapred.JobClient: HDFS: Number of write operations=10
13/02/27 06:56:04 INFO mapred.JobClient: Job Counters
13/02/27 06:56:04 INFO mapred.JobClient: Launched map tasks=10
13/02/27 06:56:04 INFO mapred.JobClient: Data-local map tasks=10
13/02/27 06:56:04 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=254707
13/02/27 06:56:04 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
13/02/27 06:56:04 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/02/27 06:56:04 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/02/27 06:56:04 INFO mapred.JobClient: Map-Reduce Framework
13/02/27 06:56:04 INFO mapred.JobClient: Map input records=20
13/02/27 06:56:04 INFO mapred.JobClient: Map output records=10
13/02/27 06:56:04 INFO mapred.JobClient: Input split bytes=1540
13/02/27 06:56:04 INFO mapred.JobClient: Spilled Records=0
13/02/27 06:56:04 INFO mapred.JobClient: CPU time spent (ms)=12070
13/02/27 06:56:04 INFO mapred.JobClient: Physical memory (bytes) snapshot=949579776
13/02/27 06:56:04 INFO mapred.JobClient: Virtual memory (bytes) snapshot=8412340224
13/02/27 06:56:04 INFO mapred.JobClient: Total committed heap usage (bytes)=478412800
Exception in thread "main" java.lang.IllegalStateException: java.io.EOFException
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:104)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:38)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.processOutput(PartialBuilder.java:129)
    at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.parseOutput(PartialBuilder.java:96)
    at org.apache.mahout.classifier.df.mapreduce.Builder.build(Builder.java:312)
    at org.apache.mahout.classifier.df.mapreduce.BuildForest.buildForest(BuildForest.java:246)
    at org.apache.mahout.classifier.df.mapreduce.BuildForest.run(BuildForest.java:200)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.mahout.classifier.df.mapreduce.BuildForest.main(BuildForest.java:270)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at java.io.DataInputStream.readLong(DataInputStream.java:399)
    at java.io.DataInputStream.readDouble(DataInputStream.java:451)
    at org.apache.mahout.classifier.df.node.Leaf.readFields(Leaf.java:136)
    at org.apache.mahout.classifier.df.node.Node.read(Node.java:85)
    at org.apache.mahout.classifier.df.mapreduce.MapredOutput.readFields(MapredOutput.java:64)
    at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2114)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2242)
    at org.apache.mahout.common.iterator.sequencefile.SequenceFileIterator.computeNext(SequenceFileIterator.java:95)
    ... 10 more
What is the problem? Is it possible at all to write more information
into the leaves of the tree?
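
As far as I understand, the plain Hadoop Writable contract does allow
extra fields, as long as write() and readFields() consume exactly the
same bytes in the same order for every record in the file. The
EOFException inside Leaf.readFields() makes me suspect that some of the
job output was still written in the old one-double layout (for example,
by an old jar deployed on the cluster). Isolated into a hypothetical
stand-alone class (not Mahout's Leaf itself), the pattern I am relying
on is:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Hypothetical stand-in for the modified leaf: the same two-double
// layout, expressed as a plain Writable. Field order must be identical
// in write() and readFields().
public class WeightedLeafData implements Writable {
    private double label;
    private double leafWeight;

    public WeightedLeafData() {
        // no-arg constructor required for deserialization
    }

    public WeightedLeafData(double label, double leafWeight) {
        this.label = label;
        this.leafWeight = leafWeight;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeDouble(label);
        out.writeDouble(leafWeight);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        label = in.readDouble();
        leafWeight = in.readDouble();
    }
}

Is that a correct reading, or do I need to change something else for the
partial builder to read the new format?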
Thank you very much.
Best regards,
Sara