I have a pretty good guess. Even as we speak, I am testing an update to Hadoop 0.20.203.0 in Mahout. The only difference that causes a problem is that the newer Hadoop writes a "_SUCCESS" marker file into job output directories. This confuses a few bits of Mahout code that don't properly ignore it. I've got a change to fix that; if all goes well it will go in tonight.
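For context: any code that treats every file in a job's output directory as data will try to read _SUCCESS as a SequenceFile and fail. A minimal sketch of the kind of name-based filter involved (plain Java for illustration; the `isDataFile` helper is hypothetical, not the actual Mahout patch — in real Hadoop code you would typically pass a `PathFilter` to `FileSystem.listStatus` instead):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class OutputFilter {

    // Hadoop writes bookkeeping files whose names start with "_" (e.g. _SUCCESS,
    // _logs) or "." (checksum files like .part-m-00000.crc) into output
    // directories. Only the remaining files are actual job output.
    public static boolean isDataFile(String name) {
        return !name.startsWith("_") && !name.startsWith(".");
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList(
                "part-m-00000", "_SUCCESS", "_logs", ".part-m-00000.crc");
        // Keep only real data files before handing them to a SequenceFile reader.
        List<String> data = listing.stream()
                .filter(OutputFilter::isDataFile)
                .collect(Collectors.toList());
        System.out.println(data);
    }
}
```

Reading only the files that pass this filter avoids feeding non-SequenceFile content (like the empty _SUCCESS marker) to the record reader.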
I give it reasonable odds that this is your issue.

2011/6/23 Patricio Echagüe <[email protected]>

> Hi all, I'm observing this exception using/integrating Brisk with Mahout.
>
> *Brisk* currently works perfectly with the whole Hadoop stack (Hadoop,
> Hive, Pig).
>
> I read a similar thread:
> http://comments.gmane.org/gmane.comp.apache.mahout.user/6757 which makes
> me think it can be Hadoop related.
>
> We are using *0.20.203* (Yahoo distribution).
>
> Does this exception look familiar to you all?
>
> It happens after running the 3 jobs for the example: *Clustering of
> Synthetic Control Data*.
>
> INFO [IPC Server handler 0 on 56077] 2011-06-22 17:24:40,599
> TaskTracker.java (line 2428) attempt_201106221720_0003_m_000000_0 0.0%
> INFO [IPC Server handler 5 on 8012] 2011-06-22 17:24:41,806
> TaskInProgress.java (line 551) Error from
> attempt_201106221720_0003_m_000000_0: java.lang.IndexOutOfBoundsException
>     at java.io.DataInputStream.readFully(DataInputStream.java:175)
>     at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
>     at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1930)
>     at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2062)
>     at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.nextKeyValue(SequenceFileRecordReader.java:68)
>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:531)
>     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>     at org.apache.hadoop.mapred.Child.main(Child.java:253)
>
> Thanks
