Sean, this is the output of hadoop fs -ls for the output dir:

patricioe:brisk patricioe$ bin/brisk hadoop fs -ls output
Found 3 items
drwxrwxrwx   - patricioe patricioe   0 2011-06-22 18:08 /user/patricioe/output/clusters-0
drwxrwxrwx   - patricioe patricioe   0 2011-06-22 18:09 /user/patricioe/output/clusteredPoints
drwxrwxrwx   - patricioe patricioe   0 2011-06-22 18:08 /user/patricioe/output/data
patricioe:brisk patricioe$
I don't see any _SUCCESS file here.

2011/6/23 Patricio Echagüe <[email protected]>

> Interesting.
>
> Just for the record, we use Mahout 0.6-SNAPSHOT.
>
> I figure I should build from trunk after you push the changes?
>
> Mind dropping a line when that change gets pushed?
>
> Thanks for your help.
>
>
> On Thu, Jun 23, 2011 at 11:43 AM, Sean Owen <[email protected]> wrote:
>
>> I have a pretty good guess. Even as we speak, I am testing updating to
>> Hadoop 0.20.203.0 in Mahout. The only difference that causes a problem
>> is that the newer Hadoop adds a "_SUCCESS" file to output dirs. This
>> confuses a few bits of Mahout code that don't properly ignore that
>> file. I've got a change to fix it. If all goes well it will go in
>> tonight.
>>
>> I give it reasonable odds that this is your issue.
>>
>> 2011/6/23 Patricio Echagüe <[email protected]>
>>
>> > Hi all, I'm observing this exception while using/integrating Brisk
>> > with Mahout.
>> >
>> > Brisk currently works perfectly with the rest of the Hadoop stack
>> > (Hadoop, Hive, Pig).
>> >
>> > I read a similar thread,
>> > http://comments.gmane.org/gmane.comp.apache.mahout.user/6757, which
>> > makes me think it may be Hadoop-related.
>> >
>> > We are using 0.20.203 (the Yahoo distribution).
>> >
>> > Does this exception look familiar to any of you?
>> >
>> > It happens after running the 3 jobs for the example "Clustering of
>> > Synthetic Control Data":
>> >
>> > INFO [IPC Server handler 0 on 56077] 2011-06-22 17:24:40,599
>> > TaskTracker.java (line 2428) attempt_201106221720_0003_m_000000_0 0.0%
>> > INFO [IPC Server handler 5 on 8012] 2011-06-22 17:24:41,806
>> > TaskInProgress.java (line 551) Error from
>> > attempt_201106221720_0003_m_000000_0: java.lang.IndexOutOfBoundsException
>> >   at java.io.DataInputStream.readFully(DataInputStream.java:175)
>> >   at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
>> >   at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
>> >   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1930)
>> >   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2062)
>> >   at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.nextKeyValue(SequenceFileRecordReader.java:68)
>> >   at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:531)
>> >   at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>> >   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>> >   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>> >   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>> >   at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>> >   at java.security.AccessController.doPrivileged(Native Method)
>> >   at javax.security.auth.Subject.doAs(Subject.java:396)
>> >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>> >   at org.apache.hadoop.mapred.Child.main(Child.java:253)
>> >
>> > Thanks
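Sean's diagnosis is that some Mahout code enumerates every file in a job's
output directory and then tries to read the non-data _SUCCESS marker that
Hadoop 0.20.203 writes as if it were a SequenceFile. A minimal sketch of
that kind of fix, using Hadoop's PathFilter interface (illustrative only,
not the actual Mahout patch), could look like this:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.PathFilter;

    // Skips Hadoop bookkeeping entries such as _SUCCESS and _logs,
    // following the convention that names starting with '_' or '.'
    // are metadata rather than data files.
    public class DataFileFilter implements PathFilter {
      @Override
      public boolean accept(Path path) {
        String name = path.getName();
        return !name.startsWith("_") && !name.startsWith(".");
      }
    }

Wherever output files are listed, such a filter would be passed as the
second argument to FileSystem.listStatus(outputDir, new DataFileFilter())
so that only real part files are opened as SequenceFiles.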
