Sean, this is the output of hadoop fs -ls for the output dir:

patricioe:brisk patricioe$ bin/brisk hadoop fs -ls output
Found 3 items
drwxrwxrwx   - patricioe patricioe          0 2011-06-22 18:08 /user/patricioe/output/clusters-0
drwxrwxrwx   - patricioe patricioe          0 2011-06-22 18:09 /user/patricioe/output/clusteredPoints
drwxrwxrwx   - patricioe patricioe          0 2011-06-22 18:08 /user/patricioe/output/data
patricioe:brisk patricioe$


I don't see any _SUCCESS file here.
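
For what it's worth, the marker can also be checked programmatically. A
minimal sketch using the Hadoop FileSystem API (the subdirectory in the
path below is only a guess at where a job would write the marker):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Sketch: report whether a job output directory contains the _SUCCESS
    // marker. The path is hypothetical; point it at a real job output dir.
    public class SuccessMarkerCheck {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path marker = new Path("/user/patricioe/output/clusteredPoints/_SUCCESS");
        System.out.println(marker + (fs.exists(marker) ? " exists" : " is absent"));
      }
    }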

2011/6/23 Patricio Echagüe <[email protected]>

> Interesting.
>
> Just for the record, we use Mahout 0.6-SNAPSHOT.
>
> I figure I should build from trunk after you push the changes?
>
> Mind dropping a line when that change gets pushed?
>
> Thanks for your help.
>
>
> On Thu, Jun 23, 2011 at 11:43 AM, Sean Owen <[email protected]> wrote:
>
>> I have a pretty good guess. Even as we speak, I am testing an update to
>> Hadoop 0.20.203.0 in Mahout. The only difference that causes a problem
>> is that the newer Hadoop adds a "_SUCCESS" file to output dirs, which
>> confuses a few bits of Mahout code that don't properly ignore it. I've
>> got a change to fix that; if all goes well it will go in tonight.
>>
>> I give it reasonable odds that this is your issue.
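>>
>> To illustrate, the kind of filter involved skips "hidden" entries whose
>> names start with "_" or "."; here is a minimal sketch (illustrative
>> only, not the actual Mahout change, and the class name is made up):
>>
>>     import org.apache.hadoop.fs.Path;
>>     import org.apache.hadoop.fs.PathFilter;
>>
>>     // Sketch: accept only visible entries, skipping _SUCCESS, _logs and
>>     // other names starting with "_" or ".".
>>     public class VisibleFilesOnly implements PathFilter {
>>       @Override
>>       public boolean accept(Path path) {
>>         String name = path.getName();
>>         return !name.startsWith("_") && !name.startsWith(".");
>>       }
>>     }
>>
>> Such a filter can be passed to FileSystem.listStatus(dir, filter) so the
>> marker file never reaches the SequenceFile readers.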
>>
>> 2011/6/23 Patricio Echagüe <[email protected]>
>>
>> > Hi all, I'm observing this exception while using/integrating Brisk
>> > with Mahout.
>> >
>> > *Brisk* currently works perfectly with the rest of the Hadoop stack
>> > (Hadoop, Hive, Pig).
>> >
>> > I read a similar thread,
>> > http://comments.gmane.org/gmane.comp.apache.mahout.user/6757, which
>> > makes me think it may be Hadoop-related.
>> >
>> > We are using *0.20.203* (Yahoo distribution).
>> >
>> > Does this exception look familiar to any of you?
>> >
>> > It happens after running the 3 jobs for the example: *Clustering of
>> > Synthetic control data*.
>> >
>> > INFO [IPC Server handler 0 on 56077] 2011-06-22 17:24:40,599
>> > TaskTracker.java (line 2428) attempt_201106221720_0003_m_000000_0 0.0%
>> >
>> > INFO [IPC Server handler 5 on 8012] 2011-06-22 17:24:41,806
>> > TaskInProgress.java (line 551) Error from
>> > attempt_201106221720_0003_m_000000_0: java.lang.IndexOutOfBoundsException
>> >   at java.io.DataInputStream.readFully(DataInputStream.java:175)
>> >   at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:63)
>> >   at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:101)
>> >   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:1930)
>> >   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2062)
>> >   at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.nextKeyValue(SequenceFileRecordReader.java:68)
>> >   at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:531)
>> >   at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>> >   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>> >   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>> >   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>> >   at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>> >   at java.security.AccessController.doPrivileged(Native Method)
>> >   at javax.security.auth.Subject.doAs(Subject.java:396)
>> >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>> >   at org.apache.hadoop.mapred.Child.main(Child.java:253)
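>> >
>> > For context, those frames are the stock SequenceFile read path. A
>> > standalone reader over a single file looks roughly like the sketch
>> > below (a hypothetical helper, not Mahout code; pass a real part file
>> > as args[0]); next() is the call that fails when the stream is not a
>> > well-formed SequenceFile:
>> >
>> >     import org.apache.hadoop.conf.Configuration;
>> >     import org.apache.hadoop.fs.FileSystem;
>> >     import org.apache.hadoop.fs.Path;
>> >     import org.apache.hadoop.io.SequenceFile;
>> >     import org.apache.hadoop.io.Writable;
>> >     import org.apache.hadoop.util.ReflectionUtils;
>> >
>> >     // Sketch: dump key/value pairs from one SequenceFile.
>> >     public class SeqFileDump {
>> >       public static void main(String[] args) throws Exception {
>> >         Configuration conf = new Configuration();
>> >         FileSystem fs = FileSystem.get(conf);
>> >         SequenceFile.Reader reader =
>> >             new SequenceFile.Reader(fs, new Path(args[0]), conf);
>> >         try {
>> >           Writable key = (Writable)
>> >               ReflectionUtils.newInstance(reader.getKeyClass(), conf);
>> >           Writable value = (Writable)
>> >               ReflectionUtils.newInstance(reader.getValueClass(), conf);
>> >           // reader.next(...) is the SequenceFile$Reader.next frame in
>> >           // the trace above.
>> >           while (reader.next(key, value)) {
>> >             System.out.println(key + "\t" + value);
>> >           }
>> >         } finally {
>> >           reader.close();
>> >         }
>> >       }
>> >     }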
>> >
>> >
>> > Thanks
>> >
>>
>
>
