You might also want to take a look at https://github.com/cloudera/crunch/
Not sure what state it's in, but judging by the file names it might support flume.
J

On Wed, Feb 8, 2012 at 10:04 AM, Scott Carey wrote:
> I have not tried or tested ChainMapper with Avro myself. It will probably
> work if you configure the input schemas or output schemas appropriately.
> Take a look at what AvroJob.setInputSchema is doing; if you are familiar
> enough with Hadoop's configuration, you may be able to work it out. Others
> likely know more than I do on this.
>
> Also, you may be interested in how things are done in this variation:
> https://github.com/wibidata/odiago-avro
>
>
> On 2/1/12 8:23 AM, "Andrew Kenworthy" wrote:
>
> Hello,
>
> Is it possible to chain Avro MR jobs using the ChainMapper? I'm looking
> to chain two map tasks and a reducer, but haven't been able to find any
> examples.
>
> Chain summary:
> a) first map task: takes non-Avro input and produces K/V output in the
> form of AvroKey(Record), NullWritable
> b) second map task: takes the output of the first task as its input [mapper extends
> AvroMapper(Record, Pair(Record, NullWritable))]
> c) reducer: AvroReducer
>
> In particular, how would I specify the input and output schemas? Simply by
> calling AvroJob.setInputSchema/setOutputSchema on the individual chained
> job conf objects?
>
> Thanks,
>
> Andrew
