Thanks Adrian. Sorry for the late response.

I think following your second approach is better, but for now I have gone
with the first approach.

Thanks & Regards,
B Anil Kumar.

On Thu, Feb 20, 2014 at 10:22 PM, Adrian Hains <[email protected]> wrote:
> If the avro data from flume has the schema:
>
> {"type":"record","name":"Event","fields":[{"name":"headers","type":{"type":"map","values":"string"}},{"name":"body","type":"bytes"}]}
>
> then a record can only contain a headers map of strings and a body field
> with bytes. I don't see how it could contain structured data in the body
> like you described:
>
> {"headers": {"timestamp": "1392825607332", "parentnode": "2014021909\/1392825638009"},
>  "body": {"bytes": "{"row":"000372d8","data":{"x1":"v1","x2":"v2","x3":"v3"},"timestamp":1392380848474}"}}
>
> Typically your flume event carries your data payload in that body field
> as a blob. So if you have a flume hdfs sink that is logging the raw flume
> event with a config of serializer=avro_event, then you need to unpack the
> data in the body field manually in your mapreduce. If you instead want the
> hdfs sink to write your payload in your custom avro format, then I think
> you need to configure the sink with the appropriate serializer (e.g.
> https://github.com/cloudera/cdk/blob/master/cdk-flume-avro-event-serializer/src/main/java/org/apache/flume/serialization/AvroEventSerializer.java).
>
> Apologies if I'm misunderstanding your problem and what you're trying to
> accomplish.
> -a
>
> On Wed, Feb 19, 2014 at 9:52 PM, AnilKumar B <[email protected]> wrote:
>> Hi,
>>
>> I am trying to process avro data using mapreduce. The data I get in avro
>> format is generated by flume with the schema below:
>>
>> {"type":"record","name":"Event","fields":[{"name":"headers","type":{"type":"map","values":"string"}},{"name":"body","type":"bytes"}]}
>>
>> A data sample looks like this:
>>
>> {"headers": {"timestamp": "1392825607332", "parentnode": "2014021909\/1392825638009"},
>>  "body": {"bytes": "{"row":"000372d8","data":{"x1":"v1","x2":"v2","x3":"v3"},"timestamp":1392380848474}"}}
>>
>> In mapreduce I read this data as AvroKey<GenericData.Record>, NullWritable
>> in the mapper. I can see the whole message through key.datum(), but I am
>> unable to access the fields "row", "data", and "timestamp".
>>
>> So how can I resolve this? Do I need to generate a specific avro java
>> class for the schema below and use the generated class for processing in
>> mapreduce, or should I use GenericData.Record itself?
>>
>> {
>>   "namespace": "com.test.avro",
>>   "type": "record",
>>   "name": "Event",
>>   "fields": [
>>     {"name": "row", "type": "string"},
>>     {"name": "data", "type": {"type": "map", "values": "string"}},
>>     {"name": "timestamp", "type": "string"}
>>   ]
>> }
>>
>> Thanks & Regards,
>> B Anil Kumar.
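
For reference, a minimal sketch of the first approach described above:
read flume's avro_event output as a GenericRecord and unpack the JSON
payload from the body bytes by hand. The class name and the use of
Jackson for JSON parsing are illustrative assumptions, not something
prescribed in the thread.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.mapred.AvroKey;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class FlumeEventBodyMapper
        extends Mapper<AvroKey<GenericRecord>, NullWritable, Text, Text> {

      private final ObjectMapper json = new ObjectMapper();

      @Override
      protected void map(AvroKey<GenericRecord> key, NullWritable value,
          Context context) throws IOException, InterruptedException {
        GenericRecord event = key.datum();

        // "body" is declared as Avro bytes, so it arrives as a ByteBuffer.
        ByteBuffer body = (ByteBuffer) event.get("body");
        byte[] raw = new byte[body.remaining()];
        body.get(raw);

        // The payload inside body is itself a JSON string; parse it to
        // reach "row", "data", and "timestamp".
        JsonNode payload = json.readTree(new String(raw, StandardCharsets.UTF_8));
        String row = payload.get("row").asText();
        long timestamp = payload.get("timestamp").asLong();

        context.write(new Text(row), new Text(Long.toString(timestamp)));
      }
    }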

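And a sketch of the second approach, assuming the hdfs sink has been
configured with a serializer (such as the AvroEventSerializer linked
above) so the files on HDFS carry the custom com.test.avro.Event schema.
In that case no generated class is strictly needed: with the reader
schema declared on the job, GenericData.Record fields are accessible by
name. The class name and the driver wiring in the comments are
assumptions for illustration.

    import java.io.IOException;
    import java.util.Map;

    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.mapred.AvroKey;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Driver side (not shown in full): use AvroKeyInputFormat and declare
    // the reader schema so Avro resolves fields by name, e.g.
    //   job.setInputFormatClass(AvroKeyInputFormat.class);
    //   AvroJob.setInputKeySchema(job, new Schema.Parser().parse(eventSchemaJson));
    public class CustomEventMapper
        extends Mapper<AvroKey<GenericRecord>, NullWritable, Text, Text> {

      @Override
      protected void map(AvroKey<GenericRecord> key, NullWritable value,
          Context context) throws IOException, InterruptedException {
        GenericRecord event = key.datum();

        // Avro string fields come back as CharSequence (often Utf8), so
        // convert with toString() before writing or comparing.
        String row = event.get("row").toString();
        String timestamp = event.get("timestamp").toString();

        @SuppressWarnings("unchecked")
        Map<CharSequence, CharSequence> data =
            (Map<CharSequence, CharSequence>) event.get("data");

        context.write(new Text(row),
            new Text(timestamp + " fields=" + data.size()));
      }
    }

A generated (specific) class is still worth considering if the schema is
stable and compile-time field access is wanted, but GenericData.Record
works fine for a schema this small.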