My guess is that you are not deserializing it properly (if at all).
Can you share the relevant code from your mapper?
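One thing worth checking in the meantime: with the Flume schema you quoted, "row", "data", and "timestamp" are not top-level Avro fields at all -- they live inside the JSON string stored in the "body" bytes field, so key.datum().get("body") gives you a ByteBuffer that has to be decoded (and then parsed as JSON) separately. A minimal sketch of that decode step, assuming the body is UTF-8 JSON as in your sample (class and method names here are just for illustration):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BodyDecode {

    // Decode the Avro "body" bytes field (returned as a ByteBuffer from
    // GenericRecord.get("body")) into the JSON string it contains.
    static String decodeBody(ByteBuffer body) {
        // duplicate() so we don't disturb the buffer's position for other readers
        ByteBuffer copy = body.duplicate();
        byte[] bytes = new byte[copy.remaining()];
        copy.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer body = ByteBuffer.wrap(
                "{\"row\":\"000372d8\"}".getBytes(StandardCharsets.UTF_8));
        System.out.println(decodeBody(body)); // prints {"row":"000372d8"}
    }
}
```

From there you would hand the string to a JSON parser (e.g. Jackson) to pull out "row", "data" and "timestamp" -- the Avro layer alone won't see them.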
On Feb 19, 2014 9:53 PM, "AnilKumar B" <[email protected]> wrote:
> Hi,
>
> I am trying to process Avro data using MapReduce. The Avro data I receive
> is generated by Flume with the following schema:
>
>
> {"type":"record","name":"Event","fields":[{"name":"headers","type":{"type":"map","values":"string"}},{"name":"body","type":"bytes"}]}
>
>
> And data sample is as below:
>
> {"headers": {"timestamp": "1392825607332", "parentnode":
> "2014021909\/1392825638009"},
> "body": {"bytes":
> "{"row":"000372d8","data":{"x1":"v1","x2":"v2","x3":"v3"},"timestamp":1392380848474}"}}
>
> When I use this data in MapReduce, I read it as AvroKey<GenericData.Record>,
> NullWritable in my mapper. I can see the whole message via key.datum(),
> but I am unable to access fields like "row", "data", "timestamp".
>
>
> So how can I resolve this? Do I need to generate a specific Avro Java class
> for the schema below and use that generated class in the MapReduce job, or
> should I use GenericData.Record itself?
>
>
> {
>   "namespace": "com.test.avro",
>   "type": "record",
>   "name": "Event",
>   "fields": [
>     {"name": "row", "type": "string"},
>     {"name": "data", "type": {"type": "map", "values": "string"}},
>     {"name": "timestamp", "type": "string"}
>   ]
> }
>
>
> Thanks & Regards,
> B Anil Kumar.
>