Hi,
I am trying to process Avro data with MapReduce. The data I receive is generated by Flume and uses the following schema:
{"type":"record","name":"Event","fields":[{"name":"headers","type":{"type":"map","values":"string"}},{"name":"body","type":"bytes"}]}
A sample record looks like this:
{"headers": {"timestamp": "1392825607332", "parentnode":
"2014021909\/1392825638009"}, "body": {"bytes":
"{"row":"000372d8","data":{"x1":"v1","x2":"v2","x3":"v3"},"timestamp":1392380848474}"}}
In my MapReduce job I read this data as AvroKey&lt;GenericData.Record&gt;, NullWritable in the mapper. I can see the whole message when I print key.datum(), but I am unable to access the inner fields like "row", "data", and "timestamp".
How can I resolve this? Do I need to generate a specific Avro Java class for the schema below and use that generated class in the MapReduce job, or should I use GenericData.Record itself?
{
  "namespace": "com.test.avro",
  "type": "record",
  "name": "Event",
  "fields": [
    {
      "name": "row",
      "type": "string"
    },
    {
      "name": "data",
      "type": {
        "type": "map",
        "values": "string"
      }
    },
    {
      "name": "timestamp",
      "type": "string"
    }
  ]
}
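For reference, this is roughly what I was hoping to do with the body: decode its JSON payload against the schema above using Avro's JSON decoder. This is only a sketch (BodyParser and parseBody are names I made up, and I am not sure this is the right approach); I also notice my sample body has a numeric timestamp while this schema declares it a string, so one of the two may need fixing:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.JsonDecoder;

public class BodyParser {

  // The "com.test.avro.Event" schema shown above, inlined as a string.
  private static final Schema INNER_SCHEMA = new Schema.Parser().parse(
      "{\"namespace\": \"com.test.avro\", \"type\": \"record\", \"name\": \"Event\","
      + " \"fields\": ["
      + "  {\"name\": \"row\", \"type\": \"string\"},"
      + "  {\"name\": \"data\", \"type\": {\"type\": \"map\", \"values\": \"string\"}},"
      + "  {\"name\": \"timestamp\", \"type\": \"string\"}]}");

  // Decode the Flume event's "body" bytes (a JSON string) into a GenericRecord.
  public static GenericRecord parseBody(ByteBuffer body) throws IOException {
    byte[] bytes = new byte[body.remaining()];
    body.duplicate().get(bytes);
    String json = new String(bytes, StandardCharsets.UTF_8);
    JsonDecoder decoder = DecoderFactory.get().jsonDecoder(INNER_SCHEMA, json);
    GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(INNER_SCHEMA);
    GenericRecord record = reader.read(null, decoder);
    // Now record.get("row"), record.get("data") and record.get("timestamp")
    // should return the inner fields.
    return record;
  }
}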
Thanks & Regards,
B Anil Kumar.