Hi,
I am trying to process Avro data with MapReduce. The data I receive is generated by Flume and uses the following schema:
{"type":"record","name":"Event","fields":[{"name":"headers","type":{"type":"map","values":"string"}},{"name":"body","type":"bytes"}]}
A sample record looks like this:
{"headers": {"timestamp": "1392825607332", "parentnode":
"2014021909\/1392825638009"}, "body": {"bytes":
"{"row":"000372d8","data":{"x1":"v1","x2":"v2","x3":"v3"},"timestamp":1392380848474}"}}
In my MapReduce job I read this data as AvroKey&lt;GenericData.Record&gt;, NullWritable in the mapper. I can see the whole message when I print key.datum(), but I am unable to access the inner fields like "row", "data", and "timestamp".
How can I resolve this? Do I need to generate a specific Avro Java class for the schema below and use that generated class in the MapReduce job, or should I use GenericData.Record itself?
{
  "namespace": "com.test.avro",
  "type": "record",
  "name": "Event",
  "fields": [
    {
      "name": "row",
      "type": "string"
    },
    {
      "name": "data",
      "type": {
        "type": "map",
        "values": "string"
      }
    },
    {
      "name": "timestamp",
      "type": "string"
    }
  ]
}
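For reference, this is roughly what I was hoping to do with the body: decode its JSON payload against the schema above using Avro's JSON decoder. This is only a sketch (BodyParser and parseBody are names I made up, and I am not sure this is the right approach); I also notice my sample body has a numeric timestamp while this schema declares it a string, so one of the two may need fixing:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.JsonDecoder;

public class BodyParser {

  // The "com.test.avro.Event" schema shown above, inlined as a string.
  private static final Schema INNER_SCHEMA = new Schema.Parser().parse(
      "{\"namespace\": \"com.test.avro\", \"type\": \"record\", \"name\": \"Event\","
      + " \"fields\": ["
      + "  {\"name\": \"row\", \"type\": \"string\"},"
      + "  {\"name\": \"data\", \"type\": {\"type\": \"map\", \"values\": \"string\"}},"
      + "  {\"name\": \"timestamp\", \"type\": \"string\"}]}");

  // Decode the Flume event's "body" bytes (a JSON string) into a GenericRecord.
  public static GenericRecord parseBody(ByteBuffer body) throws IOException {
    byte[] bytes = new byte[body.remaining()];
    body.duplicate().get(bytes);
    String json = new String(bytes, StandardCharsets.UTF_8);
    JsonDecoder decoder = DecoderFactory.get().jsonDecoder(INNER_SCHEMA, json);
    GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(INNER_SCHEMA);
    GenericRecord record = reader.read(null, decoder);
    // Now record.get("row"), record.get("data") and record.get("timestamp")
    // should return the inner fields.
    return record;
  }
}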
Thanks & Regards,
B Anil Kumar.