Hi all,

I have a pig script that produces a complex nested data structure:

result: {child: chararray, childTraces: {action: int, time: long}, legacy:
{parent: chararray, parentTraces: {action: int, time: long}}}

I would like to post-process the output of the Pig script with a MapReduce
job, where I would use nested for loops to iterate over the bags.
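To illustrate the kind of nested iteration I mean, here is a minimal sketch in plain Java, assuming the records have already been deserialized into POJOs mirroring the schema above (the class and field names are hypothetical, just for illustration):

```java
import java.util.List;

// Hypothetical POJOs mirroring the Pig schema above (names are illustrative).
record Trace(int action, long time) {}
record Legacy(String parent, List<Trace> parentTraces) {}
record Result(String child, List<Trace> childTraces, Legacy legacy) {}

public class NestedIteration {
    public static void main(String[] args) {
        // A sample record, as it might arrive in the reducer after parsing.
        Result r = new Result(
            "c1",
            List.of(new Trace(1, 100L), new Trace(2, 200L)),
            new Legacy("p1", List.of(new Trace(3, 300L)))
        );
        long total = 0;
        // Nested iteration over the bags: first childTraces,
        // then the parentTraces bag nested inside legacy.
        for (Trace t : r.childTraces()) {
            total += t.time();
        }
        for (Trace t : r.legacy().parentTraces()) {
            total += t.time();
        }
        System.out.println(total); // prints 600
    }
}
```

The question is really about how to get from Pig's stored output to objects like these without hand-rolling the parsing.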

Do you have any advice on the simplest way to store Pig's output so that I
don't have to write my own parser in MapReduce?
I thought about using JSON, but it looks like there is no JSON storage
format for tuples yet (I know Elephant Bird can store maps, but I would
need to convert my result to a nested map, which is a bit unnatural).
Avro is not an easy option on the Hadoop side.

Any help would be highly appreciated.

Thanks,
--
Gianmarco De Francisci Morales
