I think we have cases where it is a writable (sent as a bytearray to pig
and stored as sequence files, iirc - with a hardcoded schema for the
output), and cases with plain csv text output.
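To illustrate the writable-with-hardcoded-schema case, here is a minimal sketch of the pattern without pulling in the hadoop jars: a record whose field order *is* the schema, serialized to the byte array that would become the sequence file value. The record name and fields (ClickRecord, userId, score) are made up for illustration; a real Hadoop Writable implements write(DataOutput)/readFields(DataInput) and declares IOException, which this sketch wraps for brevity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical record with a hardcoded schema: (userId: long, score: int).
// Mirrors the Writable contract using only java.io so it runs stand-alone.
class ClickRecord {
    long userId;
    int score;

    void write(DataOutputStream out) {
        try {
            out.writeLong(userId);   // field order IS the schema - hardcoded
            out.writeInt(score);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    void readFields(DataInputStream in) {
        try {
            userId = in.readLong();  // must read in exactly the same order
            score = in.readInt();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}

public class WritableSketch {
    public static void main(String[] args) {
        ClickRecord rec = new ClickRecord();
        rec.userId = 42L;
        rec.score = 7;

        // Serialize to the bytes that would be stored as the sequence
        // file value (or handed to pig as a bytearray).
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        rec.write(new DataOutputStream(buf));

        // Deserialize on the consumer side (a pig UDF or the next MR job).
        ClickRecord copy = new ClickRecord();
        copy.readFields(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));
        System.out.println(copy.userId + "," + copy.score);
    }
}
```

The upside of this layout is compactness; the downside is that both producer and consumer must agree on the exact field order, which is what "hardcoded schema" means here.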
Text output has issues since it is not very 'nice' in terms of schema
evolution, but it is extremely simple to code up for prototyping!
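As a sketch of the text handoff (field names here are hypothetical): the pig side STOREs with the default tab-delimited PigStorage, and the java side parses each line by position, which is exactly why schema evolution is painful - inserting a column in the middle silently breaks every positional consumer.

```java
// Sketch of the plain-text exchange between pig and a java MR job.
public class TextExchangeSketch {
    // Format one record the way PigStorage('\t') would store it.
    static String toLine(String user, long clicks) {
        return user + "\t" + clicks;
    }

    // Parse it back on the mapreduce side. Positional access means any
    // change to column order/count must be coordinated with all consumers.
    static long clicksFrom(String line) {
        String[] fields = line.split("\t", -1);
        return Long.parseLong(fields[1]);
    }

    public static void main(String[] args) {
        String line = toLine("u123", 99L);
        System.out.println(clicksFrom(line)); // prints 99
    }
}
```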
The only constraint is to have an appropriate LoadFunc/StoreFunc [1] for
pig and an appropriate input format/reader for hadoop.
Regards,
Mridul
[1] LoadFunc is not really required, but it helps in case you want to try
other pig jobs on the intermediate output.
On Wednesday 28 July 2010 08:55 PM, Corbin Hoenes wrote:
Mridul -
What file format do you use to exchange data between pig and java? Text or
something else?
On Jul 25, 2010, at 1:52 PM, Mridul Muralidharan wrote:
In some of our pipelines, pig jobs are part of the pipeline, which also
consists of other hadoop jobs, shell executions, etc.
We currently do this by using intermediate file dumps.
Regards,
Mridul
On Friday 23 July 2010 10:45 PM, Corbin Hoenes wrote:
What are some strategies to have pig and java mapreduce jobs exchange data?
E.g. if we find that a particular pig script in a chain is too slow and we
could optimize it with a custom mapreduce job, we'd want pig to write the
data out in a format that mapreduce could access, and vice versa.