I think we have cases where the data is a Writable (sent as a bytearray to Pig and stored as SequenceFiles, iirc, with a hardcoded schema for the output), and cases with plain CSV text output. Text output has issues since it is not very 'nice' in terms of schema evolution, but it is extremely simple to code up for prototyping!

The only constraint is to have an appropriate LoadFunc/StoreFunc [1] for Pig and an appropriate InputFormat/RecordReader for Hadoop.
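To illustrate the two options discussed above, here is a minimal Pig Latin sketch; the paths and the schema are made up for the example, and MySequenceFileStorage is a hypothetical StoreFunc class name you would implement yourself (it is not a built-in):

```pig
-- load raw input with the built-in text loader
raw = LOAD '/data/input' USING PigStorage('\t') AS (id:long, score:double);

filtered = FILTER raw BY score > 0.5;

-- simplest option: plain text, trivial to prototype but brittle for schema evolution
STORE filtered INTO '/data/intermediate_text' USING PigStorage('\t');

-- alternative: a custom StoreFunc writing SequenceFiles with a hardcoded schema,
-- which a downstream MapReduce job reads via the matching InputFormat/RecordReader
-- STORE filtered INTO '/data/intermediate_seq' USING MySequenceFileStorage();
```

The downstream Java job would then point its input format at the same intermediate path.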


Regards,
Mridul



[1] LoadFunc is not strictly required, but it helps if you want to try other Pig jobs on the intermediate output.


On Wednesday 28 July 2010 08:55 PM, Corbin Hoenes wrote:
Mridul -

What file format do you use to exchange data between pig and java?  Text or 
something else?

On Jul 25, 2010, at 1:52 PM, Mridul Muralidharan wrote:



In some of our pipelines, Pig jobs are one part of a chain that also includes
other Hadoop jobs, shell executions, etc.
We currently do this by using intermediate file dumps.


Regards,
Mridul



On Friday 23 July 2010 10:45 PM, Corbin Hoenes wrote:
What are some strategies to have Pig and Java MapReduce jobs exchange data?
E.g. if we find that a particular Pig script in a chain is too slow and could be
optimized with a custom MapReduce job, we'd want Pig to write the data out in a
format that MapReduce could access, and vice versa.



