Re: storing intermediate results ?

Ashutosh Chauhan Wed, 07 Oct 2009 09:40:42 -0700

Hi Vincent,

Pig has a multi-query optimization which, when it fires, automatically figures
out that the join needs to be done only once, so no work is repeated. If Pig
determines that it is not safe to apply that optimization, then it's possible
that your join is being computed more than once. If that's the case, it is
better to do the join once and store it. You can do that within the same
script using "exec":
http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html#exec
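
A minimal sketch of that approach follows. It is only an illustration: the
paths, aliases, and field names are hypothetical and not taken from your
script, and it assumes the default tab-delimited storage. The join is stored
once, "exec" forces everything above it to run, and the later statements
read the stored result instead of recomputing the join.

```
-- Hypothetical inputs and schemas; replace with your own.
users  = LOAD 'input/users'  AS (user_id:chararray, country:chararray);
events = LOAD 'input/events' AS (user_id:chararray, action:chararray);

-- Compute the expensive join once and materialize it.
joined = JOIN users BY user_id, events BY user_id;
STORE joined INTO 'tmp/joined';

-- Trigger execution of all statements so far before continuing.
exec;

-- Reload the stored join; the join key appears twice, once per input.
j = LOAD 'tmp/joined' AS (user_id:chararray, country:chararray,
                          event_user_id:chararray, action:chararray);

-- Several downstream statements now share the stored result.
by_country     = GROUP j BY country;
country_counts = FOREACH by_country GENERATE group, COUNT(j);
STORE country_counts INTO 'output/by_country';

by_action     = GROUP j BY action;
action_counts = FOREACH by_action GENERATE group, COUNT(j);
STORE action_counts INTO 'output/by_action';
```

Whether this is actually faster than letting the multi-query optimization
handle it depends on your script, so it is worth timing both versions.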


You can read more about multi-query optimization here:
http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html#Multi-Query+Execution

Hope it helps,
Ashutosh

On Wed, Oct 7, 2009 at 10:54, Vincent BARAT <vincent.ba...@ubikod.com> wrote:

> Hello,
>
> I'm new to Pig, and I have a bunch of statements that process the same
> input, which is actually the result of a JOIN between two very big data sets
> (millions of entries).
>
> I wonder if it is better (faster) to save the result of this JOIN into a
> Hadoop file and then LOAD it, instead of just relying on Pig's
> optimizations?
>
> Thanks a lot for your help.
>
