Hi Vincent,

Pig has a multi-query optimization which, when it fires, automatically figures out that the join needs to be done only once, so no work is repeated. If Pig determines that it is not safe to apply that optimization, then it is possible that your join is being computed more than once. If that's the case, it will be better to do the join once and store the result. You can do that within the same script using "exec":
http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html#exec
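
A minimal sketch of that store-then-reload pattern might look like this. The paths, relation names, and schemas are all hypothetical, and I'm assuming a bare "exec" is what forces the statements above it to run before the rest of the script is planned, so please check it against the docs for your Pig version:

```pig
-- Hypothetical inputs: replace the paths and schemas with your own.
users  = LOAD 'input/users'  AS (user_id:chararray, country:chararray);
events = LOAD 'input/events' AS (user_id:chararray, event:chararray);

-- Compute the expensive join once and materialize it.
joined = JOIN users BY user_id, events BY user_id;
STORE joined INTO 'tmp/joined';

-- Assumption: this triggers execution of everything above
-- before the statements below are planned.
exec;

-- Reload the stored result under a new alias; the schema lists
-- both copies of the join key, in the order the JOIN produced them.
j = LOAD 'tmp/joined' AS (user_id:chararray, country:chararray,
                          e_user_id:chararray, event:chararray);

-- Each downstream statement now reads the stored join.
by_country     = GROUP j BY country;
country_counts = FOREACH by_country GENERATE group, COUNT(j);
STORE country_counts INTO 'output/by_country';

by_event     = GROUP j BY event;
event_counts = FOREACH by_event GENERATE group, COUNT(j);
STORE event_counts INTO 'output/by_event';
```

The trade-off is an extra write and read of the joined data, so this only pays off if the join really is being recomputed for each downstream statement.
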
You can read more about multi-query optimization here:
http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html#Multi-Query+Execution

Hope it helps,
Ashutosh

On Wed, Oct 7, 2009 at 10:54, Vincent BARAT <vincent.ba...@ubikod.com> wrote:

> Hello,
>
> I'm new to PIG, and I have a bunch of statements that process the same
> input, which is actually the result of a JOIN between two very big data sets
> (millions of entries).
>
> I wonder if it is better (faster) to save the result of this JOIN into a
> Hadoop file and then to LOAD it, instead of just relying on PIG
> optimizations?
>
> Thanks a lot for your help.
>