Hi Zaki,

Please file a JIRA if you are able to identify the problem you were facing and the steps to reproduce it.

Thanks,
Thejas
On 10/7/09 1:08 PM, "zaki rahaman" <zaki.raha...@gmail.com> wrote:

> Vincent,
>
> I've run into this problem before. If you know beforehand that you're going
> to recycle this joined dataset for several different operations or
> pipelines, it is worth your time to simply store it intermediately. While
> Pig can definitely handle this and the multi-query optimizer is great, I've
> run into problems with it before (I can't remember exactly what now), and
> pre-joining has worked well for me.
>
> Hopefully you found some part of that useful.
>
> On Wed, Oct 7, 2009 at 12:33 PM, Ashutosh Chauhan <
> ashutosh.chau...@gmail.com> wrote:
>
>> Hi Vincent,
>>
>> Pig has a multi-query optimization which, if it fires, will automatically
>> figure out that the join needs to be done only once, so there will not be
>> any repetition of work. If Pig determines that it is not safe to apply that
>> optimization, then it is possible that your join is getting computed more
>> than once. If that is the case, it will be better to do the join and store
>> the result.
>> You can do that within the same script using "exec":
>> http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html#exec
>>
>> You can read more about multi-query optimization here:
>> http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html#Multi-Query+Execution
>>
>> Hope it helps,
>> Ashutosh
>>
>> On Wed, Oct 7, 2009 at 10:54, Vincent BARAT <vincent.ba...@ubikod.com
>>> wrote:
>>
>>> Hello,
>>>
>>> I'm new to Pig, and I have a bunch of statements that process the same
>>> input, which is actually the result of a JOIN between two very big data
>>> sets (millions of entries).
>>>
>>> I wonder if it is better (faster) to save the result of this JOIN into a
>>> Hadoop file and then to LOAD it, instead of just relying on Pig
>>> optimizations?
>>>
>>> Thanks a lot for your help.
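For readers following the thread, here is a minimal Pig Latin sketch of the two approaches discussed above. The relation names, file paths, and schemas are illustrative placeholders, not taken from Vincent's actual script.

Pre-computing the join once and reusing the stored result in later pipelines:

    -- compute the expensive join a single time and store it
    A = LOAD 'input_a' AS (id:long, value:chararray);
    B = LOAD 'input_b' AS (id:long, score:int);
    J = JOIN A BY id, B BY id;
    STORE J INTO 'joined_data';

    -- later pipelines (same script via exec, or a separate script) reuse it
    J2 = LOAD 'joined_data' AS (a_id:long, value:chararray, b_id:long, score:int);
    G  = GROUP J2 BY value;
    C  = FOREACH G GENERATE group, COUNT(J2);
    STORE C INTO 'value_counts';

Relying on multi-query execution instead, by putting all the consumers of the join in one script:

    -- one script, multiple STOREs: Pig can plan these together and
    -- materialize the join only once when the optimization fires
    A  = LOAD 'input_a' AS (id:long, value:chararray);
    B  = LOAD 'input_b' AS (id:long, score:int);
    J  = JOIN A BY id, B BY id;
    C1 = FILTER J BY score > 10;
    STORE C1 INTO 'high_scores';
    C2 = GROUP J BY A::value;
    STORE C2 INTO 'grouped_by_value';

In the second sketch the two STORE statements are planned together, so the join should run once; if Pig decides the sharing is unsafe, the first sketch guarantees a single join at the cost of writing and re-reading the intermediate file.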