Hi Dmitriy,

Thanks for the information.
Could you share your views on the query below?

BinStorage()
PigDump()
PigStorage()
TextLoader()

Which of the above formats should I use for loading and storing to optimize my queries, considering that my input is text files?

Regards,
Abhi

On Mon, Oct 8, 2012 at 12:10 AM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> Pig has multi-query execution optimization built-in. If you compute
> multiple relations in your script that share parent relations, those
> parent relations will be computed only once. You don't have to do
> anything to make that happen.
>
> If you prefer to handle your own caching, you would have to handle it
> yourself, of course.
>
> There is some academic work on reusing parts of previous runs of the
> same script (potentially on overlapping, but not identical datasets);
> the papers to read are:
> Nectar: http://research.microsoft.com/apps/pubs/default.aspx?id=131525
> ReStore: http://vldb.org/pvldb/vol5/p586_imanelghandour_vldb2012.pdf
>
> There are a lot of papers on iterative MapReduce; I am sure if you
> start with ReStore citations and/or Google Scholar, you'll find some.
>
> None of that has made it into Pig yet; I believe a general compute
> caching framework would be very useful, and look forward to someone
> taking up that challenge.
>
> D
>
> On Fri, Oct 5, 2012 at 2:51 PM, Abhishek <abhishek.dod...@gmail.com> wrote:
>> BinStorage()
>> PigDump()
>> PigStorage()
>> TextLoader()
>>
>> Which of the above formats should I use for loading or storing to
>> optimize the queries?
>>
>> Can caching be used anywhere in Pig? How can a cache be useful in Pig?
>>
>> Regards
>> Abhi