Dmitriy,

That's the same sort of thing I am talking about. Thank you for your reply!

Robert.

On 6 February 2011 21:02, Dmitriy Ryaboy <[email protected]> wrote:

> Robert,
> It is not clear from your code snippets what the relationships are
> between the various "var" relations. Could you provide more detail?
>
> It sort of sounds like you are asking about Pig's multiquery
> optimization. You can read about it in these pages:
> http://pig.apache.org/docs/r0.7.0/piglatin_ref1.html#Multi-Query+Execution
> http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification
>
>
>
> On Sun, Feb 6, 2011 at 12:11 PM, Robert Waddell
> <[email protected]> wrote:
> > Hey Guys,
> >
> > I am trying to optimize my Pig jobs as much as possible and wanted to know
> > a little about how Pig handles its loading of data.
> >
> > When I have:
> >
> > var1 = LOAD ....
> > local_var1 = FOREACH
> > local_var1 = JOIN ... [etc]
> > ~~
> > ~~
> > ~~
> > STORE local_var1 ...
> > local_var2 = FOREACH local_var2
> > local_var2 = JOIN ... [etc]
> > ~~
> > STORE local_var2
> >
> > am I gaining any performance improvement by not loading a lengthy file
> > every time and instead storing it in a different alias (local_var2 &
> > local_var1) and manipulating it there, preserving the original (var1)? Or
> > am I better off having multiple LOADs and manipulating the original alias
> > directly?
> >
> > Robert.
> >
>
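
For readers following the multiquery pointer above, here is a minimal sketch of the pattern being discussed: one LOAD feeding two independent branches, each with its own STORE. The paths, schema, and the filter threshold are hypothetical and not taken from the original thread. Assuming the script is run in batch mode with multi-query execution left enabled, Pig should be able to plan both STOREs together and share the single LOAD rather than reading the input once per branch.

```pig
-- Hypothetical input path and schema, for illustration only.
var1 = LOAD 'input/data.tsv' USING PigStorage('\t')
       AS (id:chararray, category:chararray, amount:double);

-- Branch 1: project two columns from the shared input.
local_var1 = FOREACH var1 GENERATE id, amount;
STORE local_var1 INTO 'output/branch1';

-- Branch 2: filter the same input; var1 itself is not modified,
-- because each statement defines a new alias.
local_var2 = FILTER var1 BY amount > 100.0;
STORE local_var2 INTO 'output/branch2';
```

Running the same script with `pig -no_multiquery` (or `-M`) turns the optimization off, which is one way to compare the two behaviours on real data.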
