Mridul,
By design, we are heading toward the point where all (or nearly all) operators
are supported in nested queries, and nesting will not be limited to a single
level.
Pi
On Mon, May 5, 2008 at 4:48 PM, Mridul Muralidharan <[EMAIL PROTECTED]>
wrote:
>
> This is something which is used quite heavily (at least by our team).
> I was hoping this would be expanded, for example by adding support for
> nested statements in FILTER as well (as in FOREACH). Currently we have to
> hack around this using FOREACH & flag statements to get the functionality,
> since FILTER does not support it.
>
> Regards,
> Mridul
>
>
> Santhosh Srinivasan wrote:
>
> > Pig currently allows implicit splits within the foreach block. An
> > example that illustrates this behaviour follows:
> >
> > A = load 'input1';
> > B = load 'input2';
> > C = cogroup A by $0, B by $0;
> > D = foreach C do {
> >     XX = filter A by $0 > 5;
> >     XY = filter B by $0 > 5;
> >     // at this point, there is an implicit split in the foreach plan
> >     // here the generate needs to handle the merge, as its inputs are
> >     // from XX and XY
> >     generate XX.$1, XY.$1;
> > }
> >
> > Notice that there is an implicit split in the foreach plan. Each input
> > tuple from C has to be piped to XX and XY. The generate has to now
> > handle the merge as both XX and XY serve as inputs. The inputs to
> > generate are now a DAG and not a tree.
> >
> >      Generate
> >       /    \
> >     XX      XY
> >       \    /
> >      Foreach
> >
> > This makes the execution pipeline fairly complex. Should we restrict the
> > usage to not allow DAGs as input to the generate?
> >
> >
> > Thoughts?
> >
> > Thanks,
> > Santhosh
> >
>
>
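
For readers following the thread, the split and merge that Santhosh describes can be illustrated with a minimal Python sketch. This is only a model of the data flow, not Pig's execution engine. The helper names `cogroup` and `nested_foreach` and the sample tuples are hypothetical, and `XX.$1` is modeled as a plain list of second-field values rather than a bag of tuples.

```python
from collections import defaultdict


def cogroup(a, b):
    """Model of `cogroup A by $0, B by $0`: one (key, bag_a, bag_b) row per key."""
    groups = defaultdict(lambda: ([], []))
    for t in a:
        groups[t[0]][0].append(t)
    for t in b:
        groups[t[0]][1].append(t)
    return [(k, bag_a, bag_b) for k, (bag_a, bag_b) in sorted(groups.items())]


def nested_foreach(c):
    """Model of the nested foreach block from the example."""
    out = []
    for _key, bag_a, bag_b in c:
        # Implicit split: the same input row from C feeds two branches.
        xx = [t for t in bag_a if t[0] > 5]
        xy = [t for t in bag_b if t[0] > 5]
        # Merge: generate consumes both branches, so its inputs form a DAG
        # rooted at the single input row, not a tree.
        out.append(([t[1] for t in xx], [t[1] for t in xy]))
    return out


# Hypothetical sample data standing in for 'input1' and 'input2'.
a = [(3, "a3"), (7, "a7"), (9, "a9")]
b = [(7, "b7"), (8, "b8")]
d = nested_foreach(cogroup(a, b))
```

Each output row pairs the two filtered projections for one key, which is why the generate step must wait on both branches before it can emit anything.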