Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by PiSong:
http://wiki.apache.org/pig/NewLOProject

New page:
The new LOProject works in two different ways:-
 * Given 1 index, it outputs a datum.
 * Given 2 or more indexes, it outputs a tuple.

Besides that, it can be marked as a sentinel, meaning it bridges data from the outer 
plan to the inner plan.
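
To make the overloading concrete, here is a minimal Python sketch of the behaviour described above. It is only a toy model, not Pig's actual Java code: the `evaluate` method, its parameters, and the use of plain Python tuples are all assumptions made for illustration.

```python
# Toy model of the current, overloaded LOProject (hypothetical, not Pig's code).
class LOProject:
    def __init__(self, indexes, sentinel=False):
        self.indexes = list(indexes)
        self.sentinel = sentinel  # True: read from the outer plan's tuple

    def evaluate(self, inner_input, outer_tuple):
        # Meaning 1: the sentinel flag decides where the input comes from.
        source = outer_tuple if self.sentinel else inner_input
        # Meaning 2: the number of indexes decides the output type.
        if len(self.indexes) == 1:
            return source[self.indexes[0]]               # bare datum
        return tuple(source[i] for i in self.indexes)    # tuple

outer = ("apache", 42, 3.5)
inner = ("pig", 7)
print(LOProject([1]).evaluate(inner, outer))                    # 7
print(LOProject([1], sentinel=True).evaluate(inner, outer))     # 42
print(LOProject([0, 2], sentinel=True).evaluate(inner, outer))  # ('apache', 3.5)
```

One operator ends up with four combinations of input source and output shape.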

Doesn't it seem to have __too many meanings__?

'''Example'''
{{{
B = COGroup A BY $0, B BY $1 ;
C = FOREACH B GENERATE flatten(A.(f1, f2)), group ;
}}}
Here are the inner plans (inside GENERATE):-
{{{
     (plan1)                 (plan2)

Project(A.(f1, f2))         Project(group) 

}}}
The projection in the first plan returns a projected bag, but the one in the second 
plan returns a datum. Both also act as bridges between the outer and inner plans.

== My suggestion ==

It would be __cleaner__ and __more understandable__ if we just:-
 1. Introduce LOSentinel, which is used to get 1 field out of the outer plan 
(from a tuple or bag).
 1. Use LOProject only when projecting tuples or bags (it then outputs a tuple or bag).
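
The split could look roughly like the following Python sketch. The class names follow the suggestion, but the `evaluate` methods, the dict-per-tuple representation, and the list-per-bag representation are assumptions for illustration only.

```python
# Hypothetical sketch of the proposed split; tuples are dicts, bags are lists of dicts.
class LOSentinel:
    """Fetches exactly one field from the outer plan's current tuple."""
    def __init__(self, field):
        self.field = field

    def evaluate(self, outer_tuple):
        return outer_tuple[self.field]

class LOProject:
    """Projects columns and keeps the container type: tuple in, tuple out; bag in, bag out."""
    def __init__(self, fields):
        self.fields = list(fields)

    def _project(self, tup):
        return {f: tup[f] for f in self.fields}

    def evaluate(self, value):
        if isinstance(value, list):                 # bag
            return [self._project(t) for t in value]
        return self._project(value)                 # single tuple

outer = {"group": 1, "A": [{"f1": "a", "f2": "b", "f3": "c"}]}
bag = LOSentinel("A").evaluate(outer)
print(LOProject(["f1", "f2"]).evaluate(bag))  # [{'f1': 'a', 'f2': 'b'}]
```

With this split, each operator has a single job: LOSentinel decides where data comes from, and LOProject never changes the kind of container it is given.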

The following examples show the plans inside LOGenerate:-

'''Example1''' 
{{{
B = FOREACH A GENERATE x1*x2 ;

Sentinel(x1) Sentinel(x2) 
        \    /
          MUL
}}}
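
As a rough illustration of how the Example1 plan would be evaluated per input tuple, here is a small Python sketch. The `sentinel` and `binary` helpers are hypothetical stand-ins for plan operators, and the sample rows are made up.

```python
import operator

# Hypothetical plan nodes modelled as functions of the outer tuple (a dict).
def sentinel(field):
    return lambda outer: outer[field]

def binary(op, left, right):
    return lambda outer: op(left(outer), right(outer))

# Sentinel(x1) and Sentinel(x2) feed MUL, as in the diagram above.
plan = binary(operator.mul, sentinel("x1"), sentinel("x2"))

A = [{"x1": 2, "x2": 5}, {"x1": 3, "x2": 4}]
B = [plan(row) for row in A]
print(B)  # [10, 12]
```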

'''Example2'''
{{{
FOREACH C GENERATE FLATTEN(A.(f1, f2)), group ;

     (plan1)                 (plan2)

    Sentinel(A)             Sentinel(group)
        |
  Project(f1, f2)          

}}}

Note: Flatten is handled by LOGenerate.
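
A possible reading of Example2, sketched in Python. The helper names, the `flatten` flag list, and the sample row are assumptions; the point is only that the inner plans return a bag and a datum, and the flattening happens in the generate step rather than in the projection.

```python
from itertools import product

# Hypothetical plan nodes modelled as functions of the outer tuple (a dict).
def sentinel(field):
    return lambda outer: outer[field]

def project(source, fields):
    # Bag in, bag out: each tuple keeps only the requested fields.
    return lambda outer: [tuple(t[f] for f in fields) for t in source(outer)]

def generate(outer, plans, flatten):
    columns = []
    for plan, flat in zip(plans, flatten):
        value = plan(outer)
        # A flattened bag contributes one alternative per tuple;
        # anything else contributes a single one-field alternative.
        columns.append(list(value) if flat else [(value,)])
    # Cross product of the alternatives, concatenated into output tuples.
    return [sum(combo, ()) for combo in product(*columns)]

row = {"group": 1, "A": [{"f1": "a", "f2": "b", "f3": "x"},
                         {"f1": "c", "f2": "d", "f3": "y"}]}
plans = [project(sentinel("A"), ["f1", "f2"]), sentinel("group")]
print(generate(row, plans, flatten=[True, False]))
# [('a', 'b', 1), ('c', 'd', 1)]
```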

'''Example3'''
{{{
W = LOAD '...' AS (url, outlink);
G = GROUP W by url;
R = FOREACH G {
        FW = FILTER W BY outlink eq 'www.apache.org';
        PW = FW.outlink;
        DW = DISTINCT PW;
        GENERATE group, COUNT(DW);
}

   (plan1)           (plan2)

  Sentinel(group)   Sentinel(W)
                        |
                      Filter
                        |
                  Project(outlink)
                        |
                     Distinct 
                        |
                       COUNT

}}}
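
The nested plan of Example3 can be traced with a similar Python sketch. Again, every helper here is a hypothetical stand-in for the corresponding operator, and the sample group is invented.

```python
# Hypothetical plan nodes modelled as functions of the outer tuple (a dict).
def sentinel(field):
    return lambda outer: outer[field]

def filter_bag(source, predicate):
    return lambda outer: [t for t in source(outer) if predicate(t)]

def project(source, field):
    # Bag in, bag out: one-field tuples.
    return lambda outer: [(t[field],) for t in source(outer)]

def distinct(source):
    return lambda outer: sorted(set(source(outer)))

def count(source):
    return lambda outer: len(source(outer))

plan1 = sentinel("group")
plan2 = count(distinct(project(
    filter_bag(sentinel("W"), lambda t: t["outlink"] == "www.apache.org"),
    "outlink")))

g = {"group": "a.com", "W": [
    {"url": "a.com", "outlink": "www.apache.org"},
    {"url": "a.com", "outlink": "www.apache.org"},
    {"url": "a.com", "outlink": "example.org"},
]}
print((plan1(g), plan2(g)))  # ('a.com', 1)
```

Only the two Sentinel nodes touch the outer plan; Filter, Project, Distinct and COUNT work purely on the bag handed to them.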

Thoughts?
