Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by PiSong:
http://wiki.apache.org/pig/NestedLogicalPlan

------------------------------------------------------------------------------
  
  [pi] We can think about this in two ways: first, only one of them does all the 
work. Second, we split responsibilities. I'm not sure which one it is. We 
should come up with a clear-cut division of responsibilities. Though, if you say 
"foreach just takes each input and uses", then it is not a dummy.
  
+ [pi] This is one possible way to describe the internal operations of FOREACH 
GENERATE:
+ 
+ Operator FOREACH:
+ {{{
+ FOREACH: Bag x (f: Tuple -> Tuple) x  (list of flatten indexes) -> Bag
+ }}}
+  1. Iterate through the bag from the input port.
+  1. For each tuple in the bag, apply f: Tuple -> Tuple (which is the inner 
plan).
+  1. Flatten the result and put all the output tuples into the output bag. 
Repeat the previous step for the next tuple.
+  1. Send the output bag to the output port.
+ 
+ This way we don't need GENERATE and only use a normal inner plan in FOREACH. 
The list of flatten flags belongs to FOREACH.
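
The FOREACH description above can be sketched in a few lines of Python. This is only an illustration of the proposal, not Pig code: the names `foreach` and `flatten` are hypothetical, a bag is modelled as a list of tuples, and it is assumed that every flattened field is itself a bag whose tuples are spliced into the output row (one output row per element, an empty bag producing no rows).

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[object, ...]
Bag = List[Row]


def flatten(row: Row, flatten_indexes: Sequence[int]) -> Bag:
    """Expand the fields listed in flatten_indexes (assumed to be bags)."""
    rows: Bag = [()]
    for i, field in enumerate(row):
        if i in flatten_indexes:
            # One output row per tuple in the nested bag (cross product).
            rows = [prefix + tuple(item) for prefix in rows for item in field]
        else:
            # Ordinary field: append it unchanged to every partial row.
            rows = [prefix + (field,) for prefix in rows]
    return rows


def foreach(bag: Bag,
            inner_plan: Callable[[Row], Row],
            flatten_indexes: Sequence[int]) -> Bag:
    """FOREACH: Bag x (f: Tuple -> Tuple) x (list of flatten indexes) -> Bag"""
    output_bag: Bag = []
    for row in bag:                      # step 1: iterate over the input bag
        result = inner_plan(row)         # step 2: apply the inner plan
        output_bag.extend(flatten(result, flatten_indexes))  # step 3
    return output_bag                    # step 4: hand the bag to the output


if __name__ == "__main__":
    data = [("a", [(1,), (2,)]), ("b", [(3,)])]
    print(foreach(data, lambda t: t, [1]))
```

In this sketch the flatten list is an argument of `foreach` itself, which mirrors the point that no separate GENERATE operator is required.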
+ 
+ 
  ==== LOProject ====
  This operator is only for mapping an input tuple to an output tuple (e.g. 
{A,B,C,D,E} ==> {A,C,D} ). Given that we allow users to specify fields in 
COGROUP, FILTER, and FOREACH as expressions, LOProject becomes just a special 
case in which users merely specify a direct mapping. Since we have agreed upon 
the concept of inner plans, I think LOProject is not needed.
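
To make the "special case" argument concrete, here is a minimal Python sketch in which a projection is just one more Tuple -> Tuple inner plan. The helper name `project` and the use of column positions are assumptions made for illustration; this is not the LOProject class or its actual behaviour.

```python
from typing import Callable, Sequence, Tuple

Row = Tuple[object, ...]


def project(columns: Sequence[int]) -> Callable[[Row], Row]:
    """Build an inner plan that keeps only the given column positions."""
    def inner_plan(row: Row) -> Row:
        return tuple(row[i] for i in columns)
    return inner_plan


if __name__ == "__main__":
    keep_a_c_d = project([0, 2, 3])
    print(keep_a_c_d(("A", "B", "C", "D", "E")))
```

Any other expression over the input tuple would have the same Tuple -> Tuple shape, which is why a direct mapping needs no dedicated operator under this view.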
  
  [shrav] Project is a consistent way of implementing the fields that the user 
mentions, without making the user bother about all the conversions he might 
need to do if we just passed the raw tuple to him. Also, you can only project 
out one field and not multiple fields.
+ 
  [pi] What you mentioned here is different from the current implementation.
  
