Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by GuntherHagleitner:
http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification

------------------------------------------------------------------------------
  The exec statement will trigger the execution of the job resulting in the file 
foo. This way the right execution order
  is enforced.
  
+ [[Anchor(Execution_Plans)]]
+ === Execution Plans ===
+ Here is a closer look at what statements get combined into a single 
map-reduce plan.
+ 
+ Any implicit or explicit split is first compiled into one map-reduce job (the 
splitter) that stores the input of the split, plus one additional
+ map-reduce job (a splittee) for each branch of the split, which loads the 
stored split input and processes that branch.
+ 
+ [[Anchor(Implicit_vs_Explicit)]]
+ ==== Implicit vs. Explicit Splits ====
+ 
+ An explicit split is a split that is specified by using the split statement.
+ 
+ E.g.:
+ 
+ {{{
+ A = load 'foo';
+ split A into B if $0 is not null, C if $0 is null;
+ store B into 'bar';
+ store C into 'baz';
+ }}}
+ 
+ An implicit split is produced when the same handle is used as the input of 
multiple statements.
+ 
+ E.g.:
+ 
+ {{{
+ A = load 'foo';
+ B = filter A by $0 is not null;
+ C = filter A by $0 is null;
+ store B into 'bar';
+ store C into 'baz';
+ }}}
+ 
+ The following will not produce a split, because the second load rebinds A and 
the filter statements therefore use different handles, even though
+ the statements are logically the same as above:
+ 
+ {{{
+ A = load 'foo';
+ B = filter A by $0 is not null;
+ A = load 'foo';
+ C = filter A by $0 is null;
+ store B into 'bar';
+ store C into 'baz';
+ }}}
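
The handle rule in the three scripts above can be sketched in a few lines of Python. This is only an illustration and not Pig's actual planner: statements are modeled as hypothetical `(output, inputs)` pairs, and a split is reported whenever a single definition of a handle feeds more than one statement. Redefining `A` starts a new definition, which is why the script with the second load reports no split.

```python
from collections import defaultdict

def find_implicit_splits(statements):
    """Return the handles whose single definition feeds more than one statement.

    `statements` is a list of (output, inputs) pairs; output is None for a
    store. This representation is an assumption made for illustration only.
    """
    version = {}                  # handle -> index of its current definition
    consumers = defaultdict(int)  # (handle, definition index) -> reader count
    for index, (output, inputs) in enumerate(statements):
        for handle in inputs:
            consumers[(handle, version[handle])] += 1
        if output is not None:
            version[output] = index  # a (re)definition starts a fresh handle
    return sorted({handle for (handle, _), count in consumers.items() if count > 1})

same_handle = [
    ("A", []),      # A = load 'foo';
    ("B", ["A"]),   # B = filter A by $0 is not null;
    ("C", ["A"]),   # C = filter A by $0 is null;
    (None, ["B"]),  # store B into 'bar';
    (None, ["C"]),  # store C into 'baz';
]
reloaded = [
    ("A", []),      # A = load 'foo';
    ("B", ["A"]),   # B = filter A by $0 is not null;
    ("A", []),      # A = load 'foo';  (rebinds A)
    ("C", ["A"]),   # C = filter A by $0 is null;
    (None, ["B"]),  # store B into 'bar';
    (None, ["C"]),  # store C into 'baz';
]

print(find_implicit_splits(same_handle))  # ['A']
print(find_implicit_splits(reloaded))     # []
```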
+ 
+ The multi-query optimization then tries to combine splitters and splittees in 
the same job.
+ 
+ [[Anchor(Map_only_splittee)]]
+ ==== Map-only Splittees ====
+ 
+ If a splittee is a map-only job (it doesn't require join, cogroup, group, etc.), 
the splittee is merged into
+ the splitter, into either its map or its reduce plan.
+ 
+ The script:
+ 
+ {{{
+ A = load '/user/pig/tests/data/pigmix/page_views'
+     as (user, action, timespent, query_term, ip_addr, timestamp,
+         estimated_revenue, page_info, page_links);
+ B = filter A by user is not null;
+ store B into 'filtered_by_user';
+ C = filter B by query_term is null;
+ store C into 'filtered_by_query';
+ }}}
+ 
+ Will be executed as:
+ 
+ attachment:map-only.png
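
The map-only rule can be sketched as a small Python check. This is a hedged illustration, not Pig's implementation: the operator set covers only the three operators the text names (the "etc" is left open), and the branch names and operator lists are hypothetical stand-ins for the two consumers of `B` in the script above, which both use `B` as input.

```python
# Operators that force a reduce phase. The text lists join, cogroup and
# group followed by "etc", so this set is an illustrative assumption.
REDUCE_OPERATORS = {"join", "cogroup", "group"}

def is_map_only(operators):
    """True if none of the splittee's operators needs a reduce phase."""
    return not any(op in REDUCE_OPERATORS for op in operators)

def partition_splittees(splittees):
    """Split branches into map-only ones (merged into the splitter) and the rest.

    `splittees` maps a hypothetical branch name to its list of operators.
    """
    merged = [name for name, ops in splittees.items() if is_map_only(ops)]
    not_map_only = [name for name, ops in splittees.items() if not is_map_only(ops)]
    return merged, not_map_only

# The two consumers of B: the store of B, and the filter that produces C.
branches = {
    "store_B": ["store"],
    "C": ["filter", "store"],
}

print(partition_splittees(branches))  # (['store_B', 'C'], [])
```

Both branches are map-only, so under this rule both would be merged into the splitter and the whole script would run as a single job.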
+ 
  [[Anchor(Phases)]]
  == Phases ==
  
