Dear Wiki user, You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The following page has been changed by GuntherHagleitner:
http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification

------------------------------------------------------------------------------
  The exec statement will trigger the execution of the job resulting in the file foo. This way the right execution order is enforced.

+ [[Anchor(Execution_Plans)]]
+ === Execution Plans ===
+ Here is a closer look at which statements get combined into a single map-reduce plan.
+
+ Any implicit or explicit split is first compiled into a map-reduce job (the splitter) that stores the input of the split and another
+ map-reduce job (the splittee) for each branch of the split that loads the split input and processes the branch.
+
+ [[Anchor(Implicit_vs_Explicit)]]
+ ==== Implicit vs. Explicit Splits ====
+
+ An explicit split is a split that is specified by using the split statement.
+
+ E.g.:
+
+ {{{
+ A = load 'foo';
+ split A into B if $0 is not null, C if $0 is null;
+ store B into 'bar';
+ store C into 'baz';
+ }}}
+
+ An implicit split is produced when the same handle is used as an input in multiple statements.
+
+ E.g.:
+
+ {{{
+ A = load 'foo';
+ B = filter A by $0 is not null;
+ C = filter A by $0 is null;
+ store B into 'bar';
+ store C into 'baz';
+ }}}
+
+ The following will not produce a split, because the filter statements read different handles (A is loaded twice), even though
+ the statements are logically the same as above:
+
+ {{{
+ A = load 'foo';
+ B = filter A by $0 is not null;
+ A = load 'foo';
+ C = filter A by $0 is null;
+ store B into 'bar';
+ store C into 'baz';
+ }}}
+
+ The multi-query optimization then tries to combine splitters and splittees in the same job.
+
+ [[Anchor(Map_only_splittee)]]
+ ==== Map-only Splittees ====
+
+ If a splittee is a map-only job (it doesn't require join, cogroup, group, etc.), it is merged into
+ the splitter, into either the map or the reduce plan.
+
+ The script:
+
+ {{{
+ A = load '/user/pig/tests/data/pigmix/page_views'
+     as (user, action, timespent, query_term, ip_addr, timestamp,
+         estimated_revenue, page_info, page_links);
+ B = filter A by user is not null;
+ store B into 'filtered_by_user';
+ C = filter B by query_term is null;
+ store C into 'filtered_by_query';
+ }}}
+
+ Will be executed as:
+
+ attachment:map-only.png
+
  [[Anchor(Phases)]]
  == Phases ==