Dear Wiki user, You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The following page has been changed by GuntherHagleitner:
http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification

------------------------------------------------------------------------------
  attachment:mapreduce.png

+ [[Anchor(Store_load_bridge)]]
+ === Store-load sequences ===
+
+ If a script stores to and loads from the same file, some special processing takes place
+ to ensure that the jobs are executed in the right sequence.
+
+ ==== Reversible LoadStoreFunc ====
+
+ If the store and load are processed using the same function and the LoadStoreFunc is reversible,
+ the store is processed, but the load is removed from the plan. Instead, the parent of the store is
+ used as input for the dependent processing nodes.
+
+ The script:
+
+ {{{
+ A = load 'page_views';
+ store A into 'tmp1' using PigStorage();
+ B = load 'tmp1' using PigStorage();
+ C = filter B by $0 is not null;
+ store C into 'tmp2';
+ }}}
+
+ will result in the following logical plan:
+
+ attachment:load-store-rev.png
+
+ If, on the other hand, different load and store functions are used or the function is not reversible,
+ the store and load will be connected in the logical plan and will eventually result in two jobs running
+ in sequence.
+
+ The script:
+
+ {{{
+ A = load 'page_views';
+ store A into 'tmp1' using PigStorage();
+ B = load 'tmp1' using BinStorage();
+ C = filter B by $0 is not null;
+ store C into 'tmp2';
+ }}}
+
+ will result in the following logical plan:
+
+ attachment:load-store-non.png
+
+ [[Anchor(File_commands)]]
+ === File commands ===
+
+ Commands like rm, rmf, mv, copyToLocal and copy will trigger execution of all the stores that
+ were defined before the command. This ensures that the targets of these
+ commands exist when the command runs.
+
+ For instance:
+
+ {{{
+ A = load 'foo';
+ store A into 'bar';
+ mv bar baz;
+ rm foo;
+ A = load 'baz';
+ store A into 'foo';
+ }}}
+
+ This will result in a job that produces bar, after which the mv and rm are executed. Finally, another job
+ is run to generate foo.
+
  [[Anchor(Phases)]]
  == Phases ==