Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 

The following page has been changed by Shravan Narayanamurthy:

   * These have, however, been fixed in hadoop-16
   * Not sure how this will affect the Example generator
+ Pi Song had some interesting observations:
+  1) Will the LocalJobRunner be invoked when processing the nested plan inside 
+  Currently, no. We have local versions of the operators that are allowed 
inside the nested plan, and these can be used to run tuples through the plan. 
However, if we later intend to support a full-blown foreach with arbitrary 
nesting and all operators, we can take one of two approaches:
+    i. Have local versions of all operators and just use the current model to 
run tuples through them. This also means that we would not have to change 
anything in the MRCompiler.
+    ii. Change MRCompiler to process nested foreach as a blocking operator and 
recursively process it, creating a list of dependent jobs. In this case, it 
would probably make more sense to run the nested plan in MapReduce itself and 
not locally. However, this can be a choice: the MapReduce Launcher can decide 
to execute these plans either locally, by invoking the LocalJobRunner, or on 
the Hadoop Job Tracker, based on the input size for the plans.
+  2) Will the invocation of LocalJobRunner have some latency?
+     It definitely does. As measured in hadoop 15, it has a startup latency 
of about 5 seconds. Whether this matters depends on how and where we use the 
LocalJobRunner. If we use it strictly when the user asks for local execution 
mode, it should not matter. Also, if the size of the data is at least in the 
tens of MBs, the LocalJobRunner performs better than streaming tuples through 
the plan of local operators.
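
As a rough illustration of the size-based choice discussed in 1.ii and 2, here 
is a minimal Java sketch. The class name, the enum and both byte thresholds are 
hypothetical and made up for this example; they are not part of Pig or Hadoop. 
Only the rough "tens of MBs" figure comes from the observations above, and the 
upper cut-off is purely an assumption.

```java
// Hypothetical sketch: pick how to run a nested plan from its input size.
// Nothing here is an actual Pig or Hadoop API.
final class NestedPlanExecutionChooser {

    enum Mode {
        LOCAL_OPERATORS,   // stream tuples through local versions of operators
        LOCAL_JOB_RUNNER,  // pay the startup latency, run in-process
        JOB_TRACKER        // submit to the cluster
    }

    // Assumed cut-off: below roughly "tens of MBs" the startup latency of
    // the LocalJobRunner is taken to outweigh its throughput advantage.
    static final long LOCAL_RUNNER_MIN_BYTES = 10L * 1024 * 1024;

    // Assumed cut-off above which the cluster is used; an arbitrary value.
    static final long CLUSTER_MIN_BYTES = 1024L * 1024 * 1024;

    static Mode choose(long inputBytes) {
        if (inputBytes < 0) {
            throw new IllegalArgumentException("inputBytes must be >= 0");
        }
        if (inputBytes < LOCAL_RUNNER_MIN_BYTES) {
            return Mode.LOCAL_OPERATORS;
        }
        if (inputBytes < CLUSTER_MIN_BYTES) {
            return Mode.LOCAL_JOB_RUNNER;
        }
        return Mode.JOB_TRACKER;
    }
}
```

With these assumed thresholds, a 1 KB input would stay on the local operators, 
a 50 MB input would go to the LocalJobRunner, and a 2 GB input would be 
submitted to the Job Tracker. Real cut-offs would have to be measured.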
+ I guess the choice is harder now :)
+ The choice now depends on what we want to do for the full blown foreach. 
Since I would like to implement choice (ii), I would vote for using 
