The following page has been changed by Arun C Murthy:
http://wiki.apache.org/pig/PigStreamingDesign

------------------------------------------------------------------------------
  
  {{{
    class org.apache.hadoop.mapred.lib.external.ExecutableManager {
+     void configure() throws IOException;
+     void run() throws IOException;
-     void setup() throws Exception;
+     void close() throws IOException;
+     void next(WritableComparable key, Writable value);
-     void teardown() throws Exception;
-     void setInput(Writable w);
-     Writable getOutput();
    }
  }}}
  
- The important deviation from current Pig infrastructure is that there isn't a 
one-to-one mapping between inputs and output records anymore since the 
user-script could (potentially) consume ''all'' the input before it emits 
''any'' output records. Hence, StreamEvalSpec.add will call 
{{{ExecutableManager.setInput((Writable)(d))}}}; while it collects output from 
the task ({{{ExecutableManager.getOutput()}}}) and pass it along for the next 
''eval'' in the pipeline.
+ The important deviation from current Pig infrastructure is that there isn't a 
one-to-one mapping between inputs and output records anymore since the 
user-script could (potentially) consume ''all'' the input before it emits 
''any'' output records. The way around this is to wrap the 
{{{DataCollector}}} (and hence the next successor in the pipeline) in an 
{{{OutputCollector}}} and pass it along to the {{{ExecutableManager}}}.
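
The wrapping described above could look roughly like the following sketch. It is only an illustration under stated assumptions: `DataCollector` and `OutputCollector` here are simplified stand-ins declared locally (not the real Pig or Hadoop types), and `DataCollectorAdapter` and `BufferingManager` are hypothetical names that do not appear in the design. The toy manager buffers every input record and emits output only in `close()`, mimicking a user script that consumes ''all'' the input before producing ''any'' output.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins; NOT the real Pig or Hadoop classes.
class StreamingCollectorSketch {

    /** Stand-in for Pig's DataCollector: the next eval in the pipeline. */
    interface DataCollector {
        void add(Object datum);
    }

    /** Stand-in shaped like Hadoop's OutputCollector. */
    interface OutputCollector<K, V> {
        void collect(K key, V value) throws IOException;
    }

    /** Hypothetical adapter: forwards every collected record to the successor. */
    static final class DataCollectorAdapter implements OutputCollector<Object, Object> {
        private final DataCollector successor;

        DataCollectorAdapter(DataCollector successor) {
            this.successor = successor;
        }

        @Override
        public void collect(Object key, Object value) {
            // The key is ignored in this sketch; only the value moves downstream.
            successor.add(value);
        }
    }

    /** Toy manager: buffers all input and emits output only when closed. */
    static final class BufferingManager {
        private final OutputCollector<Object, Object> output;
        private final List<Object> buffered = new ArrayList<>();

        BufferingManager(OutputCollector<Object, Object> output) {
            this.output = output;
        }

        void next(Object key, Object value) {
            buffered.add(value); // no output is produced per input record
        }

        void close() throws IOException {
            for (Object value : buffered) {
                output.collect(null, value.toString().toUpperCase());
            }
            buffered.clear();
        }
    }

    /** Feeds two records through the toy manager and returns what reached the successor. */
    static List<Object> demo() throws IOException {
        List<Object> downstream = new ArrayList<>();
        BufferingManager manager =
                new BufferingManager(new DataCollectorAdapter(downstream::add));
        manager.next(null, "a");
        manager.next(null, "b");
        manager.close(); // output reaches the successor only here
        return downstream;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```

Because the manager pushes records into the collector whenever they become available, the caller no longer has to pair each input with an output, which is the point of dropping the `setInput`/`getOutput` pair.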
  
