Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/PigStreamingFunctionalSpec

------------------------------------------------------------------------------
  define X `stream.pl foo.txt` ship('stream.pl', 'foo.txt')
  }}}
  
- Note that Pig would not automatically ship any dependecies. It is the 
responsibility of the user to specify all the dependencies explicitely and also 
make sure that the software that the processing relies on such as, for 
instance, `perl` or `python` is installed on the cluster.
+ Note that Pig would not automatically ship any dependecies. It is the 
responsibility of the user to specify all the dependencies explicitely and also 
make sure that the software that the processing relies on such as, for 
instance, `perl` or `python` is installed on the cluster. The files will be 
shipped to the task's current working directory and only relative paths to them 
should be specified. Any pre-installed binaries are expected to be specified in 
the PATH.
  
  If `ship` and `cache` options are not specified, pig will attempt to ship the 
binary in the following way:
  
@@ -150, +150 @@

  
  The approach described above works fine for binaries/jars and small data 
sets. For larger datasets, loading them at run time for every execution can 
have serious performance consequences. 
  
- Similarly to 2.1, a user will be able to specify files to cache via `cache` 
clause in the define statement. For instance,
+ Similarly to 2.1, a user will be able to specify cached files via `cache` 
clause in the define statement. For instance,
  
  {{{
  define X `stream.pl foo.gz` ship('stream.pl') cache('foo.gz')
  }}}
  
- In the later release we might choose to decide what data to ship based on the 
sizes rather than requiring users to do so.
+ Pig will not ship the files but would expect the files to be available on the 
compute nodes.
  
  [[Anchor(Input/Output_Handling)]]
  === 3 Input/Output Handling ===

Reply via email to