Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/PigStreamingFunctionalSpec

------------------------------------------------------------------------------
  {{{
  <define command> ::= define <alias> <computation spec>
  <alias> ::= pig identifier
- <comparison spec> ::= <UDF spec> | <command spec>
+ <computation spec> ::= <UDF spec> | <command spec>
  <UDF spec> ::= pig standard function spec
  <command spec> ::= `<command>` [<input spec>] [<output spec>] [<ship_spec>] [<cache_spec>]
  <command> ::= standard Unix command including the arguments
@@ -90, +90 @@

  
     * '''unordered''' - no guarantees on the order in which the data is delivered to the streaming application
     * '''grouped''' - the data for the same key is guaranteed to be processed contiguously on a single node
-    * '''grouped and ordered''' - date is grouped and sorted within a group on user specified key.
+    * '''grouped and ordered''' - data is grouped and sorted within a group on user specified key.
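The strictest of these modes, grouped and ordered, can be illustrated with a small single-process Python sketch. It is not Pig's implementation: the record layout (group key, sort key, payload) and the function name `grouped_and_ordered` are hypothetical. Sorting the groups themselves by key is stronger than the contiguity the mode requires, and the "single node" aspect is not modeled.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record layout: (group_key, sort_key, payload)
Record = Tuple[str, int, str]


def grouped_and_ordered(records: Iterable[Record]) -> Iterator[Tuple[str, List[Record]]]:
    """Yield each group contiguously, with its records sorted by sort_key."""
    # Sorting on (group_key, sort_key) makes every group contiguous and
    # orders records within each group by the user-specified key.
    ordered = sorted(records, key=itemgetter(0, 1))
    # groupby only merges adjacent items, which is why the sort comes first.
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, list(group)


if __name__ == "__main__":
    data = [("b", 2, "x"), ("a", 3, "y"), ("b", 1, "z"), ("a", 1, "w")]
    for key, group in grouped_and_ordered(data):
        print(key, [r[2] for r in group])
```

Dropping the sort and grouping with a dictionary instead would give the weaker grouped mode, where records for a key arrive together but in no particular order.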
  
  In addition to position, the data grouping and ordering can be determined by the data itself. For now, users would need to know the properties of the data to be able to take advantage of its structure; eventually, however, this information should be part of the metadata.
  
@@ -142, +142 @@

  
  To prevent a command from being shipped, an empty list can be passed to the `ship` clause.
  
+ Note that we need to make sure that executables retain their permissions and can be executed on the compute nodes.
+ 
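The permission requirement can be sketched in Python as a staging step that copies an executable and then verifies it is still runnable. This is only an illustration under assumptions, not how Pig ships files: the function name `ship_executable` and the idea of a local staging directory are hypothetical. It relies on `shutil.copy`, which copies file data and permission mode bits but not other metadata.

```python
import os
import shutil
import stat


def ship_executable(src: str, dest_dir: str) -> str:
    """Copy src into dest_dir (hypothetical staging area) and ensure it stays executable."""
    # shutil.copy copies the file contents and the permission mode bits,
    # and returns the path of the new file.
    dest = shutil.copy(src, dest_dir)
    mode = os.stat(dest).st_mode
    # Make sure the owner execute bit is set, in case it was lost upstream.
    os.chmod(dest, mode | stat.S_IXUSR)
    # Fail early rather than discovering the problem on a compute node.
    if not os.access(dest, os.X_OK):
        raise PermissionError(f"{dest} is not executable")
    return dest
```

A plain byte-for-byte transfer (for example, reading and rewriting the file) would drop the mode bits, which is the failure this note warns about.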
  ==== 2.2 Ability to cache data ====
  
  The approach described above works fine for binaries/jars and small datasets. For larger datasets, loading them at run time for every execution can have serious performance consequences.