Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/PigStreamingFunctionalSpec

------------------------------------------------------------------------------
  
  Serialization and deserialization are needed, respectively, to convert tuples 
into a stream that is passed to the streaming application and to convert the 
application's output back into tuples.
  
- By default, the data going into the streaming command and the one coming out 
is assumed to be tab delimited.
+ By default, the data going into the streaming command and the one coming out 
is assumed to be tab delimited. 
  
  {{{
  S = stream A through `stream.pl`;
  }}}
  
- In the example above, the elements of A are concatenated with tabs and passed 
to `stream.pl`. The output of streaming is processed one line at a time and 
split on tabs. The user would be able to provide an alternative delimiter to 
default (de)serializer via `define command`:
+ In the example above, `DefaultSerializer` takes tuples out of A and converts 
them into tab-delimited lines that are passed to `stream.pl`. If A was the 
result of a grouping operation, `DefaultSerializer` would also flatten the 
data. The output of streaming is processed by `DefaultDeserializer` one line 
at a time, and each line is split on tabs. 
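The default round trip described above can be sketched in Python. This is only an illustration of the described behaviour, not Pig's implementation: the function names `serialize` and `deserialize` and the sample data are hypothetical, and reading `'^A'` as the Ctrl-A character (`\x01`) is an assumption.

```python
def serialize(tuples, delimiter="\t"):
    """Hypothetical stand-in for DefaultSerializer: one delimited line per tuple."""
    return [delimiter.join(str(field) for field in t) for t in tuples]


def deserialize(lines, delimiter="\t"):
    """Hypothetical stand-in for DefaultDeserializer: split each line on the delimiter."""
    return [tuple(line.rstrip("\n").split(delimiter)) for line in lines]


# Sample relation A (made up for illustration).
A = [("alice", 25), ("bob", 31)]

lines = serialize(A)            # what the streaming command would read
roundtrip = deserialize(lines)  # what would come back if the command echoed its input

# An alternative delimiter, assuming '^A' denotes Ctrl-A (\x01).
ctrl_a_lines = serialize(A, delimiter="\x01")
```

Note that in this sketch every field comes back as a string, because a delimited line carries no type information.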
+ 
+ The user would be able to provide an alternative delimiter to the default 
(de)serializer via `define command`:
  
  {{{
  define X `stream.pl` input(stdin using DefaultSerializer('^A')) output 
(stdout using DefaultDeserializer('^A'));
@@ -186, +188 @@

  The following serializers/deserializers will be part of the Pig distribution:
  
   1. !DefaultSerializer, !DefaultDeserializer as described above
-  2. !FlattenSerializer - it would take a bag and flatten it before passing it 
to streaming application.
+  2. !DontFlattenSerializer - it would not flatten bags.
   3. !PythonSerializer, !PythonDeserializer 
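
To make the flattening distinction in the list above concrete, here is a minimal Python sketch. It assumes that flattening a grouped tuple means emitting one output tuple per bag member, prefixed by the group key; the spec text here does not define this, so that reading, the name `flatten_group`, and the sample data are all hypothetical. How a non-flattening serializer would encode a bag is not stated, so it is not shown.

```python
def flatten_group(grouped):
    """Assumed flattening: (key, bag) becomes one tuple per bag member, prefixed by key."""
    group_key, bag = grouped
    return [(group_key,) + tuple(member) for member in bag]


# A grouped tuple: a group key plus a bag of tuples (made-up data).
grouped = ("fruit", [("apple", 3), ("pear", 5)])

flat = flatten_group(grouped)
```

Under that assumption, each flattened tuple would then be written as one tab-delimited line.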
  
  ==== 3.2 Ability to declare schema for streaming output ====
