The following page has been changed by OlgaN:
http://wiki.apache.org/pig/PigStreamingFunctionalSpec

------------------------------------------------------------------------------
  Serialization is needed to convert tuples into a stream that is passed to the streaming application; deserialization is needed to convert the streaming application's output back into tuples.

- By default, the data going into the streaming command and the one coming out is assumed to be tab delimited.
+ By default, the data going into the streaming command and the one coming out is assumed to be tab delimited.

  {{{
  S = stream A through `stream.pl`;
  }}}

- In the example above, the elements of A are concatenated with tabs and passed to `stream.pl`. The output of streaming is processed one line at a time and split on tabs. The user would be able to provide an alternative delimiter to default (de)serializer via `define command`:
+ In the example above, `DefaultSerializer` takes tuples out of A and converts them into tab-delimited lines that are passed to `stream.pl`. If A were the result of a grouping operation, `DefaultSerializer` would also flatten the data. The output of streaming is processed by `DefaultDeserializer` one line at a time and split on tabs.
+
+ The user would be able to provide an alternative delimiter to the default (de)serializer via `define command`:

  {{{
  define X `stream.pl` input(stdin using DefaultSerializer('^A')) output (stdout using DefaultDeserializer('^A'));
@@ -186, +188 @@

  The following serializers/deserializers will be part of the Pig distribution:

   1. !DefaultSerializer, !DefaultDeserializer as described above
-  2. !FlattenSerializer - it would take a bag and flatten it before passing it to streaming application.
+  2. !DontFlattenSerializer - it would not flatten bags.
   3. !PythonSerializer, !PythonDeserializer

  ==== 3.2 Ability to declare schema for streaming output ====