The following page has been changed by XuZhang:
http://wiki.apache.org/pig/PigStreamingFunctionalSpec
--
If `ship` and `cache` options are not specified, pig will attempt to ship the
binary in the following way:
* If the first word of the streaming command is `perl` or `python`, pig
will assume that the binary is the first string it encounters that does not
start with a dash.
-* Otherwise, pig will attempt to ship the first string from the command
line as long as it does not come from `/bin, /user/bin, /user/local/bin`. It
will determine that by scanning the path if an absolute path is provided or by
executing `which`. The paths can be made configurable via `set stream.skippath
paths` option.
+* Otherwise, pig will attempt to ship the first string from the command
line as long as it does not come from `/bin, /usr/bin, /usr/local/bin`. It will
determine that by scanning the path if an absolute path is provided or by
executing `which`. The paths can be configured via the `set stream.skippath
paths` option.
To prevent a command from being shipped, an empty list can be passed to the
`ship` clause.
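
The auto-ship rules above can be sketched as follows. This is a minimal
illustration under stated assumptions, not Pig's implementation: the class
`AutoShipSketch`, the method `candidate`, and the `which` callback (standing in
for executing the `which` utility) are hypothetical names, and the behavior for
a name that `which` cannot resolve is an assumption the text does not cover.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper for illustration; this class is not part of Pig.
final class AutoShipSketch {
    // Default skip paths from the text above; `set stream.skippath` would
    // supply a different list.
    static final List<String> DEFAULT_SKIP_PATHS =
            Arrays.asList("/bin", "/usr/bin", "/usr/local/bin");

    /**
     * Returns the token that would be shipped, or empty if nothing is shipped.
     * `which` stands in for running the `which` utility: it maps a bare
     * command name to an absolute path, or to null when nothing is found.
     */
    static Optional<String> candidate(String command, List<String> skipPaths,
                                      Function<String, String> which) {
        // Naive whitespace split; shell quoting is deliberately not handled.
        String[] words = command.trim().split("\\s+");
        if (words[0].isEmpty()) {
            return Optional.empty();
        }
        if (words[0].equals("perl") || words[0].equals("python")) {
            // Skip interpreter flags; the first non-dash token is the script.
            for (int i = 1; i < words.length; i++) {
                if (!words[i].startsWith("-")) {
                    return Optional.of(words[i]);
                }
            }
            return Optional.empty();
        }
        String first = words[0];
        String resolved = first.startsWith("/") ? first : which.apply(first);
        if (resolved == null) {
            // Assumption: an unresolvable name is treated as a local file.
            return Optional.of(first);
        }
        for (String skip : skipPaths) {
            if (resolved.startsWith(skip + "/")) {
                return Optional.empty(); // found under a skip path, not shipped
            }
        }
        return Optional.of(first);
    }
}
```

Passing the skip paths as a parameter mirrors the configurable
`stream.skippath` setting; injecting `which` as a function keeps the sketch
independent of the machine it runs on.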
@@ -191, +191 @@
1. !DefaultSerializer, !DefaultDeserializer as described above (This is
going to be PigStorage)
2. !PythonSerializer, !PythonDeserializer
- 3. !BinarSerailzie, !BinaryDeserializer - treats the entire file as byte
stream - no formating or interpretation.
+ 3. !BinarySerializer, !BinaryDeserializer - treats the entire file as a byte
stream - no formatting or interpretation.
Each deserializer will implement the `LoadFunc` interface. Each serializer
will implement the `StoreFunc` interface. The `StoreFunc` interface will be
extended with a `void flatten() throws OperationNotSupportedException;` method
that indicates that the data needs to be flattened before it is
serialized. A class can choose not to support this functionality and throw
an exception.
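
The proposed `flatten()` extension might look like the sketch below. It is an
illustration only: `FlattenSupport` is a hypothetical, reduced stand-in for
`StoreFunc` that shows just the new method, the two serializer classes are
invented for the example, and the local `OperationNotSupportedException` is a
stand-in because the text does not say which package the exception comes from.

```java
// Stand-in for the exception type named in the text; its package is not
// stated there, so a local class is declared for this sketch.
class OperationNotSupportedException extends Exception {
    OperationNotSupportedException(String message) {
        super(message);
    }
}

// Hypothetical reduced view of StoreFunc showing only the proposed method.
interface FlattenSupport {
    void flatten() throws OperationNotSupportedException;
}

// A serializer that accepts the request and remembers it for later writes.
class FlatteningSerializer implements FlattenSupport {
    private boolean flattenRequested = false;

    @Override
    public void flatten() {
        flattenRequested = true;
    }

    boolean isFlattenRequested() {
        return flattenRequested;
    }
}

// A serializer that opts out by throwing, as the text allows.
class NonFlatteningSerializer implements FlattenSupport {
    @Override
    public void flatten() throws OperationNotSupportedException {
        throw new OperationNotSupportedException(
                "flatten is not supported by this serializer");
    }
}
```

A caller would invoke `flatten()` before writing any data and handle the
exception where the serializer declines.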
@@ -237, +237 @@
Y = stream X through Z;
}}}
- This tells pig that streaming application stored its complete output into
file called `outputfile` in the tasks's working directory and that the content
of that file should be serialized into Y using !MySerializer.
+ This tells pig that the streaming application stored its complete output in
a file called `outputfile` in the task's working directory and that the content
of that file should be deserialized into Y using MyDeserializer.
A user can specify multiple outputs, but only the first one will be
automatically loaded; the rest will be stored in DFS using the file name
specified in the output as an absolute path: