Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by UtkarshSrivastava:

     * In the current (1.2) and earlier releases, storage functions are case sensitive. This will change in future releases.
     * !PigStorage can only store flat tuples, i.e., tuples having atomic 
fields. If you want to store nested data, use !BinStorage instead.
+ [[Anchor(Working_with_compressed_files)]]
+ === Working with Compressed Files ===
+ ==== Compressed Input ====
+ Compressed files are difficult to process in parallel because, in general, they cannot be split into fragments that are decompressed independently. However, if the compression is block-oriented (e.g., bz2), splitting and parallel processing are easy to do.
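The following Python sketch illustrates why block-oriented compression makes parallel processing possible. It is not Pig's implementation: compress_blocks and decompress_fragment are hypothetical helpers, and for simplicity each block is written as its own bzip2 stream with a recorded offset table. An ordinary .bz2 file usually holds a single stream with many internal blocks, so a real splitter has to locate block boundaries inside the stream rather than read them from a table.

```python
import bz2


def compress_blocks(data, block_size):
    """Hypothetical helper: compress `data` in fixed-size pieces, each as its
    own bzip2 stream. Returns the concatenated bytes plus the (start, end)
    offsets of every compressed piece."""
    out = bytearray()
    offsets = []
    for i in range(0, len(data), block_size):
        piece = bz2.compress(data[i:i + block_size])
        offsets.append((len(out), len(out) + len(piece)))
        out += piece
    return bytes(out), offsets


def decompress_fragment(compressed, start, end):
    """Decompress a single piece using only its own bytes, the way one
    parallel worker would handle one fragment."""
    return bz2.decompress(compressed[start:end])


if __name__ == "__main__":
    data = b"".join(b"record %d\n" % i for i in range(1000))
    blob, offsets = compress_blocks(data, block_size=2048)
    # Each fragment is decompressed on its own; joining the results in
    # offset order restores the original input.
    parts = [decompress_fragment(blob, s, e) for s, e in offsets]
    assert b"".join(parts) == data
    # The concatenation is also a valid multi-stream bzip2 payload.
    assert bz2.decompress(blob) == data
    print(len(offsets), "independently decompressible fragments")
```

Any single (start, end) range can be decompressed without touching the others, which is what allows several workers to each take one fragment.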
+ Pig has built-in support for processing .bz2 files in parallel (.gz support is coming soon). If the input file name has a .bz2 extension, Pig decompresses the file on the fly and passes the decompressed input stream to your load function. For example,
+ {{{
+ A = LOAD 'input.bz2' USING myLoad();
+ }}}
+ Multiple instances of myLoad() (as dictated by the degree of parallelism) will be created, and each will be given a fragment of the ''decompressed'' version of input.bz2 to process.
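As a rough illustration of this division of labour, here is a minimal Python sketch, assuming a hypothetical open_input helper that plays Pig's role and a hypothetical my_load function that plays the role of myLoad(). The helper picks the decompressor from the file name extension, so the load function only ever reads uncompressed bytes. Unlike Pig, the sketch does not split the input across several load-function instances.

```python
import bz2
import io


def open_input(path):
    """Hypothetical helper: open `path` for reading, decompressing on the fly
    when the name ends in .bz2. The caller always gets uncompressed bytes."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rb")
    return open(path, "rb")


def my_load(stream):
    """Hypothetical load function: yields one tuple per tab-separated line.

    It contains no compression logic because it never sees compressed bytes.
    """
    text = io.TextIOWrapper(stream, encoding="utf-8")
    for line in text:
        yield tuple(line.rstrip("\n").split("\t"))


if __name__ == "__main__":
    import os
    import tempfile

    path = os.path.join(tempfile.mkdtemp(), "input.bz2")
    with bz2.open(path, "wt", encoding="utf-8") as f:
        f.write("alice\t1\nbob\t2\n")
    with open_input(path) as stream:
        for record in my_load(stream):
            print(record)
```

The same my_load works unchanged on an uncompressed file, since open_input then returns the raw file stream.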
+ ==== Compressed Output ====
+ Pig currently supports output compression in the .bz2 format (so that the 
output can subsequently be loaded in parallel). All you have to do is include a 
.bz2 extension in the name of your output file. Your store function (if any) 
should simply write uncompressed data, and Pig will compress it on the fly.
+ For example,
+ {{{
+ STORE A INTO 'output.bz2' USING myStore();
+ }}}
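The output side can be sketched the same way. In this minimal Python sketch, open_output (hypothetical, standing in for Pig) chooses bzip2 compression from the .bz2 extension, and my_store (hypothetical, standing in for myStore()) writes plain tab-separated lines without knowing that any compression happens.

```python
import bz2


def open_output(path):
    """Hypothetical helper: open `path` for writing, compressing on the fly
    when the name ends in .bz2."""
    if path.endswith(".bz2"):
        return bz2.open(path, "wb")
    return open(path, "wb")


def my_store(records, stream):
    """Hypothetical store function: writes each record as one tab-separated
    line of plain, uncompressed UTF-8 text."""
    for record in records:
        line = "\t".join(str(field) for field in record) + "\n"
        stream.write(line.encode("utf-8"))


if __name__ == "__main__":
    import os
    import tempfile

    path = os.path.join(tempfile.mkdtemp(), "output.bz2")
    with open_output(path) as out:
        my_store([("alice", 1), ("bob", 2)], out)
    with open(path, "rb") as f:
        print(bz2.decompress(f.read()).decode("utf-8"), end="")
```

Because the result is ordinary bzip2 data, it can be read back with bz2.decompress or, in Pig, loaded in the same way as input.bz2.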
  === Experimenting with Pig Latin syntax ===
