Dear Wiki user, You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=1&rev2=2

--------------------------------------------------

  = Backward incompatible changes in Pig 0.7.0 =
- Pig 0.7.0 will include some major changes to Pig most of them driven by the [[LoadStoreRedesignProposal | Load-Store redesign]]. Some of this changes will not be backward compatible and will require users to change the pig scripts or their UDFs. This document is intended to keep track of this changes to that we can document them for the release.
+ Pig 0.7.0 will include some major changes to Pig, most of them driven by the [[LoadStoreRedesignProposal | Load-Store redesign]]. Some of these changes will not be backward compatible and will require users to change their Pig scripts or their UDFs. This document is intended to keep track of such changes so that we can document them for the release.
- == Changes to the Load and Store functions ==
+ == Changes to the Load and Store Functions ==
  == Handling Compressed Data ==
+
+ In 0.6.0 and earlier versions, Pig supported bzip-compressed files with the extensions .bz or .bz2 as well as gzip-compressed files with the .gz extension. Pig was able to both read and write files in these formats, with the understanding that gzip-compressed files could not be split across multiple maps while bzip-compressed files could. Also, data compression was completely decoupled from the data format and the Load/Store functions, meaning that any loader could read compressed data and any store function could write it simply by virtue of having the right extension on the files it was reading or writing.
+
+ With Pig 0.7.0, the read/write functionality is taken over by Hadoop's Input/OutputFormat, and how compression is handled, or whether it is handled at all, depends on the Input/OutputFormat used by the load/store function.
+
+ The main input format that supports compression is TextInputFormat.
It supports bzip files with the .bz2 extension and gzip files with the .gz extension. '''Note that it does not support .bz files'''. PigStorage is the only loader that comes with Pig that is derived from TextInputFormat, which means it will be able to handle .bz2 and .gz files. Other loaders such as BinStorage will no longer support compression.
+
+ On the store side, TextOutputFormat also supports compression, but the store function needs to do additional work to enable it. Again, PigStorage will support compression while other functions will not.
+
+ If you have a custom load/store function that needs to support compression, you would need to make sure that the underlying Input/OutputFormat supports this type of compression.
+
  == Local Mode ==
  == Streaming ==
  == Other Changes ==
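
As an aid for the compression rules described under "Handling Compressed Data", the following is a minimal Python sketch that flags paths which were handled as compressed data in 0.6.0 but would not be in 0.7.0. It is not part of Pig: the function and constant names are hypothetical, and the rules encoded are only those stated above (any loader with .bz, .bz2 or .gz in 0.6.0; only PigStorage with .bz2 or .gz in 0.7.0).

```python
import os

# Extensions and function names below come from the text above;
# the helper names themselves are hypothetical and not a Pig API.
PIG_06_EXTENSIONS = {".bz", ".bz2", ".gz"}   # any load/store function in 0.6.0
PIG_07_TEXT_EXTENSIONS = {".bz2", ".gz"}     # TextInputFormat-based functions in 0.7.0
PIG_07_COMPRESSING_FUNCS = {"PigStorage"}    # only built-in function that supports compression


def handled_in_06(path):
    """True if 0.6.0 treated this path as compressed based on its extension."""
    return os.path.splitext(path)[1] in PIG_06_EXTENSIONS


def handled_in_07(path, func="PigStorage"):
    """True if the given built-in function still handles this extension in 0.7.0."""
    ext = os.path.splitext(path)[1]
    return func in PIG_07_COMPRESSING_FUNCS and ext in PIG_07_TEXT_EXTENSIONS


def needs_attention(path, func="PigStorage"):
    """True if the path worked as compressed data in 0.6.0 but will not in 0.7.0."""
    return handled_in_06(path) and not handled_in_07(path, func)
```

For example, `needs_attention("input/a.bz")` flags the .bz file, while `needs_attention("input/a.gz", "BinStorage")` flags a gzip file read through BinStorage. A custom load/store function is outside the scope of this sketch, since its behaviour depends on the Input/OutputFormat it uses.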