The "Pig070IncompatibleChanges" page has been changed by OlgaN.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=3&rev2=4

--------------------------------------------------

Pig 0.7.0 will include some major changes to Pig, most of them driven by the [[LoadStoreRedesignProposal | Load-Store redesign]]. Some of these changes will not be backward compatible and will require users to change their Pig scripts or their UDFs. This document is intended to keep track of such changes so that we can document them for the release.

== Changes to the Load and Store Functions ==
+
+ TBW
+
== Handling Compressed Data ==

In 0.6.0 and earlier versions, Pig supported bzip-compressed files with the extensions .bz or .bz2 as well as gzip-compressed files with the .gz extension. Pig was able to both read and write files in these formats, with the understanding that gzip-compressed files could not be split across multiple maps while bzip-compressed files could. Also, data compression was completely decoupled from the data format and the Load/Store functions, meaning that any loader could read compressed data and any store function could write it simply by virtue of having the right extension on the files it was reading or writing.

@@ -19, +22 @@

== Local Mode ==

== Streaming ==
+
+ There are two things that are changing in streaming.
+
+ First, in the initial (0.7.0) release, '''we will not support the optimization''' in which, if streaming follows a load of a compatible format or is followed by a format-compatible store, the data is not parsed but passed in chunks from the loader or to the store. The main reason we are not porting the optimization is that the work is not trivial and that the optimization was never documented and so is unlikely to have been used.
+
+ Second, '''you can no longer use load/store functions for (de)serialization.'''
+
== Split by File ==

In earlier versions of Pig, a user could specify "split by file" on the loader statement, which ensured that each map got an entire file rather than having the files further divided into blocks. This feature was primarily designed for the streaming optimization but could also be used with loaders that can't deal with incomplete records. We don't believe that this functionality has been widely used.
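
The difference between "split by file" and block-level splitting can be illustrated with a small conceptual sketch. This is not Pig or Hadoop code: the function names, the `(path, offset, length)` tuple, and the sizes are hypothetical and chosen only for illustration. It shows why a loader that cannot handle incomplete records might have relied on per-file splits, since block-level splits can begin or end in the middle of a record.

```python
from typing import Dict, List, Tuple

# (path, start_offset, length): a hypothetical stand-in for an input split
Split = Tuple[str, int, int]


def splits_by_file(file_sizes: Dict[str, int]) -> List[Split]:
    """One split per file, so each map task receives a whole file."""
    return [(path, 0, size) for path, size in sorted(file_sizes.items())]


def splits_by_block(file_sizes: Dict[str, int], block_size: int) -> List[Split]:
    """Divide each file into block-sized splits; the last one may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    splits: List[Split] = []
    for path, size in sorted(file_sizes.items()):
        offset = 0
        while offset < size:
            length = min(block_size, size - offset)
            splits.append((path, offset, length))
            offset += length
    return splits


if __name__ == "__main__":
    files = {"part-00000": 250, "part-00001": 100}  # sizes in arbitrary units
    print(splits_by_file(files))
    print(splits_by_block(files, block_size=128))
```

With per-file splits, the number of map tasks equals the number of files; with block-level splits, a large file is spread across several tasks and any record that straddles a split boundary must be handled by the loader.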