Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 

The "Pig070IncompatibleChanges" page has been changed by OlgaN.


  Pig 0.7.0 will include some major changes to Pig, most of them driven by the 
[[LoadStoreRedesignProposal | Load-Store redesign]]. Some of these changes will 
not be backward compatible and will require users to change their Pig scripts 
or their UDFs. This document is intended to keep track of such changes so that 
we can document them for the release.
  == Changes to the Load and Store Functions ==
  == Handling Compressed Data ==
  In 0.6.0 and earlier versions, Pig supported bzip compressed files with 
extensions of .bz or .bz2 as well as gzip compressed files with the .gz 
extension. Pig was able to both read and write files in these formats, with the 
understanding that gzip compressed files could not be split across multiple 
maps while bzip compressed files could. Also, data compression was completely 
decoupled from the data format and the Load/Store functions, meaning that any 
loader could read compressed data and any store function could write it simply 
by virtue of having the right extension on the files it was reading or writing.
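
The extension-driven behaviour described above can be sketched roughly as 
follows. This is only an illustration of the rule stated in the paragraph, not 
Pig's actual code; the names CODECS and codec_for are hypothetical.

```python
import os

# Extension -> (codec name, can the file be split across multiple maps?).
# The entries mirror the 0.6.0 behaviour described in the text; the table
# itself and its name are assumptions made for this sketch.
CODECS = {
    ".bz": ("bzip", True),
    ".bz2": ("bzip", True),
    ".gz": ("gzip", False),
}


def codec_for(path):
    """Return (codec, splittable) based only on the file extension.

    Returns None when the extension does not indicate a supported codec.
    """
    ext = os.path.splitext(path)[1].lower()
    return CODECS.get(ext)


if __name__ == "__main__":
    for name in ("data/part-00000.bz2", "logs.gz", "plain.txt"):
        print(name, codec_for(name))
```

Because the decision depends only on the file name, the same lookup would apply 
to any loader or store function, which is the decoupling the paragraph refers 
to.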
@@ -19, +22 @@

  == Local Mode ==
  == Streaming ==
+ There are two things that are changing in streaming.
+ First, in the initial (0.7.0) release, '''we will not support the 
optimization''' where, if streaming follows a load of a compatible format or is 
followed by a format-compatible store, the data is not parsed but passed in 
chunks from the loader or to the store. The main reasons we are not porting the 
optimization are that the work is not trivial and that the optimization was 
never documented and so is unlikely to have been used.
+ Second, '''you can no longer use load/store functions for 
  == Split by File ==
  In the earlier versions of Pig, a user could specify "split by file" on the 
load statement, which would make sure that each map got an entire file rather 
than having the files further divided into blocks. This feature was primarily 
designed for the streaming optimization but could also be used with loaders that 
can't deal with incomplete records. We don't believe that this functionality 
has been widely used.
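
The difference between the two splitting modes can be sketched as below. This 
is a simplified model under stated assumptions, not Pig's or Hadoop's split 
logic; plan_splits, its parameters, and the fixed block size are hypothetical.

```python
def plan_splits(file_sizes, block_size, split_by_file=False):
    """Return a list of (file_name, offset, length) splits, one per map.

    file_sizes maps a file name to its size in bytes. With split_by_file
    set, every file becomes exactly one split, so a map always sees whole
    records. Otherwise each file is cut into block_size pieces.
    """
    splits = []
    for name, size in file_sizes.items():
        if split_by_file:
            # One map receives the entire file.
            splits.append((name, 0, size))
            continue
        offset = 0
        while offset < size:
            length = min(block_size, size - offset)
            splits.append((name, offset, length))
            offset += length
    return splits


if __name__ == "__main__":
    sizes = {"a.txt": 250}
    print(plan_splits(sizes, block_size=100))
    print(plan_splits(sizes, block_size=100, split_by_file=True))
```

In the block-based case a record may straddle two splits, which is why a loader 
that cannot handle incomplete records would have needed the per-file mode.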
