Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The "Pig070IncompatibleChanges" page has been changed by PradeepKamath.
http://wiki.apache.org/pig/Pig070IncompatibleChanges?action=diff&rev1=18&rev2=19

--------------------------------------------------

  
  In earlier versions of Pig, a user could specify "split by file" on the 
loader statement, which made sure that each map got an entire file rather 
than having the files further divided into blocks. This feature was primarily 
designed for streaming optimization but could also be used with loaders that 
can't deal with incomplete records. We don't believe that this functionality 
has been widely used.
  
- Because the slicing of the data is no longer in Pig's control, we can't 
support this feature generically for every loader. If a particular loader needs 
this functionality, it will need to make sure that the underlying InputFormat 
supports it. 
+ Because the slicing of the data is no longer in Pig's control, we can't 
support this feature generically for every loader. If a particular loader needs 
this functionality, it will need to make sure that the underlying InputFormat 
supports it. (Any !InputFormat based on !FileInputFormat will support this 
through the mapred.min.split.size property: if this property is set to a value 
greater than the size of any of the files to be loaded, then each file will be 
loaded as a single split. This property can be provided on the pig command 
line as a java -D property. Note that this will apply to all jobs that are run 
as part of that script.)
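
As a rough illustration of the rule above, the following Python sketch picks a 
value for mapred.min.split.size that is larger than every input file and 
builds a matching pig command line. The helper names (min_split_size_for, 
pig_command) and the script name are hypothetical, and the exact invocation 
shape ("pig -Dname=value script") is an assumption based on the description 
of passing a java -D property; the file sizes would normally come from the 
file system holding the input.

```python
def min_split_size_for(file_sizes):
    """Return a byte count strictly greater than every file size.

    Hypothetical helper: setting mapred.min.split.size to this value means
    no file is large enough to be divided into more than one split.
    """
    if not file_sizes:
        raise ValueError("at least one input file size is required")
    return max(file_sizes) + 1


def pig_command(script, file_sizes):
    """Build an argument list for running a Pig script with the property set.

    Assumption: the property is passed as a java -D option ahead of the
    script name. It applies to every job the script launches.
    """
    size = min_split_size_for(file_sizes)
    return ["pig", "-Dmapred.min.split.size={}".format(size), script]


if __name__ == "__main__":
    # Example sizes in bytes for three input files (illustrative values).
    print(" ".join(pig_command("load.pig", [100, 2048, 512])))
```

Because the property applies to all jobs in the script, a value sized for the 
largest input also keeps any other FileInputFormat-based input in the same 
script from being divided.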
  
  We will have a different approach for streaming optimization if that 
functionality is necessary.
  
