Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 

The following page has been changed by AlanGates:

  || Optimization || The Pig optimizer needs to call fieldsToRead so that Load 
functions that can do column skipping actually do it. || || || gates || ||
  || Scalability || Pig's default join (symmetric hash) currently depends on 
being able to fit all of the values for a given join key for one of the inputs 
into memory.  (It does try to spill to disk when it cannot fit them all into 
memory.  In practice this often fails, as it is not good at recognizing when 
memory is low enough that it should spill.  Even when it does not fail, 
spilling to disk and rereading from disk is very slow.)  This limitation could 
be avoided if keys with a large number of values were broken up so that each 
row set fits in memory, and the pieces were then shipped to multiple reducers.  
A sampling pass would need to be done first to determine which keys to break 
up.  See || || || chris olston || gates ||
  || Scalability || Improve memory footprint for a tuple.  See || 
[ 793] || || olgan || ||
+ || Optimization || Change physical operators to pass a list of tuples in 
getNext instead of one tuple at a time. || 
[ 688] || || Thejas || ||
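
The Scalability entry on the default join describes breaking up keys with many 
values and shipping the pieces to multiple reducers after a sampling pass. The 
following is a minimal sketch of that idea in Python, not Pig's implementation. 
All names (find_skewed_keys, route_left, route_right, skew_join) are 
hypothetical. It also assumes that rows of the other input are replicated to 
every reducer holding a piece of a split key, which the entry above does not 
spell out.

```python
import math
from collections import Counter
from itertools import count


def find_skewed_keys(sample_keys, max_rows_in_memory, sample_fraction):
    """Sampling pass: estimate which keys have too many rows to fit in memory.

    Returns {key: number_of_pieces} for the keys that should be broken up.
    """
    skewed = {}
    for key, seen in Counter(sample_keys).items():
        estimated_rows = seen / sample_fraction
        if estimated_rows > max_rows_in_memory:
            skewed[key] = math.ceil(estimated_rows / max_rows_in_memory)
    return skewed


def route_left(key, skewed, num_reducers, counters):
    """Send a row of the input being broken up to exactly one reducer."""
    base = hash(key) % num_reducers
    pieces = min(skewed.get(key, 1), num_reducers)
    # Round-robin over the reducers assigned to this key.
    offset = next(counters.setdefault(key, count())) % pieces
    return (base + offset) % num_reducers


def route_right(key, skewed, num_reducers):
    """Assumption: replicate the other input's row to every piece of the key."""
    base = hash(key) % num_reducers
    pieces = min(skewed.get(key, 1), num_reducers)
    return [(base + i) % num_reducers for i in range(pieces)]


def skew_join(left, right, skewed, num_reducers):
    """Simulate the reducers in-process and return the joined rows."""
    reducers = [([], []) for _ in range(num_reducers)]
    counters = {}
    for key, value in left:
        target = route_left(key, skewed, num_reducers, counters)
        reducers[target][0].append((key, value))
    for key, value in right:
        for target in route_right(key, skewed, num_reducers):
            reducers[target][1].append((key, value))
    joined = []
    for left_rows, right_rows in reducers:
        for lk, lv in left_rows:
            for rk, rv in right_rows:
                if lk == rk:
                    joined.append((lk, lv, rv))
    return joined
```

Because each row of the split input lands on exactly one reducer and the 
matching rows of the other input are present on all of that key's reducers, 
every matching pair is produced once. The cost is the extra copies of the 
replicated rows, so only keys flagged by the sampling pass are split.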

Reply via email to