Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by daijy:

  || Scalability || Pig's default join (symmetric hash) currently depends on 
being able to fit all of the values for a given join key for one of the inputs 
into memory.  (It does try to spill to disk when it cannot fit 
them all into memory.  In practice this often fails because it is not good at 
recognizing when memory is low enough that it should spill.  Even when 
it does not fail, spilling to disk and rereading from disk is very slow.) 
 Instances of keys with a large number of values could be broken up so that each 
row set fits in memory, then shipped to multiple reducers.  A sampling 
pass would need to be done first to determine which keys to break up.  See || || || chris olston || gates ||
  || Scalability || Improve memory footprint for a tuple.  See || 
[ 793] || || olgan || ||
  || Optimization || Change physical operators to pass a list of tuples in 
getNext instead of one tuple at a time. || 
[ 688] || || Thejas || ||
+ || Usability || Fix dfs commands in Pig || 
[ 891] || || Daniel || ||
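
The skewed-join idea in the first row above can be sketched as follows: a sampling pass counts rows per key on the skewed input, keys whose rows exceed a reducer's memory budget are split across several reducer shards, and the other input's rows for those keys are replicated to each shard. This is only an illustrative sketch; the function names and the `max_rows_per_reducer` threshold are assumptions, not Pig's actual implementation.

```python
from collections import defaultdict
from itertools import cycle

def skewed_join(left, right, max_rows_per_reducer=2):
    # Sampling pass: count rows per key on the skewed (left) input.
    counts = defaultdict(int)
    for key, _ in left:
        counts[key] += 1

    # Decide how many reducer "shards" each heavy key needs so that
    # each shard's row set fits within the memory budget.
    shards = {k: -(-c // max_rows_per_reducer) for k, c in counts.items()}

    # Route left rows round-robin over their key's shards; replicate
    # right rows to every shard holding that key.
    reducers = defaultdict(lambda: ([], []))
    pickers = {k: cycle(range(n)) for k, n in shards.items()}
    for key, val in left:
        reducers[(key, next(pickers[key]))][0].append(val)
    for key, val in right:
        for s in range(shards.get(key, 1)):
            reducers[(key, s)][1].append(val)

    # Each reducer joins its (now small) partitions independently;
    # the union of shard outputs equals the ordinary join result.
    out = []
    for (key, _), (lvals, rvals) in reducers.items():
        for lv in lvals:
            for rv in rvals:
                out.append((key, lv, rv))
    return out
```

Note that only the skewed side is split; the other side is replicated, so the output is identical to a plain hash join while no single reducer holds more than `max_rows_per_reducer` rows of the skewed input for one key.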
