[Pig Wiki] Update of "ProposedProjects" by daijy

Apache Wiki Fri, 17 Jul 2009 17:30:22 -0700

The following page has been changed by daijy:
http://wiki.apache.org/pig/ProposedProjects

------------------------------------------------------------------------------
  || Scalability || Pig's default join (symmetric hash) currently depends on 
being able to fit all of the values for a given join key for one of the inputs 
into memory.  (It does try to spill to disk in the case where it cannot fit 
them all into memory.  In practice this often fails as it is not good at 
understanding when memory is low enough that it should spill.  Even in the case 
where it does not fail, spilling to disk and rereading from disk is very slow.) 
 The join could avoid this if keys with a large number of values were broken 
up so that each row set fits in memory and the pieces were shipped to multiple 
reducers.  A sampling pass would need to be done first to determine which keys 
to break up.  See 
http://wiki.apache.org/pig/PigSkewedJoinSpec || || || chris olston || gates ||
  || Scalability || Improve memory footprint for a tuple.  See 
http://wiki.apache.org/pig/PigMemory || 
[https://issues.apache.org/jira/browse/PIG-793 793] || || olgan || ||
  || Optimization || Change physical operators to pass list of tuples in 
getNext instead of one tuple at a time. || 
[https://issues.apache.org/jira/browse/PIG-688 688] || || Thejas || ||
+ || Usability || Fixing dfs commands in Pig || 
[https://issues.apache.org/jira/browse/PIG-891 891] || || Daniel || ||
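
The skewed-join row above describes two steps: a sampling pass that finds keys with too many values, then splitting those keys across several reducers. Below is a minimal Python sketch of that idea. It is not taken from PigSkewedJoinSpec or from Pig's code. All function names, the max_rows_per_key threshold, and the choice to replicate the other input's rows to every reducer that holds a piece of a split key are assumptions made for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def find_skewed_keys(rows, sample_rate, max_rows_per_key, seed=0):
    """Sampling pass: estimate rows per key, return {key: pieces} for keys needing a split."""
    rng = random.Random(seed)
    sampled = Counter(key for key, _ in rows if rng.random() < sample_rate)
    splits = {}
    for key, count in sampled.items():
        estimated = count / sample_rate  # scale the sample up to a full-input estimate
        pieces = math.ceil(estimated / max_rows_per_key)
        if pieces > 1:
            splits[key] = pieces
    return splits


def partition_for_join(left, right, splits, num_reducers):
    """Spread skewed left rows over several reducers; replicate matching right rows to each."""
    buckets = defaultdict(lambda: ([], []))
    next_slot = Counter()
    for key, value in left:
        # Cap at num_reducers so two slots of one key never land on the same reducer.
        pieces = min(splits.get(key, 1), num_reducers)
        slot = next_slot[key] % pieces  # round-robin over the key's pieces
        next_slot[key] += 1
        buckets[(hash(key) + slot) % num_reducers][0].append((key, value))
    for key, value in right:
        pieces = min(splits.get(key, 1), num_reducers)
        for slot in range(pieces):
            buckets[(hash(key) + slot) % num_reducers][1].append((key, value))
    return buckets


def reduce_join(left_rows, right_rows):
    """Per-reducer hash join: build a table on the right rows, probe with the left rows."""
    table = defaultdict(list)
    for key, value in right_rows:
        table[key].append(value)
    return [(key, lv, rv) for key, lv in left_rows for rv in table.get(key, [])]


def skewed_join(left, right, num_reducers=4, sample_rate=1.0, max_rows_per_key=2):
    """Run the sampling pass, partition both inputs, and join each reducer's share."""
    splits = find_skewed_keys(left, sample_rate, max_rows_per_key)
    buckets = partition_for_join(left, right, splits, num_reducers)
    joined = []
    for left_rows, right_rows in buckets.values():
        joined.extend(reduce_join(left_rows, right_rows))
    return joined
```

In this sketch each left row goes to exactly one reducer, and every reducer that holds a piece of a split key receives the full set of matching right rows, so the combined output equals that of an ordinary join. The cost is that right rows for a split key are copied once per piece.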
