Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/PigUserCookbook

------------------------------------------------------------------------------
  significant.  In one test where the key was null 7% of the time and the data 
was spread across 200 reducers, we saw about a 10x speedup in the query by 
adding the early
  filters.
  
+ '''Take Advantage of Join Optimization'''
+ 
+ This feature is only available in the new code currently accessible from the 
types branch: http://svn.apache.org/viewvc/hadoop/pig/branches/types/.
+ 
+ The optimization ensures that the last table in the join is not brought into 
memory but is streamed through instead. This reduces the amount of memory used, 
which means you can avoid spilling the data and should also be able to scale 
your query to larger data volumes.
+ 
+ To take advantage of this optimization, make sure that the table with the 
largest number of tuples per key is the last table in your query.
+ 
+ {{{
+ small = load 'small_file' as (t, u, v);
+ large = load 'large_file' as (x, y, z);
+ -- 'large' has the most tuples per key, so it is listed last and streamed
+ C = join small by t, large by x;
+ }}}
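
To see why the position of the large table matters, here is a minimal Python sketch of a join that buffers one input and streams the other. It is a toy model only, not Pig's actual implementation; the function name `streaming_join`, its parameters, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

Row = Tuple


def streaming_join(buffered: Iterable[Row], streamed: Iterable[Row],
                   buffered_key: int = 0, streamed_key: int = 0) -> Iterator[Row]:
    """Join two inputs while holding only `buffered` in memory (toy model)."""
    # Index the input expected to have few tuples per key.
    index = defaultdict(list)
    for row in buffered:
        index[row[buffered_key]].append(row)
    # Stream the last input: each row is joined and emitted, never stored.
    for row in streamed:
        for match in index.get(row[streamed_key], ()):
            yield match + row


# Hypothetical sample data mirroring the (t, u, v) and (x, y, z) schemas.
small = [(1, "a", "b"), (2, "c", "d")]
# A generator stands in for the large input; its rows are never materialized.
large = ((k % 3, k, k * k) for k in range(6))
result = list(streaming_join(small, large))
```

In this model, memory use grows with the size of the buffered input only, which is why the input with the most tuples per key should be the one that is streamed.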
  
  '''Prefer DISTINCT over GROUP BY - GENERATE'''
  
