Dear Wiki user, You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The following page has been changed by OlgaN:
http://wiki.apache.org/pig/PigUserCookbook

------------------------------------------------------------------------------
  significant. In one test where the key was null 7% of the time and the data was spread across 200 reducers, we saw about a 10x speedup in the query by adding the early filters.

+ '''Take Advantage of Join Optimization'''
+
+ This feature is only available in the new code, currently accessible from the types branch: http://svn.apache.org/viewvc/hadoop/pig/branches/types/.
+
+ The optimization ensures that the last table in the join is not brought into memory but is streamed through instead. This reduces the amount of memory used, which means you can avoid spilling data to disk and should be able to scale your query to larger data volumes.
+
+ To take advantage of this optimization, make sure that the table with the largest number of tuples per key is the last table in your query.
+
+ {{{
+ small = load 'small_file' as (t, u, v);
+ large = load 'large_file' as (x, y, z);
+ -- 'large' is listed last, so it is streamed through rather than held in memory
+ C = join small by t, large by x;
+ }}}

  '''Prefer DISTINCT over GROUP BY - GENERATE'''