The following page has been changed by ChrisOlston:
http://wiki.apache.org/pig/PigOptimizationWishList

------------------------------------------------------------------------------
  
     * look for ways to do multiple group/cogroup/join operations in a single 
map-reduce job; this is possible when the keys share a common prefix. Example: 
group by userid+hour, then count, then group by userid, then take max. This 
can be done in one map-reduce job with userid as the reduce key.
     * choose a join strategy (symmetric hashing, fragment-and-replicate, ...); 
can probably make a reasonable choice based on file sizes [but first, we have 
to implement various join strategies in the execution layer; currently Pig 
only supports symmetric hashing]
+    * choose a CROSS strategy: (1) n-by-m grid, as currently implemented; or 
(2) n-by-1, which is better when crossing a large table with a very small 
table and can be implemented as map-only [a common example of crossing a 
large table with a small one is when you have to normalize all records of 
Table A by dividing by a scalar S; you can do this by crossing Table A with a 
Table B containing a single record that holds S]
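
A minimal sketch of two of the ideas listed above, written in plain Python rather than Pig Latin and simulating the map-reduce phases in memory. The function names and the record layout ((userid, hour) pairs, a list of numeric values for Table A) are assumptions made for illustration only; nothing here reflects how Pig implements or would implement these optimizations.

```python
from collections import defaultdict


def max_hourly_count_per_user(records):
    """Simulate ONE map-reduce job for: group by (userid, hour) -> count,
    then group by userid -> max. `records` is an iterable of
    (userid, hour) pairs (hypothetical layout)."""
    # "Shuffle": partition on userid alone, the common prefix of both keys.
    by_user = defaultdict(list)
    for userid, hour in records:
        by_user[userid].append(hour)

    result = {}
    for userid, hours in by_user.items():
        # Inner group by (userid, hour) + count, done inside the reducer.
        counts = defaultdict(int)
        for hour in hours:
            counts[hour] += 1
        # Outer group by userid + max, also inside the same reducer.
        result[userid] = max(counts.values())
    return result


def normalize_map_only(table_a, scalar_s):
    """Simulate the n-by-1 CROSS: the single record of Table B (scalar_s)
    is handed to every map task, so no reduce phase is needed."""
    return [value / scalar_s for value in table_a]
```

Because every (userid, hour) group for a given user lands in the same reducer, the second grouping needs no further shuffle. The normalization case works map-only for the same reason a replicated join does: the small side fits in memory on each mapper.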
  
  === Probably won't get there, and may not even want to go there ===
  
