The following page has been changed by ChrisOlston:
http://wiki.apache.org/pig/PigOptimizationWishList

------------------------------------------------------------------------------
  
  === Already Implemented ===
  
- * pipeline a sequence of stateless operators into a single Map or single 
Reduce
+    * pipeline a sequence of stateless operators into a single Map or single 
Reduce
  
  === Implemented, but room for improvement ===
  
- * push algebraic functions into combiner, including algebraic UDFs, DISTINCT, 
and other items
+    * push algebraic functions into combiner, including algebraic UDFs, 
DISTINCT, and other items
  
  === Low hanging fruit ===
  
- * System-R optimizer heuristics:
+    * System-R optimizer heuristics:
-    * push projections (move them earlier in the plan)
+       * push projections (move them earlier in the plan)
-    * push cheap filters (move filters known to be cheap, e.g. ones with 
simple logic predicates, earlier in the plan)
+       * push cheap filters (move filters known to be cheap, e.g. ones with 
simple logic predicates, earlier in the plan)
-    * eliminate cartesian products when possible, e.g. convert CROSS followed 
by FILTER into JOIN
+       * eliminate cartesian products when possible, e.g. convert CROSS 
followed by FILTER into JOIN
  
  === Medium hanging fruit ===
  
- * look for ways to do multiple group/cogroup/join operations in a single 
map-reduce job --- this would occur if the keys share a common prefix. Example: 
group by userid+hour, then count, then group by userid, then take max --- can 
be done in one map-reduce job with userid as the reduce key.
+    * look for ways to do multiple group/cogroup/join operations in a single 
map-reduce job --- this would occur if the keys share a common prefix. Example: 
group by userid+hour, then count, then group by userid, then take max --- can 
be done in one map-reduce job with userid as the reduce key.
- * choose a join strategy (symmetric hashing, fragment-and-replicate, ...); 
can probably make a reasonable choice based on file sizes [but first, we have 
to implement various join strategies in the execution layer -- currently pig 
only supports symmetric hashing]
+    * choose a join strategy (symmetric hashing, fragment-and-replicate, ...); 
can probably make a reasonable choice based on file sizes [but first, we have 
to implement various join strategies in the execution layer -- currently pig 
only supports symmetric hashing]
  
  === Probably won't get there, and may not even want to go there ===
  
- * query optimization techniques found in any database textbook
+    * query optimization techniques found in any database textbook
-    * reordering filters (need to estimate selectivity based on histograms, or 
maybe adaptively reorder)
+       * reordering filters (need to estimate selectivity based on histograms, 
or maybe adaptively reorder)
-    * ...
+       * ...
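
The "Medium hanging fruit" example (group by userid+hour, count, group by userid, take max) can be sketched in plain Python to show why one job is enough. This is only an illustration, not Pig code: the function name and the in-memory "shuffle" are assumptions made for the sketch. Because userid is a prefix of both group keys, partitioning on userid alone puts every (userid, hour) group inside a single reducer's input, so the reducer can do the inner count and the outer max itself.

```python
from collections import Counter, defaultdict


def max_hourly_count_single_job(records):
    """records: iterable of (userid, hour) pairs; returns {userid: max hourly count}."""
    # "Shuffle": partition on userid only, the common prefix of both group keys.
    hours_by_user = defaultdict(list)
    for userid, hour in records:
        hours_by_user[userid].append(hour)

    # "Reduce": per user, do the inner group-by-hour count, then the outer max.
    result = {}
    for userid, hours in hours_by_user.items():
        count_per_hour = Counter(hours)
        result[userid] = max(count_per_hour.values())
    return result


records = [("alice", 9), ("alice", 9), ("alice", 10), ("bob", 14)]
print(max_hourly_count_single_job(records))  # {'alice': 2, 'bob': 1}
```

A naive plan would run two jobs, one keyed on (userid, hour) and one keyed on userid; the sketch collapses them because the second key is a prefix of the first.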
  
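
The CROSS-followed-by-FILTER rewrite from the "Low hanging fruit" list can be illustrated the same way. The sketch below assumes the filter is an equality test on the first field of each tuple (other predicates would not map to a simple equi-join), and both function names are hypothetical. It shows that the two plans return the same pairs, while the join avoids materializing the full cartesian product.

```python
from collections import defaultdict
from itertools import product


def cross_then_filter(left, right):
    # Unoptimized plan: build every pair, then keep those whose keys match.
    return [(l, r) for l, r in product(left, right) if l[0] == r[0]]


def hash_join(left, right):
    # Rewritten plan: index one input by key and probe it with the other.
    index = defaultdict(list)
    for r in right:
        index[r[0]].append(r)
    return [(l, r) for l in left for r in index.get(l[0], [])]


users = [(1, "alice"), (2, "bob")]
clicks = [(1, "/home"), (1, "/search"), (3, "/about")]
print(hash_join(users, clicks) == cross_then_filter(users, clicks))  # True
```

With n and m input rows, the first plan examines n*m pairs, while the second does work roughly proportional to n + m plus the size of the output.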
