Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/JoinFramework

------------------------------------------------------------------------------
  set optimizations.reorder 'off'
  }}}
  
- Also, a user should be able to specify a particular type of join to perform 
even if it contradicts with the choices that would be made by the optimizer. To 
support this we would need to extend the `JOIN` keyword to support outter joins 
and also to support JOIN type.
+ Also, a user should be able to specify a particular type of join to perform 
even if it contradicts the choices that would be made by the optimizer. To 
support this we would need to extend the `JOIN` keyword to support outer joins 
and also to support a JOIN type.
  
  {{{
  C = JOIN A by name, B by name USING <JOIN TYPE>;
@@ -75, +75 @@

  
  The JOIN TYPE is a string that represents a type of join, such as 
`partitioned`, `ordered partitioned`, `indexed`, `replicated`, etc.
  
+ For FRJ join type, the order of the tables might have to imply which table is 
to be replicated. For IJ, the index metadata will need to be supplied.
+ 
  === No Metadata Available ===
  
- If no external meta data is available, the user would need to provide 
additional information to help the optimizer to make good choice. The user 
could also explicitely specify the join TYPE to use as shown above.
+ If no external meta data is available, the user would need to provide 
additional information to help the optimizer make a good choice. The user 
could also explicitly specify the join TYPE to use as shown above.
  
  For PPJ, the needed meta data is the partition key and the sort key. This 
information can be provided by extending the `LOAD` statement
  
@@ -95, +97 @@

  
  Question: how far do we want to take that? If we have one table that is 
partitioned, another that is both partitioned and ordered, and a third that is 
neither, do we want to come up with a name or do we require a metadata 
specification in this case? I think we should require metadata.
  
+ For FRJ, we don't really need additional metadata for now. We can just work 
off input data sizes. Eventually, it would be nice to estimate the actual amount 
of data going into the join as opposed to just input sizes, but for that we 
would need data statistics. For this particular type, explicit user 
specification via join type might be useful.
+ 
+ For indexed join, the user needs to provide index information, including the 
index key and how to find the index. Not going to elaborate here since we are a 
ways off from supporting this.
+ 
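As a rough illustration of the FRJ paragraph above, the following Python sketch picks a join type from input sizes alone. It is not Pig's optimizer: `JoinInput`, `choose_join`, the 100 MB `REPLICATION_LIMIT_BYTES` threshold, the fallback to `partitioned`, and the rule that an explicit `replicated` request replicates every input after the first are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical threshold: the largest input we are willing to copy to every task.
REPLICATION_LIMIT_BYTES = 100 * 1024 * 1024


@dataclass
class JoinInput:
    alias: str       # relation name as written in the script, e.g. "A"
    size_bytes: int  # size of the input data, not of the data reaching the join


def choose_join(
    inputs: List[JoinInput],
    limit: int = REPLICATION_LIMIT_BYTES,
    requested: Optional[str] = None,
) -> Tuple[str, List[str]]:
    """Return (join_type, aliases_to_replicate)."""
    if requested is not None:
        # An explicit user choice overrides the size heuristic.
        if requested == "replicated":
            # Assumption: table order implies that every input after the
            # first one is the replicated side.
            return requested, [i.alias for i in inputs[1:]]
        return requested, []

    if len(inputs) < 2:
        return "partitioned", []

    # Keep the largest input fragmented; replicate the rest if they all fit.
    largest = max(inputs, key=lambda i: i.size_bytes)
    others = [i for i in inputs if i is not largest]
    if all(o.size_bytes <= limit for o in others):
        return "replicated", [o.alias for o in others]

    # Assumed fallback when no input is small enough to replicate.
    return "partitioned", []
```

Because the heuristic sees only input sizes, a large input that is heavily filtered before the join would still be rejected for replication, which is the case where an explicit join type from the user helps.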
