Github user ron8hu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15363#discussion_r106084556
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ---
    @@ -51,6 +51,11 @@ case class CostBasedJoinReorder(conf: CatalystConf) extends Rule[LogicalPlan] wi
     
       def reorder(plan: LogicalPlan, output: AttributeSet): LogicalPlan = {
         val (items, conditions) = extractInnerJoins(plan)
    +    // Find the star schema joins. Currently, it returns the star join with the largest
    +    // fact table. In the future, it can return more than one star join (e.g. F1-D1-D2
    +    // and F2-D3-D4).
    +    val starJoinPlans = StarSchemaDetection(conf).findStarJoins(items, conditions.toSeq)
    --- End diff --
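
    For reference, the selection rule described in the diff comment ("returns the star join with the largest fact table") could look roughly like the sketch below. It is not the StarSchemaDetection code from this PR: Table, JoinEdge, StarJoin and findLargestStarJoin are hypothetical names, and it assumes row counts are known and that a "fact table" is any table equi-joined to at least two other tables.

```scala
// Hypothetical sketch, NOT Spark's StarSchemaDetection API.
final case class Table(name: String, rowCount: Long)
// One equi-join predicate between two tables, identified by name.
final case class JoinEdge(left: String, right: String)
final case class StarJoin(fact: Table, dimensions: Seq[Table])

object StarJoinSketch {
  /** Picks the largest table joined to at least two others as the fact table and
    * returns it with its directly joined neighbours as dimensions (None if no such table). */
  def findLargestStarJoin(tables: Seq[Table], edges: Seq[JoinEdge]): Option[StarJoin] = {
    val byName: Map[String, Table] = tables.map(t => t.name -> t).toMap
    // Undirected adjacency: table name -> names of the tables it joins with.
    val neighbours: Map[String, Set[String]] =
      edges
        .flatMap(e => Seq(e.left -> e.right, e.right -> e.left))
        .groupBy(_._1)
        .map { case (name, pairs) => name -> pairs.map(_._2).toSet }
    val candidates =
      tables.filter(t => neighbours.getOrElse(t.name, Set.empty[String]).size >= 2)
    if (candidates.isEmpty) None
    else {
      val fact = candidates.maxBy(_.rowCount)
      // Sorted for a deterministic result; names without a Table entry are dropped.
      val dims = neighbours(fact.name).toSeq.sorted.flatMap(byName.get)
      Some(StarJoin(fact, dims))
    }
  }
}
```

    With the F1-D1-D2 and F2-D3-D4 example from the diff comment and made-up row counts where F1 is larger than F2, this returns F1 with dimensions D1 and D2 and ignores the second star entirely.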
    
    As discussed earlier, we only need to run the join reorder algorithm once.
    
    CostBasedJoinReorder implements the dynamic programming algorithm published in the classic paper "Access Path Selection in a Relational Database Management System" by Patricia Selinger et al. The same algorithm is used in PostgreSQL. To my understanding, it is a generic algorithm that works on both star and non-star schemas. For example, it can generate a bushy tree if that is optimal; that is, it is not limited to left-deep trees.
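
    To illustrate why this kind of DP is not restricted to left-deep trees, here is a minimal bottom-up sketch: it keeps the cheapest plan for every subset of relations and tries every split of a subset into two already-planned halves, which is what allows a bushy result. It is not the CostBasedJoinReorder code; the bitmask encoding, the names and the toy cost model (sum of estimated intermediate row counts, where a join's row count is the product of its inputs' row counts and the selectivities of the predicates between them) are all assumptions, and cartesian products are simply skipped.

```scala
import scala.collection.mutable

// Hypothetical sketch, NOT Spark's CostBasedJoinReorder / JoinReorderDP.
sealed trait JoinTree
final case class Leaf(name: String) extends JoinTree
final case class Join(left: JoinTree, right: JoinTree) extends JoinTree

// Cheapest plan found for one set of relations.
final case class Best(tree: JoinTree, rows: Double, cost: Double)

object JoinReorderDPSketch {
  private def has(mask: Int, i: Int): Boolean = (mask & (1 << i)) != 0

  /** Cheapest join tree over all relations, or None if the join graph is disconnected.
    * `selectivity` maps an index pair (i, j) to the selectivity of the predicate joining i and j. */
  def bestPlan(names: IndexedSeq[String], rows: IndexedSeq[Double],
               selectivity: Map[(Int, Int), Double]): Option[Best] = {
    val n = names.length
    val memo = mutable.Map.empty[Int, Best]
    for (i <- 0 until n) memo(1 << i) = Best(Leaf(names(i)), rows(i), 0.0)

    // Selectivities of predicates with one side in `a` and the other side in `b`.
    def crossing(a: Int, b: Int): Iterable[Double] =
      selectivity.collect {
        case ((i, j), s) if (has(a, i) && has(b, j)) || (has(a, j) && has(b, i)) => s
      }

    // Every proper submask of s is numerically smaller than s, so visiting sets in
    // ascending order guarantees both halves of any split are already planned.
    for (s <- 1 until (1 << n) if Integer.bitCount(s) >= 2) {
      var left = (s - 1) & s
      while (left > 0) {
        val right = s ^ left
        if (left < right) { // skip mirror-image splits
          val preds = crossing(left, right)
          if (preds.nonEmpty) { // no predicate between the halves = cartesian product, skipped
            for (l <- memo.get(left); r <- memo.get(right)) {
              val outRows = l.rows * r.rows * preds.product
              val cost = l.cost + r.cost + outRows
              if (memo.get(s).forall(_.cost > cost))
                memo(s) = Best(Join(l.tree, r.tree), outRows, cost)
            }
          }
        }
        left = (left - 1) & s
      }
    }
    memo.get((1 << n) - 1)
  }
}
```

    For example, with four 1000-row relations in a chain A-B-C-D where the A-B and C-D predicates are highly selective (0.001) and B-C is not (0.1), this toy model picks Join(Join(A, B), Join(C, D)): the bushy plan avoids the 100,000-row three-way intermediate result that every left-deep order must produce.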
    
    I suggest that we first identify the strengths of the star join reorder algorithm, specifically where it can address deficiencies of the dynamic programming algorithm, and then add only the code needed to address those deficiencies. There is no need to add code that does the same job twice without added value.
    
    Perhaps running the TPC-DS benchmark queries and inspecting the generated query plans can help us identify the strengths and weaknesses of both algorithms.