[GitHub] spark pull request #17353: [SPARK-17080][SQL][FOLLOWUP] Improve documentatio...

cloud-fan Tue, 21 Mar 2017 04:44:59 -0700

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17353#discussion_r107133614
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/CostBasedJoinReorder.scala ---
    @@ -119,25 +120,28 @@ case class CostBasedJoinReorder(conf: SQLConf) extends Rule[LogicalPlan] with Pr
      * When building m-way joins, we only keep the best plan (with the lowest cost) for the same set
      * of m items. E.g., for 3-way joins, we keep only the best plan for items {A, B, C} among
      * plans (A J B) J C, (A J C) J B and (B J C) J A.
    - *
    - * Thus the plans maintained for each level when reordering four items A, B, C, D are as follows:
    + * We also prune cartesian product candidates when building a new plan if there exists no join
    + * condition involving references from both left and right. This pruning strategy significantly
    + * reduces the search space.
    + * For example, given A J B J C J D, plans maintained for each level will be as follows:
      * level 0: p({A}), p({B}), p({C}), p({D})
    - * level 1: p({A, B}), p({A, C}), p({A, D}), p({B, C}), p({B, D}), p({C, D})
    - * level 2: p({A, B, C}), p({A, B, D}), p({A, C, D}), p({B, C, D})
    + * level 1: p({A, B}), p({B, C}), p({C, D})
    + * level 2: p({A, B, C}), p({B, C, D})
      * level 3: p({A, B, C, D})
      * where p({A, B, C, D}) is the final output plan.
      *
      * For cost evaluation, since physical costs for operators are not available currently, we use
      * cardinalities and sizes to compute costs.
      */
    -object JoinReorderDP extends PredicateHelper {
    +object JoinReorderDP extends PredicateHelper with Logging {
     
       def search(
           conf: SQLConf,
           items: Seq[LogicalPlan],
           conditions: Set[Expression],
           topOutput: AttributeSet): LogicalPlan = {
     
    +    val startTime = System.nanoTime()
    --- End diff --
    
    Use `System.currentTimeMillis` if we only care about millisecond-level precision.
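
    To make the two options concrete, here is a minimal sketch of both timing styles. `TimingSketch`, `timedWallClock` and `timedMonotonic` are hypothetical names used only for illustration; they are not part of Spark or of this pull request. The first helper follows the suggestion above; the second keeps `System.nanoTime` and converts the difference to milliseconds, which may be preferable when the wall clock can be adjusted during the measurement.

```scala
import java.util.concurrent.TimeUnit

object TimingSketch {
  // Hypothetical helper (not in Spark): wall-clock timing in milliseconds,
  // as suggested in the review comment.
  def timedWallClock[T](body: => T): (T, Long) = {
    val start = System.currentTimeMillis()
    val result = body
    (result, System.currentTimeMillis() - start)
  }

  // Hypothetical helper (not in Spark): monotonic timing via nanoTime,
  // with the elapsed nanoseconds converted to milliseconds.
  def timedMonotonic[T](body: => T): (T, Long) = {
    val start = System.nanoTime()
    val result = body
    (result, TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start))
  }
}
```

    Either helper returns the result of `body` together with the elapsed milliseconds, so the caller can log the duration without changing what the search returns.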


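    The level-by-level pruning described in the doc comment of the diff can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in `CostBasedJoinReorder.scala`: items are plain strings, each join condition is modeled as a pair of item names, and the sketch only tracks which item sets survive at each level. It does not compare costs, whereas the real rule keeps the best plan per item set. `JoinLevelsSketch`, `connected` and `levels` are hypothetical names.

```scala
object JoinLevelsSketch {
  type Item = String

  // True if some join condition references one item from each side.
  // Assumption: a condition is modeled as a pair of item names.
  def connected(left: Set[Item], right: Set[Item], conds: Set[(Item, Item)]): Boolean =
    conds.exists { case (a, b) =>
      (left(a) && right(b)) || (left(b) && right(a))
    }

  // Returns, for each level, the item sets for which a plan would be kept.
  // Level 0 holds single items; level n combines two disjoint sets from
  // lower levels whose sizes add up to n + 1.
  def levels(
      items: Seq[Item],
      conds: Set[(Item, Item)],
      pruneCartesian: Boolean = true): Seq[Set[Set[Item]]] = {
    val level0: Set[Set[Item]] = items.map(Set(_)).toSet
    (1 until items.size).foldLeft(Vector(level0)) { (found, lev) =>
      val next: Set[Set[Item]] = (for {
        k <- 0 until lev
        left <- found(k)
        right <- found(lev - 1 - k)
        if (left & right).isEmpty // the two sides must not share items
        if !pruneCartesian || connected(left, right, conds) // skip cartesian products
      } yield left ++ right).toSet
      found :+ next
    }
  }
}
```

    For the chain A J B J C J D, calling `JoinLevelsSketch.levels(Seq("A", "B", "C", "D"), Set(("A", "B"), ("B", "C"), ("C", "D")))` keeps three sets at level 1, two at level 2 and one at level 3, matching the lines added in the diff. Passing `pruneCartesian = false` gives six sets at level 1 and four at level 2, matching the lines that were removed.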

