[GitHub] spark pull request #13505: [SPARK-15764][SQL] Replace N^2 loop in BindRefere...

JoshRosen Fri, 03 Jun 2016 18:15:51 -0700

Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13505#discussion_r65794395
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
    @@ -296,7 +296,7 @@ abstract class QueryPlan[PlanType <: QueryPlan[PlanType]] extends TreeNode[PlanT
       /**
        * All the attributes that are used for this plan.
        */
    -  lazy val allAttributes: Seq[Attribute] = children.flatMap(_.output)
    +  lazy val allAttributes: AttributeSeq = children.flatMap(_.output)
    --- End diff --
    
    @ericl and I found another layer of polynomial looping: in
    `QueryPlan.cleanArgs` we take every expression in the query plan and bind its
    references against `allAttributes`, which can be huge, so each reference
    lookup scans that whole list. If we turn this into an `AttributeSeq` once and
    build the lookup map inside that wrapper, then we amortize that cost and
    remove this expensive loop.
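
A minimal Scala sketch of the amortization idea, under stated assumptions: `ExprId`, `Attr`, `AttrSeq`, and `Lookup` are hypothetical, simplified stand-ins for illustration, not Catalyst's actual `Attribute`/`AttributeSeq` classes or their APIs. A linear `indexWhere` per reference costs O(M) for M input attributes, so binding N references is O(N * M); building a map once inside the wrapper turns each later lookup into an expected O(1) hash lookup. The implicit conversion is likewise an assumption, modelled on the diff assigning a `Seq[Attribute]` to a val typed `AttributeSeq`.

```scala
import scala.language.implicitConversions

// Hypothetical, simplified stand-ins for Catalyst's ExprId and Attribute;
// the real classes carry much more information (data type, qualifier, ...).
final case class ExprId(id: Long)
final case class Attr(name: String, exprId: ExprId)

// Sketch of the wrapper idea: keep the attributes in order and build an
// exprId -> ordinal map lazily, the first time a lookup is requested.
final class AttrSeq(val attrs: Seq[Attr]) {
  // Built at most once per AttrSeq instance; later lookups reuse it.
  // Iterating in reverse means that if an exprId occurs more than once,
  // the lowest ordinal is written last and wins, matching indexWhere.
  private lazy val exprIdToOrdinal: Map[ExprId, Int] =
    attrs.zipWithIndex.reverseIterator.map { case (a, i) => a.exprId -> i }.toMap

  // Expected O(1) per call once the map exists; -1 when not found.
  def indexOf(exprId: ExprId): Int = exprIdToOrdinal.getOrElse(exprId, -1)
}

object AttrSeq {
  // Lets a plain Seq[Attr] be used where an AttrSeq is expected
  // (assumed to resemble the conversion the diff relies on).
  implicit def fromSeq(s: Seq[Attr]): AttrSeq = new AttrSeq(s)
}

object Lookup {
  // Linear scan: O(M) per reference for an input of M attributes.
  def linearIndexOf(input: Seq[Attr], exprId: ExprId): Int =
    input.indexWhere(_.exprId == exprId)

  // Binding N references this way costs O(N * M): quadratic when N ~ M.
  def bindAllLinear(refs: Seq[Attr], input: Seq[Attr]): Seq[Int] =
    refs.map(r => linearIndexOf(input, r.exprId))

  // Same result, but the map inside `input` is built once (O(M)) and each
  // of the N lookups is an expected O(1) hash lookup.
  def bindAllWithMap(refs: Seq[Attr], input: AttrSeq): Seq[Int] =
    refs.map(r => input.indexOf(r.exprId))
}
```

The saving depends on reusing the same wrapper instance: converting a `Seq` at every call site would rebuild the map each time, which is why keeping `allAttributes` as a single `lazy val` of type `AttributeSeq` matters.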
