j-esse commented on issue #23556: [SPARK-26626][SQL] Maximum size for 
repeatedly substituted aliases in SQL expressions
URL: https://github.com/apache/spark/pull/23556#issuecomment-456285994
 
 
   That's right, the primary concern is memory usage: the exponential 
increase currently causes crashes (due to OOMs), timeouts, and other 
performance issues.
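
To make the exponential growth concrete, here is a minimal sketch in plain Scala. It does not use Catalyst; `Expr`, `Ref`, `Add`, `substitute`, and `sizesAfterInlining` are hypothetical names made up for illustration. It assumes each stacked projection defines a new alias that references the previous alias twice, and it reports the node count after each alias is inlined.

```scala
// Toy expression tree. These types are illustrative stand-ins, not Catalyst classes.
sealed trait Expr {
  // Number of nodes in the tree rooted at this expression.
  def treeSize: Int = this match {
    case Ref(_)    => 1
    case Add(l, r) => 1 + l.treeSize + r.treeSize
  }
}
final case class Ref(name: String) extends Expr
final case class Add(left: Expr, right: Expr) extends Expr

object AliasGrowth {
  // Replace every reference to `name` in `e` with `replacement`.
  def substitute(e: Expr, name: String, replacement: Expr): Expr = e match {
    case Ref(n) if n == name => replacement
    case r: Ref              => r
    case Add(l, r) =>
      Add(substitute(l, name, replacement), substitute(r, name, replacement))
  }

  // Node counts after inlining 0, 1, ..., `levels` stacked projections,
  // where each projection computes (previous alias + previous alias).
  def sizesAfterInlining(levels: Int): List[Int] = {
    val step = Add(Ref("prev"), Ref("prev"))
    Iterator
      .iterate[Expr](Ref("a"))(inlined => substitute(step, "prev", inlined))
      .map(_.treeSize)
      .take(levels + 1)
      .toList
  }

  def main(args: Array[String]): Unit =
    println(sizesAfterInlining(10).mkString(", "))
}
```

In this toy model the node count roughly doubles at every level, because each alias is referenced twice. An alias that is referenced only once would leave the count growing linearly instead.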
   
   
   Jesse
   
   ________________________________
   From: Wenchen Fan <notificati...@github.com>
   Sent: Monday, January 21, 2019 9:01:37 PM
   To: apache/spark
   Cc: Jesse Rickard; Author
   Subject: Re: [apache/spark] [SPARK-26626][SQL] Maximum size for repeatedly 
substituted aliases in SQL expressions (#23556)
   
   
   @cloud-fan commented on this pull request.
   
   ________________________________
   
   In 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
   <https://github.com/apache/spark/pull/23556#discussion_r249641907>:
   
   > +
   +    // Count how many times each alias is used in the upper Project.
   +    // If an alias is only used once, we can safely substitute it without increasing the overall
   +    // tree size
   +    val referenceCounts = AttributeMap(
   +      upper
   +        .flatMap(_.collect { case a: Attribute => a })
   +        .groupBy(identity)
   +        .mapValues(_.size).toSeq
   +    )
   +
   +    // Check for any aliases that are used more than once, and are larger than the configured
   +    // maximum size
   +    aliases.exists({ case (attribute, expression) =>
   +      referenceCounts.getOrElse(attribute, 0) > 1 &&
   +        expression.treeSize > SQLConf.get.maxRepeatedAliasSize
   
   
   So your fix only cares about the memory usage of the expressions, not 
execution time?
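
For readers without the full diff, the check in the hunk above can be sketched with plain Scala collections. This is an approximation under stated assumptions, not the PR's code: attributes are modelled as strings, each alias is reduced to its tree size, and `hasLargeRepeatedAlias` and `maxSize` are hypothetical names standing in for the predicate and the configured limit.

```scala
object RepeatedAliasCheck {
  // upperRefs:  every attribute reference found in the upper projection, repeats included.
  // aliasSizes: alias name -> tree size of the expression the alias stands for.
  // maxSize:    hypothetical stand-in for the configured maximum size.
  def hasLargeRepeatedAlias(
      upperRefs: Seq[String],
      aliasSizes: Map[String, Int],
      maxSize: Int): Boolean = {
    // Count how many times each attribute is referenced.
    val referenceCounts: Map[String, Int] =
      upperRefs.groupBy(identity).map { case (name, refs) => name -> refs.size }

    // True only if some alias is both referenced more than once and over the limit.
    aliasSizes.exists { case (name, size) =>
      referenceCounts.getOrElse(name, 0) > 1 && size > maxSize
    }
  }
}
```

An alias referenced once never triggers the check regardless of its size, and a small alias never triggers it regardless of how often it is referenced. The check looks only at tree size, not at the cost of evaluating the expression.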
   
   
