j-esse commented on issue #23556: [SPARK-26626][SQL] Maximum size for repeatedly substituted aliases in SQL expressions
URL: https://github.com/apache/spark/pull/23556#issuecomment-456285994

That's right, the primary concern is memory usage, since the exponential increase in memory usage currently causes crashes (due to OOMs), timeouts, and performance issues.

Jesse

________________________________
From: Wenchen Fan <notificati...@github.com>
Sent: Monday, January 21, 2019 9:01:37 PM
To: apache/spark
Cc: Jesse Rickard; Author
Subject: Re: [apache/spark] [SPARK-26626][SQL] Maximum size for repeatedly substituted aliases in SQL expressions (#23556)

@cloud-fan commented on this pull request.
________________________________

In sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
<https://github.com/apache/spark/pull/23556#discussion_r249641907>:

> +
> +    // Count how many times each alias is used in the upper Project.
> +    // If an alias is only used once, we can safely substitute it without increasing the overall
> +    // tree size
> +    val referenceCounts = AttributeMap(
> +      upper
> +        .flatMap(_.collect { case a: Attribute => a })
> +        .groupBy(identity)
> +        .mapValues(_.size).toSeq
> +    )
> +
> +    // Check for any aliases that are used more than once, and are larger than the configured
> +    // maximum size
> +    aliases.exists({ case (attribute, expression) =>
> +      referenceCounts.getOrElse(attribute, 0) > 1 &&
> +        expression.treeSize > SQLConf.get.maxRepeatedAliasSize

So your fix only cares about the memory usage of the expressions, not execution time?
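
To make the growth concrete, here is a minimal, self-contained sketch of why substituting an alias that is referenced more than once inflates the tree, and how a size guard like the one quoted above would catch it. It uses a toy expression type rather than Catalyst: `Expr`, `Leaf`, `Add`, `AliasGrowth`, and the `maxRepeatedAliasSize` parameter are all hypothetical stand-ins for illustration, not Spark's actual classes or configuration API.

```scala
// Toy expression tree (an assumption for illustration, not Catalyst's Expression).
sealed trait Expr { def treeSize: Int }
final case class Leaf(name: String) extends Expr {
  def treeSize: Int = 1
}
final case class Add(left: Expr, right: Expr) extends Expr {
  def treeSize: Int = 1 + left.treeSize + right.treeSize
}

object AliasGrowth {
  // Replace every reference to `name` in `e` with `replacement`.
  def substitute(e: Expr, name: String, replacement: Expr): Expr = e match {
    case Leaf(n) if n == name => replacement
    case l: Leaf              => l
    case Add(l, r)            =>
      Add(substitute(l, name, replacement), substitute(r, name, replacement))
  }

  // Count how many times each name is referenced in `e`.
  def referenceCounts(e: Expr): Map[String, Int] = e match {
    case Leaf(n)   => Map(n -> 1)
    case Add(l, r) =>
      val lc = referenceCounts(l)
      val rc = referenceCounts(r)
      (lc.keySet ++ rc.keySet)
        .map(k => k -> (lc.getOrElse(k, 0) + rc.getOrElse(k, 0)))
        .toMap
  }

  // True if some alias is referenced more than once in `upper` AND its
  // definition is larger than the limit, i.e. substituting it is risky.
  def exceedsLimit(
      upper: Expr,
      aliases: Map[String, Expr],
      maxRepeatedAliasSize: Int): Boolean = {
    val counts = referenceCounts(upper)
    aliases.exists { case (name, expr) =>
      counts.getOrElse(name, 0) > 1 && expr.treeSize > maxRepeatedAliasSize
    }
  }

  // Collapse `levels` stacked projections of the form "a + a", where each
  // level's `a` is the previous level's whole expression.
  def collapsed(levels: Int): Expr =
    (1 to levels).foldLeft[Expr](Leaf("x")) { (acc, _) =>
      substitute(Add(Leaf("a"), Leaf("a")), "a", acc)
    }
}
```

In this toy model each level doubles the tree and adds one node, so the size after n levels is 2^(n+1) - 1: 15 nodes after 3 levels and 2047 after 10. An alias referenced only once never triggers the guard, because substituting it cannot duplicate a subtree.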
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org