Re: [PR] [SPARK-51114] [SQL] Refactor PullOutNondeterministic rule [spark]
cloud-fan closed pull request #49837: [SPARK-51114] [SQL] Refactor PullOutNondeterministic rule URL: https://github.com/apache/spark/pull/49837 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51114] [SQL] Refactor PullOutNondeterministic rule [spark]
cloud-fan commented on PR #49837: URL: https://github.com/apache/spark/pull/49837#issuecomment-2652492679 The AQE test failure is unrelated, thanks, merging to master! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51114] [SQL] Refactor PullOutNondeterministic rule [spark]
the-sakthi commented on PR #49837: URL: https://github.com/apache/spark/pull/49837#issuecomment-2651800069 LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51114] [SQL] Refactor PullOutNondeterministic rule [spark]
mihailoale-db commented on code in PR #49837: URL: https://github.com/apache/spark/pull/49837#discussion_r1950859900 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/PullOutNondeterministic.scala: ## @@ -51,27 +52,12 @@ object PullOutNondeterministic extends Rule[LogicalPlan] { // from LogicalPlan, currently we only do it for UnaryNode which has same output // schema with its child. case p: UnaryNode if p.output == p.child.output && p.expressions.exists(!_.deterministic) => - val nondeterToAttr = getNondeterToAttr(p.expressions) + val nondeterToAttr = + NondeterministicExpressionCollection.getNondeterministicToAttributes(p.expressions) val newPlan = p.transformExpressions { case e => nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e) } val newChild = Project(p.child.output ++ nondeterToAttr.values, p.child) Project(p.output, newPlan.withNewChildren(newChild :: Nil)) } - - private def getNondeterToAttr(exprs: Seq[Expression]): Map[Expression, NamedExpression] = { -exprs.filterNot(_.deterministic).flatMap { expr => - val leafNondeterministic = expr.collect { -case n: Nondeterministic => n -case udf: UserDefinedExpression if !udf.deterministic => udf - } - leafNondeterministic.distinct.map { e => -val ne = e match { - case n: NamedExpression => n - case _ => Alias(e, "_nondeterministic")() -} -e -> ne - } -}.toMap Review Comment: Done. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51114] [SQL] Refactor PullOutNondeterministic rule [spark]
cloud-fan commented on code in PR #49837: URL: https://github.com/apache/spark/pull/49837#discussion_r1950851713 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/PullOutNondeterministic.scala: ## @@ -51,27 +52,12 @@ object PullOutNondeterministic extends Rule[LogicalPlan] { // from LogicalPlan, currently we only do it for UnaryNode which has same output // schema with its child. case p: UnaryNode if p.output == p.child.output && p.expressions.exists(!_.deterministic) => - val nondeterToAttr = getNondeterToAttr(p.expressions) + val nondeterToAttr = + NondeterministicExpressionCollection.getNondeterministicToAttributes(p.expressions) val newPlan = p.transformExpressions { case e => nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e) } val newChild = Project(p.child.output ++ nondeterToAttr.values, p.child) Project(p.output, newPlan.withNewChildren(newChild :: Nil)) } - - private def getNondeterToAttr(exprs: Seq[Expression]): Map[Expression, NamedExpression] = { -exprs.filterNot(_.deterministic).flatMap { expr => - val leafNondeterministic = expr.collect { -case n: Nondeterministic => n -case udf: UserDefinedExpression if !udf.deterministic => udf - } - leafNondeterministic.distinct.map { e => -val ne = e match { - case n: NamedExpression => n - case _ => Alias(e, "_nondeterministic")() -} -e -> ne - } -}.toMap Review Comment: shall we also keep the previous code of creating an immutable map? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51114] [SQL] Refactor PullOutNondeterministic rule [spark]
mihailoale-db commented on code in PR #49837: URL: https://github.com/apache/spark/pull/49837#discussion_r1950496924 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/PullOutNondeterministic.scala: ## @@ -34,10 +36,14 @@ object PullOutNondeterministic extends Rule[LogicalPlan] { case f: Filter => f case a: Aggregate if a.groupingExpressions.exists(!_.deterministic) => - val nondeterToAttr = getNondeterToAttr(a.groupingExpressions) - val newChild = Project(a.child.output ++ nondeterToAttr.values, a.child) + val nondeterToAttr = + NondeterministicExpressionCollection.getNondeterministicToAttributes(a.groupingExpressions) + val newChild = Project(a.child.output ++ nondeterToAttr.values.asScala.toSeq, a.child) a.transformExpressions { case e => -nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e) Review Comment: Please check this comment https://github.com/apache/spark/pull/49837#discussion_r1949481697 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51114] [SQL] Refactor PullOutNondeterministic rule [spark]
cloud-fan commented on code in PR #49837: URL: https://github.com/apache/spark/pull/49837#discussion_r1950410685 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/PullOutNondeterministic.scala: ## @@ -34,10 +36,14 @@ object PullOutNondeterministic extends Rule[LogicalPlan] { case f: Filter => f case a: Aggregate if a.groupingExpressions.exists(!_.deterministic) => - val nondeterToAttr = getNondeterToAttr(a.groupingExpressions) - val newChild = Project(a.child.output ++ nondeterToAttr.values, a.child) + val nondeterToAttr = + NondeterministicExpressionCollection.getNondeterministicToAttributes(a.groupingExpressions) + val newChild = Project(a.child.output ++ nondeterToAttr.values.asScala.toSeq, a.child) Review Comment: +1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51114] [SQL] Refactor PullOutNondeterministic rule [spark]
cloud-fan commented on code in PR #49837: URL: https://github.com/apache/spark/pull/49837#discussion_r1950410321 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/PullOutNondeterministic.scala: ## @@ -34,10 +36,14 @@ object PullOutNondeterministic extends Rule[LogicalPlan] { case f: Filter => f case a: Aggregate if a.groupingExpressions.exists(!_.deterministic) => - val nondeterToAttr = getNondeterToAttr(a.groupingExpressions) - val newChild = Project(a.child.output ++ nondeterToAttr.values, a.child) + val nondeterToAttr = + NondeterministicExpressionCollection.getNondeterministicToAttributes(a.groupingExpressions) + val newChild = Project(a.child.output ++ nondeterToAttr.values.asScala.toSeq, a.child) a.transformExpressions { case e => -nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e) Review Comment: It looks the previous code is simpler than calling a shared method. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51114] [SQL] Refactor PullOutNondeterministic rule [spark]
mihailoale-db commented on code in PR #49837: URL: https://github.com/apache/spark/pull/49837#discussion_r1949484325 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/NondeterministicExpressionCollection.scala: ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import java.util.HashMap + +import org.apache.spark.sql.catalyst.expressions._ + +object NondeterministicExpressionCollection { + def getNondeterministicToAttributes( + expressions: Seq[Expression]): HashMap[Expression, NamedExpression] = { +val nondeterministicToAttributes: HashMap[Expression, NamedExpression] = + new HashMap[Expression, NamedExpression] +expressions.filterNot(_.deterministic).foreach { expression => Review Comment: Done. ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/NondeterministicExpressionCollection.scala: ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import java.util.HashMap + +import org.apache.spark.sql.catalyst.expressions._ + +object NondeterministicExpressionCollection { + def getNondeterministicToAttributes( + expressions: Seq[Expression]): HashMap[Expression, NamedExpression] = { +val nondeterministicToAttributes: HashMap[Expression, NamedExpression] = + new HashMap[Expression, NamedExpression] +expressions.filterNot(_.deterministic).foreach { expression => + val leafNondeterministic = expression.collect { +case n: Nondeterministic => n +case udf: UserDefinedExpression if !udf.deterministic => udf + } + leafNondeterministic.distinct.foreach { +case n: NamedExpression => nondeterministicToAttributes.put(expression, n) +case other => + nondeterministicToAttributes.put(other, Alias(other, "_nondeterministic")()) + } +} +nondeterministicToAttributes + } + + def tryConvertNondeterministicToAttribute( Review Comment: `getOrDefault` can't be used here as default value is `Expression` type but `getOrDefault` expects a `NamedExpression`. I would leave it like it is ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/PullOutNondeterministic.scala: ## @@ -34,10 +36,14 @@ object PullOutNondeterministic extends Rule[LogicalPlan] { case f: Filter => f case a: Aggregate if a.groupingExpressions.exists(!_.deterministic) => - val nondeterToAttr = getNondeterToAttr(a.groupingExpressions) - val newChild = Project(a.child.output ++ nondeterToAttr.values, a.child) + val nondeterToAttr = + NondeterministicExpressionCollection.getNondeterministicToAttributes(a.groupingExpressions) + val newChild = Project(a.child.output ++ nondeterToAttr.values.asScala.toSeq, a.child) Review Comment: Yeah makes sense. Will make a followup after this one is merged. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional comma
Re: [PR] [SPARK-51114] [SQL] Refactor PullOutNondeterministic rule [spark]
vladimirg-db commented on code in PR #49837: URL: https://github.com/apache/spark/pull/49837#discussion_r1949191870 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/NondeterministicExpressionCollection.scala: ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import java.util.HashMap + +import org.apache.spark.sql.catalyst.expressions._ + +object NondeterministicExpressionCollection { + def getNondeterministicToAttributes( + expressions: Seq[Expression]): HashMap[Expression, NamedExpression] = { +val nondeterministicToAttributes: HashMap[Expression, NamedExpression] = + new HashMap[Expression, NamedExpression] +expressions.filterNot(_.deterministic).foreach { expression => + val leafNondeterministic = expression.collect { +case n: Nondeterministic => n +case udf: UserDefinedExpression if !udf.deterministic => udf + } + leafNondeterministic.distinct.foreach { +case n: NamedExpression => nondeterministicToAttributes.put(expression, n) Review Comment: ```suggestion case n: NamedExpression => nondeterministicToAttributes.put(expression, n) ``` ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/NondeterministicExpressionCollection.scala: ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import java.util.HashMap + +import org.apache.spark.sql.catalyst.expressions._ + +object NondeterministicExpressionCollection { + def getNondeterministicToAttributes( + expressions: Seq[Expression]): HashMap[Expression, NamedExpression] = { +val nondeterministicToAttributes: HashMap[Expression, NamedExpression] = + new HashMap[Expression, NamedExpression] +expressions.filterNot(_.deterministic).foreach { expression => Review Comment: We can do a simple `for` loop with an `if` check instead of `filterNot` to avoid copying a `Seq` ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/NondeterministicExpressionCollection.scala: ## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import java.util.HashMap + +import org.apache.spark.sql.catalyst.expressions._ + +object NondeterministicExpressionCollection { + def getNondeterministicToAttributes( + expressions: Seq[Expression]): HashMap[Expression, NamedExpression] = { +val nondeterministicToAttributes: HashMap[Expression, NamedExpression] = + new HashMap[Expression, NamedExpress
Re: [PR] [SPARK-51114] [SQL] Refactor PullOutNondeterministic rule [spark]
mihailoale-db commented on PR #49837: URL: https://github.com/apache/spark/pull/49837#issuecomment-2648160532 Hi @MaxGekk could you please review when you have time? Thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51114] [SQL] Refactor PullOutNondeterministic rule [spark]
mihailoale-db commented on code in PR #49837: URL: https://github.com/apache/spark/pull/49837#discussion_r1948804599 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/PullOutNondeterministic.scala: ## @@ -51,27 +55,39 @@ object PullOutNondeterministic extends Rule[LogicalPlan] { // from LogicalPlan, currently we only do it for UnaryNode which has same output // schema with its child. case p: UnaryNode if p.output == p.child.output && p.expressions.exists(!_.deterministic) => - val nondeterToAttr = getNondeterToAttr(p.expressions) + val nondeterToAttr = getNondeterministicToAttributes(p.expressions) val newPlan = p.transformExpressions { case e => -nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e) +tryConvertNondeterministicToAttribute(e, nondeterToAttr) } - val newChild = Project(p.child.output ++ nondeterToAttr.values, p.child) + val newChild = Project(p.child.output ++ nondeterToAttr.values.asScala.toSeq, p.child) Project(p.output, newPlan.withNewChildren(newChild :: Nil)) } - private def getNondeterToAttr(exprs: Seq[Expression]): Map[Expression, NamedExpression] = { -exprs.filterNot(_.deterministic).flatMap { expr => - val leafNondeterministic = expr.collect { + def getNondeterministicToAttributes( Review Comment: Done. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51114] [SQL] Refactor PullOutNondeterministic rule [spark]
vladimirg-db commented on code in PR #49837: URL: https://github.com/apache/spark/pull/49837#discussion_r1945454956 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/PullOutNondeterministic.scala: ## @@ -51,27 +55,39 @@ object PullOutNondeterministic extends Rule[LogicalPlan] { // from LogicalPlan, currently we only do it for UnaryNode which has same output // schema with its child. case p: UnaryNode if p.output == p.child.output && p.expressions.exists(!_.deterministic) => - val nondeterToAttr = getNondeterToAttr(p.expressions) + val nondeterToAttr = getNondeterministicToAttributes(p.expressions) val newPlan = p.transformExpressions { case e => -nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e) +tryConvertNondeterministicToAttribute(e, nondeterToAttr) } - val newChild = Project(p.child.output ++ nondeterToAttr.values, p.child) + val newChild = Project(p.child.output ++ nondeterToAttr.values.asScala.toSeq, p.child) Project(p.output, newPlan.withNewChildren(newChild :: Nil)) } - private def getNondeterToAttr(exprs: Seq[Expression]): Map[Expression, NamedExpression] = { -exprs.filterNot(_.deterministic).flatMap { expr => - val leafNondeterministic = expr.collect { + def getNondeterministicToAttributes( Review Comment: Please move this to a separate `NondeterministicExpressionCollection` object (and a separate file). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-51114] [SQL] Refactor PullOutNondeterministic rule [spark]
vladimirg-db commented on PR #49837: URL: https://github.com/apache/spark/pull/49837#issuecomment-2641040587 Please specify that this is for single-pass Analyzer in the PR description. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org