[GitHub] [spark] maropu commented on a change in pull request #25710: [SPARK-29008][SQL] Define an individual method for each common subexpression in HashAggregateExec
maropu commented on a change in pull request #25710: [SPARK-29008][SQL] Define an individual method for each common subexpression in HashAggregateExec
URL: https://github.com/apache/spark/pull/25710#discussion_r323718636

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala

@@ -419,4 +419,27 @@ class WholeStageCodegenSuite extends QueryTest with SharedSparkSession {
     }
   }
 }
+
+  test("Give up splitting subexpression code if a parameter length goes over the limit") {
+    withSQLConf(
+      SQLConf.CODEGEN_SPLIT_AGGREGATE_FUNC.key -> "false",

Review comment:
   Yea, we need to. If that flag is true, `HashAggregateExec` throws an exception in this test:
   https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L327

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
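[Editor's note] As background for why the test above gives up splitting: the JVM caps a method's parameter slots at 255, with `long` and `double` each occupying two slots (plus one slot for `this` on instance methods). The sketch below is a hand-written approximation of that rule in plain Java; `paramSlots` and `isValidParamLength` here are illustrative helpers, not Spark's actual `calculateParamLength`/`isValidParamLength` in `CodeGenerator`.

```java
// Approximation of the JVM parameter-slot rule that forces codegen to fall
// back to unsplit code when a subexpression needs too many arguments.
class ParamLength {
    // Count JVM parameter slots: long/double take two, everything else one,
    // plus one slot for the implicit `this` of an instance method.
    static int paramSlots(Class<?>[] types) {
        int slots = 1; // `this`
        for (Class<?> t : types) {
            slots += (t == long.class || t == double.class) ? 2 : 1;
        }
        return slots;
    }

    // The JVM spec limits a method to at most 255 parameter slots.
    static boolean isValidParamLength(int slots) {
        return slots <= 255;
    }

    public static void main(String[] args) {
        Class<?>[] fits = new Class<?>[100];
        java.util.Arrays.fill(fits, int.class);    // 1 + 100 = 101 slots
        Class<?>[] tooMany = new Class<?>[200];
        java.util.Arrays.fill(tooMany, long.class); // 1 + 400 = 401 slots
        System.out.println(isValidParamLength(paramSlots(fits)));    // true
        System.out.println(isValidParamLength(paramSlots(tooMany))); // false
    }
}
```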
[GitHub] [spark] maropu commented on a change in pull request #25710: [SPARK-29008][SQL] Define an individual method for each common subexpression in HashAggregateExec
maropu commented on a change in pull request #25710: [SPARK-29008][SQL] Define an individual method for each common subexpression in HashAggregateExec
URL: https://github.com/apache/spark/pull/25710#discussion_r322693867

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

@@ -1028,13 +1028,67 @@ class CodegenContext {
     // Get all the expressions that appear at least twice and set up the state for subexpression
     // elimination.
     val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
-    val codes = commonExprs.map { e =>
-      val expr = e.head
-      // Generate the code for this expression tree.
-      val eval = expr.genCode(this)
-      val state = SubExprEliminationState(eval.isNull, eval.value)
-      e.foreach(localSubExprEliminationExprs.put(_, state))
-      eval.code.toString
+    val commonExprVals = commonExprs.map(_.head.genCode(this))
+
+    lazy val nonSplitExprCode = {
+      commonExprs.zip(commonExprVals).map { case (exprs, eval) =>
+        // Generate the code for this expression tree.
+        val state = SubExprEliminationState(eval.isNull, eval.value)
+        exprs.foreach(localSubExprEliminationExprs.put(_, state))
+        eval.code.toString
+      }
+    }
+
+    val codes = if (commonExprVals.map(_.code.length).sum > SQLConf.get.methodSplitThreshold) {
+      if (commonExprs.map(calculateParamLength).forall(isValidParamLength)) {
+        commonExprs.zipWithIndex.map { case (exprs, i) =>
+          val expr = exprs.head
+          val eval = commonExprVals(i)
+
+          val isNullLiteral = eval.isNull match {
+            case TrueLiteral | FalseLiteral => true
+            case _ => false
+          }
+          val (isNull, isNullEvalCode) = if (!isNullLiteral) {
+            val v = addMutableState(JAVA_BOOLEAN, "subExprIsNull")
+            (JavaCode.isNullGlobal(v), s"$v = ${eval.isNull};")
+          } else {
+            (eval.isNull, "")
+          }
+
+          // Generate the code for this expression tree and wrap it in a function.
+          val fnName = freshName("subExpr")
+          val inputVars = getLocalInputVariableValues(this, expr).toSeq
+          val argList = inputVars.map(v => s"${v.javaType.getName} ${v.variableName}")
+          val returnType = javaType(expr.dataType)
+          val fn =
+            s"""
+               |private $returnType $fnName(${argList.mkString(", ")}) {
+               |  ${eval.code}
+               |  $isNullEvalCode
+               |  return ${eval.value};
+               |}
+             """.stripMargin
+
+          val value = freshName("subExprValue")
+          val state = SubExprEliminationState(isNull, JavaCode.variable(value, expr.dataType))

Review comment:
   Yea, I see. But I just don't want to add more pressure on the constant pool. WDYT? @cloud-fan
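[Editor's note] The constant-pool point above is easier to see from the shape of the Java that this codegen path emits: the subexpression's value comes back through the method's return, so only the nullability flag (and only when it is not a literal) needs an instance field. The class below is a hand-written illustration of that shape, not actual Spark codegen output; all names are made up.

```java
// Illustrative shape of a split subexpression helper that returns its value
// instead of writing it into a mutable field (fewer class members overall).
class SubExprShape {
    private boolean subExprIsNull_0;   // field kept only for the null flag

    private int subExpr_0(int a, int b) {
        int result = a + b;            // stand-in for the common subexpression
        subExprIsNull_0 = false;
        return result;                 // value flows back through the return
    }

    int callSite(int a, int b) {
        // The caller materializes the value into a fresh local variable.
        int subExprValue_0 = subExpr_0(a, b);
        return subExprValue_0;
    }

    public static void main(String[] args) {
        System.out.println(new SubExprShape().callSite(2, 3)); // prints 5
    }
}
```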
[GitHub] [spark] maropu commented on a change in pull request #25710: [SPARK-29008][SQL] Define an individual method for each common subexpression in HashAggregateExec
maropu commented on a change in pull request #25710: [SPARK-29008][SQL] Define an individual method for each common subexpression in HashAggregateExec
URL: https://github.com/apache/spark/pull/25710#discussion_r321991542

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

@@ -1028,13 +1028,67 @@ class CodegenContext {
     // Get all the expressions that appear at least twice and set up the state for subexpression
     // elimination.
     val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
-    val codes = commonExprs.map { e =>
-      val expr = e.head
-      // Generate the code for this expression tree.
-      val eval = expr.genCode(this)
-      val state = SubExprEliminationState(eval.isNull, eval.value)
-      e.foreach(localSubExprEliminationExprs.put(_, state))
-      eval.code.toString
+    val commonExprVals = commonExprs.map(_.head.genCode(this))
+
+    lazy val nonSplitExprCode = {
+      commonExprs.zip(commonExprVals).map { case (exprs, eval) =>
+        // Generate the code for this expression tree.
+        val state = SubExprEliminationState(eval.isNull, eval.value)
+        exprs.foreach(localSubExprEliminationExprs.put(_, state))
+        eval.code.toString
+      }
+    }
+
+    val codes = if (commonExprVals.map(_.code.length).sum > SQLConf.get.methodSplitThreshold) {
+      if (commonExprs.map(calculateParamLength).forall(isValidParamLength)) {
+        commonExprs.zipWithIndex.map { case (exprs, i) =>
+          val expr = exprs.head
+          val eval = commonExprVals(i)
+
+          val isNullLiteral = eval.isNull match {
+            case TrueLiteral | FalseLiteral => true
+            case _ => false
+          }
+          val (isNull, isNullEvalCode) = if (!isNullLiteral) {
+            val v = addMutableState(JAVA_BOOLEAN, "subExprIsNull")
+            (JavaCode.isNullGlobal(v), s"$v = ${eval.isNull};")
+          } else {
+            (eval.isNull, "")
+          }
+
+          // Generate the code for this expression tree and wrap it in a function.
+          val fnName = freshName("subExpr")
+          val inputVars = getLocalInputVariableValues(this, expr).toSeq
+          val argList = inputVars.map(v => s"${v.javaType.getName} ${v.variableName}")
+          val returnType = javaType(expr.dataType)
+          val fn =
+            s"""
+               |private $returnType $fnName(${argList.mkString(", ")}) {
+               |  ${eval.code}
+               |  $isNullEvalCode
+               |  return ${eval.value};
+               |}
+             """.stripMargin

Review comment:
   ISTM we might be able to apply the same change in https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L1060-L1069
[GitHub] [spark] maropu commented on a change in pull request #25710: [SPARK-29008][SQL] Define an individual method for each common subexpression in HashAggregateExec
maropu commented on a change in pull request #25710: [SPARK-29008][SQL] Define an individual method for each common subexpression in HashAggregateExec
URL: https://github.com/apache/spark/pull/25710#discussion_r321931750

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala

@@ -1028,13 +1028,57 @@ class CodegenContext {
     // Get all the expressions that appear at least twice and set up the state for subexpression
     // elimination.
     val commonExprs = equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
-    val codes = commonExprs.map { e =>
-      val expr = e.head
-      // Generate the code for this expression tree.
-      val eval = expr.genCode(this)
-      val state = SubExprEliminationState(eval.isNull, eval.value)
-      e.foreach(localSubExprEliminationExprs.put(_, state))
-      eval.code.toString
+    val commonExprVals = commonExprs.map(_.head.genCode(this))
+
+    lazy val nonSplitExprCode = {
+      commonExprs.zip(commonExprVals).map { case (exprs, eval) =>
+        // Generate the code for this expression tree.
+        val state = SubExprEliminationState(eval.isNull, eval.value)
+        exprs.foreach(localSubExprEliminationExprs.put(_, state))
+        eval.code.toString
+      }
+    }
+
+    val codes = if (commonExprVals.map(_.code.length).sum > SQLConf.get.methodSplitThreshold) {
+      if (commonExprs.map(calculateParamLength).forall(isValidParamLength)) {
+        commonExprs.zipWithIndex.map { case (exprs, i) =>
+          val expr = exprs.head
+          val eval = commonExprVals(i)
+          val inputVars = getLocalInputVariableValues(this, expr).toSeq
+          val fnName = freshName("subExpr")
+          val isNull = addMutableState(JAVA_BOOLEAN, "subExprIsNull")
+          val value = addMutableState(javaType(expr.dataType), "subExprValue")
+
+          // Generate the code for this expression tree and wrap it in a function.
+          val argList = inputVars.map(v => s"${v.javaType.getName} ${v.variableName}")
+          val fn =
+            s"""
+               |private void $fnName(${argList.mkString(", ")}) {
+               |  ${eval.code}
+               |  $isNull = ${eval.isNull};
+               |  $value = ${eval.value};
+               |}
+             """.stripMargin
+
+          val state = SubExprEliminationState(
+            JavaCode.isNullGlobal(isNull), JavaCode.global(value, expr.dataType))
+          exprs.foreach(localSubExprEliminationExprs.put(_, state))
+          val inputVariables = inputVars.map(_.variableName).mkString(", ")
+          s"${addNewFunction(fnName, fn)}($inputVariables);"

Review comment:
   I think we might be able to split more, as @mgaido91 did in #25642.
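[Editor's note] For contrast with the later revision of this patch, the helper shape discussed in this earlier revision is a `void` method that writes both the null flag and the value into instance fields created via `addMutableState`. The class below is a hand-written illustration of that shape, not actual Spark codegen output; all names are made up.

```java
// Illustrative shape of the earlier split-helper design: both the null flag
// and the value live in mutable instance fields that the helper assigns.
class SubExprGlobalShape {
    boolean subExprIsNull_0;   // mutable state for the null flag
    int subExprValue_0;        // mutable state for the value itself

    void subExpr_0(int a, int b) {
        subExprValue_0 = a * b;  // stand-in for the common subexpression
        subExprIsNull_0 = false;
    }

    public static void main(String[] args) {
        SubExprGlobalShape s = new SubExprGlobalShape();
        s.subExpr_0(4, 5);       // caller reads the fields afterwards
        System.out.println(s.subExprValue_0); // prints 20
    }
}
```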