[GitHub] [spark] maropu commented on a change in pull request #25710: [SPARK-29008][SQL] Define an individual method for each common subexpression in HashAggregateExec

2019-09-12 Thread GitBox
maropu commented on a change in pull request #25710: [SPARK-29008][SQL] Define 
an individual method for each common subexpression in HashAggregateExec
URL: https://github.com/apache/spark/pull/25710#discussion_r323718636
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/WholeStageCodegenSuite.scala
 ##
 @@ -419,4 +419,27 @@ class WholeStageCodegenSuite extends QueryTest with 
SharedSparkSession {
   }
 }
   }
+
+  test("Give up splitting subexpression code if a parameter length goes over 
the limit") {
+withSQLConf(
+SQLConf.CODEGEN_SPLIT_AGGREGATE_FUNC.key -> "false",
 
 Review comment:
   Yea, we need to. If that flag is true, `HashAggregateExec` throws an 
exception in this test: 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala#L327


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #25710: [SPARK-29008][SQL] Define an individual method for each common subexpression in HashAggregateExec

2019-09-10 Thread GitBox
maropu commented on a change in pull request #25710: [SPARK-29008][SQL] Define 
an individual method for each common subexpression in HashAggregateExec
URL: https://github.com/apache/spark/pull/25710#discussion_r322693867
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ##
 @@ -1028,13 +1028,67 @@ class CodegenContext {
 // Get all the expressions that appear at least twice and set up the state 
for subexpression
 // elimination.
 val commonExprs = 
equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
-val codes = commonExprs.map { e =>
-  val expr = e.head
-  // Generate the code for this expression tree.
-  val eval = expr.genCode(this)
-  val state = SubExprEliminationState(eval.isNull, eval.value)
-  e.foreach(localSubExprEliminationExprs.put(_, state))
-  eval.code.toString
+val commonExprVals = commonExprs.map(_.head.genCode(this))
+
+lazy val nonSplitExprCode = {
+  commonExprs.zip(commonExprVals).map { case (exprs, eval) =>
+// Generate the code for this expression tree.
+val state = SubExprEliminationState(eval.isNull, eval.value)
+exprs.foreach(localSubExprEliminationExprs.put(_, state))
+eval.code.toString
+  }
+}
+
+val codes = if (commonExprVals.map(_.code.length).sum > 
SQLConf.get.methodSplitThreshold) {
+  if (commonExprs.map(calculateParamLength).forall(isValidParamLength)) {
+commonExprs.zipWithIndex.map { case (exprs, i) =>
+  val expr = exprs.head
+  val eval = commonExprVals(i)
+
+  val isNullLiteral = eval.isNull match {
+case TrueLiteral | FalseLiteral => true
+case _ => false
+  }
+  val (isNull, isNullEvalCode) = if (!isNullLiteral) {
+val v = addMutableState(JAVA_BOOLEAN, "subExprIsNull")
+(JavaCode.isNullGlobal(v), s"$v = ${eval.isNull};")
+  } else {
+(eval.isNull, "")
+  }
+
+  // Generate the code for this expression tree and wrap it in a 
function.
+  val fnName = freshName("subExpr")
+  val inputVars = getLocalInputVariableValues(this, expr).toSeq
+  val argList = inputVars.map(v => s"${v.javaType.getName} 
${v.variableName}")
+  val returnType = javaType(expr.dataType)
+  val fn =
+s"""
+   |private $returnType $fnName(${argList.mkString(", ")}) {
+   |  ${eval.code}
+   |  $isNullEvalCode
+   |  return ${eval.value};
+   |}
+   """.stripMargin
+
+  val value = freshName("subExprValue")
+  val state = SubExprEliminationState(isNull, JavaCode.variable(value, 
expr.dataType))
 
 Review comment:
   Yea, I see. But, I just want add more pressure on the constant pool 
WDYT? @cloud-fan 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #25710: [SPARK-29008][SQL] Define an individual method for each common subexpression in HashAggregateExec

2019-09-07 Thread GitBox
maropu commented on a change in pull request #25710: [SPARK-29008][SQL] Define 
an individual method for each common subexpression in HashAggregateExec
URL: https://github.com/apache/spark/pull/25710#discussion_r321991542
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ##
 @@ -1028,13 +1028,67 @@ class CodegenContext {
 // Get all the expressions that appear at least twice and set up the state 
for subexpression
 // elimination.
 val commonExprs = 
equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
-val codes = commonExprs.map { e =>
-  val expr = e.head
-  // Generate the code for this expression tree.
-  val eval = expr.genCode(this)
-  val state = SubExprEliminationState(eval.isNull, eval.value)
-  e.foreach(localSubExprEliminationExprs.put(_, state))
-  eval.code.toString
+val commonExprVals = commonExprs.map(_.head.genCode(this))
+
+lazy val nonSplitExprCode = {
+  commonExprs.zip(commonExprVals).map { case (exprs, eval) =>
+// Generate the code for this expression tree.
+val state = SubExprEliminationState(eval.isNull, eval.value)
+exprs.foreach(localSubExprEliminationExprs.put(_, state))
+eval.code.toString
+  }
+}
+
+val codes = if (commonExprVals.map(_.code.length).sum > 
SQLConf.get.methodSplitThreshold) {
+  if (commonExprs.map(calculateParamLength).forall(isValidParamLength)) {
+commonExprs.zipWithIndex.map { case (exprs, i) =>
+  val expr = exprs.head
+  val eval = commonExprVals(i)
+
+  val isNullLiteral = eval.isNull match {
+case TrueLiteral | FalseLiteral => true
+case _ => false
+  }
+  val (isNull, isNullEvalCode) = if (!isNullLiteral) {
+val v = addMutableState(JAVA_BOOLEAN, "subExprIsNull")
+(JavaCode.isNullGlobal(v), s"$v = ${eval.isNull};")
+  } else {
+(eval.isNull, "")
+  }
+
+  // Generate the code for this expression tree and wrap it in a 
function.
+  val fnName = freshName("subExpr")
+  val inputVars = getLocalInputVariableValues(this, expr).toSeq
+  val argList = inputVars.map(v => s"${v.javaType.getName} 
${v.variableName}")
+  val returnType = javaType(expr.dataType)
+  val fn =
+s"""
+   |private $returnType $fnName(${argList.mkString(", ")}) {
+   |  ${eval.code}
+   |  $isNullEvalCode
+   |  return ${eval.value};
+   |}
+   """.stripMargin
 
 Review comment:
   ISTM we might be able to apply the same change in 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L1060-L1069


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #25710: [SPARK-29008][SQL] Define an individual method for each common subexpression in HashAggregateExec

2019-09-07 Thread GitBox
maropu commented on a change in pull request #25710: [SPARK-29008][SQL] Define 
an individual method for each common subexpression in HashAggregateExec
URL: https://github.com/apache/spark/pull/25710#discussion_r321991542
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ##
 @@ -1028,13 +1028,67 @@ class CodegenContext {
 // Get all the expressions that appear at least twice and set up the state 
for subexpression
 // elimination.
 val commonExprs = 
equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
-val codes = commonExprs.map { e =>
-  val expr = e.head
-  // Generate the code for this expression tree.
-  val eval = expr.genCode(this)
-  val state = SubExprEliminationState(eval.isNull, eval.value)
-  e.foreach(localSubExprEliminationExprs.put(_, state))
-  eval.code.toString
+val commonExprVals = commonExprs.map(_.head.genCode(this))
+
+lazy val nonSplitExprCode = {
+  commonExprs.zip(commonExprVals).map { case (exprs, eval) =>
+// Generate the code for this expression tree.
+val state = SubExprEliminationState(eval.isNull, eval.value)
+exprs.foreach(localSubExprEliminationExprs.put(_, state))
+eval.code.toString
+  }
+}
+
+val codes = if (commonExprVals.map(_.code.length).sum > 
SQLConf.get.methodSplitThreshold) {
+  if (commonExprs.map(calculateParamLength).forall(isValidParamLength)) {
+commonExprs.zipWithIndex.map { case (exprs, i) =>
+  val expr = exprs.head
+  val eval = commonExprVals(i)
+
+  val isNullLiteral = eval.isNull match {
+case TrueLiteral | FalseLiteral => true
+case _ => false
+  }
+  val (isNull, isNullEvalCode) = if (!isNullLiteral) {
+val v = addMutableState(JAVA_BOOLEAN, "subExprIsNull")
+(JavaCode.isNullGlobal(v), s"$v = ${eval.isNull};")
+  } else {
+(eval.isNull, "")
+  }
+
+  // Generate the code for this expression tree and wrap it in a 
function.
+  val fnName = freshName("subExpr")
+  val inputVars = getLocalInputVariableValues(this, expr).toSeq
+  val argList = inputVars.map(v => s"${v.javaType.getName} 
${v.variableName}")
+  val returnType = javaType(expr.dataType)
+  val fn =
+s"""
+   |private $returnType $fnName(${argList.mkString(", ")}) {
+   |  ${eval.code}
+   |  $isNullEvalCode
+   |  return ${eval.value};
+   |}
+   """.stripMargin
 
 Review comment:
   We can apply the same change in 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala#L1060-L1069


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #25710: [SPARK-29008][SQL] Define an individual method for each common subexpression in HashAggregateExec

2019-09-06 Thread GitBox
maropu commented on a change in pull request #25710: [SPARK-29008][SQL] Define 
an individual method for each common subexpression in HashAggregateExec
URL: https://github.com/apache/spark/pull/25710#discussion_r321931750
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ##
 @@ -1028,13 +1028,57 @@ class CodegenContext {
 // Get all the expressions that appear at least twice and set up the state 
for subexpression
 // elimination.
 val commonExprs = 
equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
-val codes = commonExprs.map { e =>
-  val expr = e.head
-  // Generate the code for this expression tree.
-  val eval = expr.genCode(this)
-  val state = SubExprEliminationState(eval.isNull, eval.value)
-  e.foreach(localSubExprEliminationExprs.put(_, state))
-  eval.code.toString
+val commonExprVals = commonExprs.map(_.head.genCode(this))
+
+lazy val nonSplitExprCode = {
+  commonExprs.zip(commonExprVals).map { case (exprs, eval) =>
+// Generate the code for this expression tree.
+val state = SubExprEliminationState(eval.isNull, eval.value)
+exprs.foreach(localSubExprEliminationExprs.put(_, state))
+eval.code.toString
+  }
+}
+
+val codes = if (commonExprVals.map(_.code.length).sum > 
SQLConf.get.methodSplitThreshold) {
+  if (commonExprs.map(calculateParamLength).forall(isValidParamLength)) {
+commonExprs.zipWithIndex.map { case (exprs, i) =>
+  val expr = exprs.head
+  val eval = commonExprVals(i)
+  val inputVars = getLocalInputVariableValues(this, expr).toSeq
+  val fnName = freshName("subExpr")
+  val isNull = addMutableState(JAVA_BOOLEAN, "subExprIsNull")
+  val value = addMutableState(javaType(expr.dataType), "subExprValue")
+
+  // Generate the code for this expression tree and wrap it in a 
function.
+  val argList = inputVars.map(v => s"${v.javaType.getName} 
${v.variableName}")
+  val fn =
+s"""
+   |private void $fnName(${argList.mkString(", ")}) {
+   |  ${eval.code}
+   |  $isNull = ${eval.isNull};
+   |  $value = ${eval.value};
+   |}
+   """.stripMargin
+
+  val state = SubExprEliminationState(
+JavaCode.isNullGlobal(isNull), JavaCode.global(value, 
expr.dataType))
+  exprs.foreach(localSubExprEliminationExprs.put(_, state))
+  val inputVariables = inputVars.map(_.variableName).mkString(", ")
+  s"${addNewFunction(fnName, fn)}($inputVariables);"
 
 Review comment:
   I think we might be able to split more as @mgaido91 did so in #25642.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] maropu commented on a change in pull request #25710: [SPARK-29008][SQL] Define an individual method for each common subexpression in HashAggregateExec

2019-09-06 Thread GitBox
maropu commented on a change in pull request #25710: [SPARK-29008][SQL] Define 
an individual method for each common subexpression in HashAggregateExec
URL: https://github.com/apache/spark/pull/25710#discussion_r321931750
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ##
 @@ -1028,13 +1028,57 @@ class CodegenContext {
 // Get all the expressions that appear at least twice and set up the state 
for subexpression
 // elimination.
 val commonExprs = 
equivalentExpressions.getAllEquivalentExprs.filter(_.size > 1)
-val codes = commonExprs.map { e =>
-  val expr = e.head
-  // Generate the code for this expression tree.
-  val eval = expr.genCode(this)
-  val state = SubExprEliminationState(eval.isNull, eval.value)
-  e.foreach(localSubExprEliminationExprs.put(_, state))
-  eval.code.toString
+val commonExprVals = commonExprs.map(_.head.genCode(this))
+
+lazy val nonSplitExprCode = {
+  commonExprs.zip(commonExprVals).map { case (exprs, eval) =>
+// Generate the code for this expression tree.
+val state = SubExprEliminationState(eval.isNull, eval.value)
+exprs.foreach(localSubExprEliminationExprs.put(_, state))
+eval.code.toString
+  }
+}
+
+val codes = if (commonExprVals.map(_.code.length).sum > 
SQLConf.get.methodSplitThreshold) {
+  if (commonExprs.map(calculateParamLength).forall(isValidParamLength)) {
+commonExprs.zipWithIndex.map { case (exprs, i) =>
+  val expr = exprs.head
+  val eval = commonExprVals(i)
+  val inputVars = getLocalInputVariableValues(this, expr).toSeq
+  val fnName = freshName("subExpr")
+  val isNull = addMutableState(JAVA_BOOLEAN, "subExprIsNull")
+  val value = addMutableState(javaType(expr.dataType), "subExprValue")
+
+  // Generate the code for this expression tree and wrap it in a 
function.
+  val argList = inputVars.map(v => s"${v.javaType.getName} 
${v.variableName}")
+  val fn =
+s"""
+   |private void $fnName(${argList.mkString(", ")}) {
+   |  ${eval.code}
+   |  $isNull = ${eval.isNull};
+   |  $value = ${eval.value};
+   |}
+   """.stripMargin
+
+  val state = SubExprEliminationState(
+JavaCode.isNullGlobal(isNull), JavaCode.global(value, 
expr.dataType))
+  exprs.foreach(localSubExprEliminationExprs.put(_, state))
+  val inputVariables = inputVars.map(_.variableName).mkString(", ")
+  s"${addNewFunction(fnName, fn)}($inputVariables);"
 
 Review comment:
   I think we might be able to split more as @mgaido91 did in #25642.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org