[GitHub] spark pull request #19777: [SPARK-22549][SQL] Fix 64KB JVM bytecode limit pr...

2018-08-01 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19777#discussion_r207077435
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -125,13 +125,34 @@ case class ConcatWs(children: Seq[Expression])
 if (children.forall(_.dataType == StringType)) {
  // All children are strings. In that case we can construct a fixed size array.
   val evals = children.map(_.genCode(ctx))
-
-  val inputs = evals.map { eval =>
-s"${eval.isNull} ? (UTF8String) null : ${eval.value}"
-  }.mkString(", ")
-
-  ev.copy(evals.map(_.code).mkString("\n") + s"""
-UTF8String ${ev.value} = UTF8String.concatWs($inputs);
+  val separator = evals.head
+  val strings = evals.tail
+  val numArgs = strings.length
+  val args = ctx.freshName("args")
+
+  val inputs = strings.zipWithIndex.map { case (eval, index) =>
+if (eval.isNull != "true") {
--- End diff --

@kiszk I was looking at the build warnings, and one of them notes that this compares an `ExprValue` with a `String`, so the two will never be equal. Should it be `eval.isNull.code != "true"` instead?
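A minimal standalone Scala sketch of the pitfall (using a made-up `ExprValueLike` wrapper, not Spark's real `ExprValue`/`JavaCode` classes): comparing the wrapper object itself to a `String` is always unequal, while comparing its underlying generated-code string behaves the way the original check intended.

    // Hypothetical stand-in for an ExprValue-like wrapper; only `code` matters here.
    final case class ExprValueLike(code: String) {
      override def toString: String = code
    }

    object IsNullCheckSketch {
      def main(args: Array[String]): Unit = {
        val isNull = ExprValueLike("true")
        // A wrapper never equals a String, so this condition is always true and the
        // "skip the known-null child" branch guarded by it never triggers.
        println(isNull != "true")       // prints: true
        // Comparing the generated code string restores the intended behaviour.
        println(isNull.code != "true")  // prints: false
      }
    }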


---




[GitHub] spark pull request #19777: [SPARK-22549][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-20 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19777


---




[GitHub] spark pull request #19777: [SPARK-22549][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19777#discussion_r152045624
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -163,13 +184,36 @@ case class ConcatWs(children: Seq[Expression])
 }
   }.unzip
 
-  ev.copy(evals.map(_.code).mkString("\n") +
-  s"""
+  val codes = ctx.splitExpressions(ctx.INPUT_ROW, evals.map(_.code))
+  val varargCounts = ctx.splitExpressions(varargCount, "varargCountsConcatWs",
+("InternalRow", ctx.INPUT_ROW) :: Nil,
+"int",
+{ body =>
+  s"""
+   int $varargNum = 0;
+   $body
+   return $varargNum;
+ """
+},
+_.mkString(s"$varargNum += ", s";\n$varargNum += ", ";"))
+  val varargBuilds = ctx.splitExpressions(varargBuild, "varargBuildsConcatWs",
--- End diff --

Shall we optimize for `eval.isNull == "true"` in `varargCount` and `varargBuild` as well, since you already did that for the all-string case?
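A hedged sketch of what such an optimization could look like, using simplified stand-ins (`EvalSketch` is not Spark's real `ExprCode`): children whose `isNull` is literally `"true"` contribute nothing to the vararg count, so no code has to be emitted for them.

    // Simplified model of a child's generated code; not Spark's real ExprCode.
    final case class EvalSketch(code: String, isNull: String, value: String)

    object SkipAlwaysNullSketch {
      // Emit a counting snippet only for children that can actually be non-null.
      def varargCountSnippets(evals: Seq[EvalSketch]): Seq[String] =
        evals.flatMap { eval =>
          if (eval.isNull == "true") None // known-null child: contributes zero varargs
          else Some(s"${eval.code}\nif (!${eval.isNull}) { varargNum += 1; }")
        }

      def main(args: Array[String]): Unit = {
        val evals = Seq(
          EvalSketch("UTF8String v0 = i.getUTF8String(0);", "isNull_0", "v0"),
          EvalSketch("", "true", "null") // e.g. a literal NULL argument
        )
        varargCountSnippets(evals).foreach(println) // only the first child emits code
      }
    }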


---




[GitHub] spark pull request #19777: [SPARK-22549][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-20 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19777#discussion_r151953946
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -125,19 +125,43 @@ case class ConcatWs(children: Seq[Expression])
 if (children.forall(_.dataType == StringType)) {
  // All children are strings. In that case we can construct a fixed size array.
   val evals = children.map(_.genCode(ctx))
-
-  val inputs = evals.map { eval =>
-s"${eval.isNull} ? (UTF8String) null : ${eval.value}"
-  }.mkString(", ")
-
-  ev.copy(evals.map(_.code).mkString("\n") + s"""
-UTF8String ${ev.value} = UTF8String.concatWs($inputs);
+  val separator = evals.head
+  val strings = evals.tail
+  val numArgs = strings.length
+  val args = ctx.freshName("args")
+
+  val inputs = strings.zipWithIndex.map { case (eval, index) =>
+if (eval.isNull != "true") {
+  s"""
+ ${eval.code}
+ if (!${eval.isNull}) {
+   $args[$index] = ${eval.value};
+ }
+   """
+} else {
+  ""
+}
+  }
+  val codes = if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
+ctx.splitExpressions(inputs, "valueConcatWs",
+  ("InternalRow", ctx.INPUT_ROW) :: ("UTF8String[]", args) :: Nil)
+  } else {
+inputs.mkString("\n")
+  }
+  ev.copy(s"""
+UTF8String[] $args = new UTF8String[$numArgs];
+${separator.code}
+$codes
+UTF8String ${ev.value} = UTF8String.concatWs(${separator.value}, $args);
 boolean ${ev.isNull} = ${ev.value} == null;
   """)
 } else {
   val array = ctx.freshName("array")
+  ctx.addMutableState("UTF8String[]", array, "")
   val varargNum = ctx.freshName("varargNum")
+  ctx.addMutableState("int", varargNum, "")
   val idxInVararg = ctx.freshName("idxInVararg")
+  ctx.addMutableState("int", idxInVararg, "")
--- End diff --

Sure, let me do it later.


---




[GitHub] spark pull request #19777: [SPARK-22549][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19777#discussion_r151951926
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -125,19 +125,43 @@ case class ConcatWs(children: Seq[Expression])
 if (children.forall(_.dataType == StringType)) {
  // All children are strings. In that case we can construct a fixed size array.
   val evals = children.map(_.genCode(ctx))
-
-  val inputs = evals.map { eval =>
-s"${eval.isNull} ? (UTF8String) null : ${eval.value}"
-  }.mkString(", ")
-
-  ev.copy(evals.map(_.code).mkString("\n") + s"""
-UTF8String ${ev.value} = UTF8String.concatWs($inputs);
+  val separator = evals.head
+  val strings = evals.tail
+  val numArgs = strings.length
+  val args = ctx.freshName("args")
+
+  val inputs = strings.zipWithIndex.map { case (eval, index) =>
+if (eval.isNull != "true") {
+  s"""
+ ${eval.code}
+ if (!${eval.isNull}) {
+   $args[$index] = ${eval.value};
+ }
+   """
+} else {
+  ""
+}
+  }
+  val codes = if (ctx.INPUT_ROW != null && ctx.currentVars == null) {
+ctx.splitExpressions(inputs, "valueConcatWs",
+  ("InternalRow", ctx.INPUT_ROW) :: ("UTF8String[]", args) :: Nil)
+  } else {
+inputs.mkString("\n")
+  }
+  ev.copy(s"""
+UTF8String[] $args = new UTF8String[$numArgs];
+${separator.code}
+$codes
+UTF8String ${ev.value} = UTF8String.concatWs(${separator.value}, $args);
 boolean ${ev.isNull} = ${ev.value} == null;
   """)
 } else {
   val array = ctx.freshName("array")
+  ctx.addMutableState("UTF8String[]", array, "")
   val varargNum = ctx.freshName("varargNum")
+  ctx.addMutableState("int", varargNum, "")
   val idxInVararg = ctx.freshName("idxInVararg")
+  ctx.addMutableState("int", idxInVararg, "")
--- End diff --

Can we do better? I think we can avoid these global variables.
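One possible shape of the suggested change, as a self-contained sketch (simplified; this is not the real `CodegenContext.splitExpressions` API): rather than class-level mutable state that every split method updates, each generated method keeps a local counter and returns it, and the caller sums the results, which is roughly what the later `varargCountsConcatWs` revision in this PR does.

    object SplitWithoutGlobalsSketch {
      // Each chunk of generated statements becomes its own method with a local
      // counter, so no class-level mutable field is shared across methods.
      def generateCountMethods(chunks: Seq[Seq[String]]): Seq[String] =
        chunks.zipWithIndex.map { case (chunk, idx) =>
          s"""private int varargCountsConcatWs_$idx(InternalRow i) {
             |  int varargNum = 0;
             |  ${chunk.mkString("\n  ")}
             |  return varargNum;
             |}""".stripMargin
        }

      def main(args: Array[String]): Unit = {
        val chunks = Seq(Seq("varargNum += 1;"), Seq("varargNum += arr1.numElements();"))
        generateCountMethods(chunks).foreach(println)
        // Caller side would then read:
        //   varargNum += varargCountsConcatWs_0(i) + varargCountsConcatWs_1(i);
      }
    }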


---




[GitHub] spark pull request #19777: [SPARK-22549][SQL] Fix 64KB JVM bytecode limit pr...

2017-11-17 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/19777

[SPARK-22549][SQL] Fix 64KB JVM bytecode limit problem with concat_ws

## What changes were proposed in this pull request?

This PR changes `concat_ws` code generation to place the generated code for the argument expressions into separate methods when its total size could become large. This resolves the 64KB JVM bytecode limit problem for `concat_ws` calls with many arguments; a rough sketch of the approach follows below.
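A rough illustration of that approach (a hypothetical sketch, not the PR's actual codegen; the method name `valueConcatWs` is borrowed from the diff): the 64KB limit applies per JVM method, so the per-argument snippets are packed into several small private methods instead of being inlined into one huge eval method.

    object SplitLargeCodegenSketch {
      // Greedily pack snippets into chunks under a rough size budget, then wrap
      // each chunk in its own generated method.
      def splitIntoMethods(snippets: Seq[String], maxChars: Int): Seq[String] = {
        val chunks = snippets.foldLeft(Vector(Vector.empty[String])) { (acc, s) =>
          if (acc.last.nonEmpty && (acc.last :+ s).map(_.length).sum > maxChars) acc :+ Vector(s)
          else acc.init :+ (acc.last :+ s)
        }
        chunks.zipWithIndex.map { case (chunk, idx) =>
          s"""private void valueConcatWs_$idx(InternalRow i, UTF8String[] args) {
             |  ${chunk.mkString("\n  ")}
             |}""".stripMargin
        }
      }

      def main(args: Array[String]): Unit = {
        val snippets = (0 until 5).map(n => s"args[$n] = expr_$n(i);")
        // With a tiny budget this yields several small methods instead of one big one.
        splitIntoMethods(snippets, maxChars = 60).foreach(println)
      }
    }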

## How was this patch tested?

Added new test cases to `StringExpressionsSuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark SPARK-22549

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19777.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19777


commit 1714c883d193f96e43c19e72a96c15c23d5fc193
Author: Kazuaki Ishizaki 
Date:   2017-11-18T02:40:00Z

initial commit




---
