[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14277 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14277#discussion_r71651284 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -204,6 +204,32 @@ case class Elt(children: Seq[Expression]) } } } + + override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val index = indexExpr.genCode(ctx) +val strings = stringExprs.map(_.genCode(ctx)) +val assignStringValue = strings.zipWithIndex.map { case (eval, index) => + s""" +case ${index + 1}: + ${ev.value} = ${eval.isNull} ? null : ${eval.value}; + break; + """ +}.mkString("\n") +val indexVal = ctx.freshName("index") +val stringArray = ctx.freshName("strings"); + +ev.copy(index.code + "\n" + strings.map(_.code).mkString("\n") + s""" + int $indexVal = ${index.value}; + boolean ${ev.isNull} = false; --- End diff -- we can define this after the switch: `final boolean ${ev.isNull} = ${ev.value} == null;` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14277#discussion_r71651050 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -204,6 +204,32 @@ case class Elt(children: Seq[Expression]) } } } + + override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val index = indexExpr.genCode(ctx) +val strings = stringExprs.map(_.genCode(ctx)) +val assignStringValue = strings.zipWithIndex.map { case (eval, index) => + s""" +case ${index + 1}: + ${ev.value} = ${eval.isNull} ? null : ${eval.value}; + break; + """ +}.mkString("\n") +val indexVal = ctx.freshName("index") +val stringArray = ctx.freshName("strings"); + +ev.copy(index.code + "\n" + strings.map(_.code).mkString("\n") + s""" + int $indexVal = ${index.value}; --- End diff -- this can be `final` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14277#discussion_r71650993 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -204,6 +204,32 @@ case class Elt(children: Seq[Expression]) } } } + + override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val index = indexExpr.genCode(ctx) +val strings = stringExprs.map(_.genCode(ctx)) +val assignStringValue = strings.zipWithIndex.map { case (eval, index) => + s""" +case ${index + 1}: + ${ev.value} = ${eval.isNull} ? null : ${eval.value}; + break; --- End diff -- ident is wrong here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14277#discussion_r71649272 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -204,6 +204,29 @@ case class Elt(children: Seq[Expression]) } } } + + override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val index = children.head.map(_.genCode(ctx))(0) +val strings = children.tail.map(_.genCode(ctx)) +val stringValues = strings.map { eval => + s"${eval.isNull} ? null : ${eval.value}" +}.mkString(", ") +val indexVal = ctx.freshName("index") +val stringArray = ctx.freshName("strings"); + +ev.copy(index.code + "\n" + strings.map(_.code).mkString("\n") + s""" + int $indexVal = ${index.value} - 1; + UTF8String[] $stringArray = {$stringValues}; --- End diff -- After re-checking the wholestage code, I find that is not correct. If any expression in a sub plan will fall back, then the sub plan will not be wholestage codegen. The whole plan can be wholestage codegen, but for these no wholestage codegen sub plans, they will be wrapped with input adapter. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14277#discussion_r71648748 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -204,6 +204,29 @@ case class Elt(children: Seq[Expression]) } } } + + override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val index = children.head.map(_.genCode(ctx))(0) +val strings = children.tail.map(_.genCode(ctx)) +val stringValues = strings.map { eval => + s"${eval.isNull} ? null : ${eval.value}" +}.mkString(", ") +val indexVal = ctx.freshName("index") +val stringArray = ctx.freshName("strings"); + +ev.copy(index.code + "\n" + strings.map(_.code).mkString("\n") + s""" + int $indexVal = ${index.value} - 1; + UTF8String[] $stringArray = {$stringValues}; --- End diff -- Let me do a simple benchmark for this. Actually I think it is no harm to have codegen for this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14277#discussion_r71648516 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -204,6 +204,29 @@ case class Elt(children: Seq[Expression]) } } } + + override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val index = children.head.map(_.genCode(ctx))(0) +val strings = children.tail.map(_.genCode(ctx)) +val stringValues = strings.map { eval => + s"${eval.isNull} ? null : ${eval.value}" +}.mkString(", ") +val indexVal = ctx.freshName("index") +val stringArray = ctx.freshName("strings"); + +ev.copy(index.code + "\n" + strings.map(_.code).mkString("\n") + s""" + int $indexVal = ${index.value} - 1; + UTF8String[] $stringArray = {$stringValues}; --- End diff -- ok. `CodegenFallback` does it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14277#discussion_r71648156 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -204,6 +204,29 @@ case class Elt(children: Seq[Expression]) } } } + + override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val index = children.head.map(_.genCode(ctx))(0) +val strings = children.tail.map(_.genCode(ctx)) +val stringValues = strings.map { eval => + s"${eval.isNull} ? null : ${eval.value}" +}.mkString(", ") +val indexVal = ctx.freshName("index") +val stringArray = ctx.freshName("strings"); + +ev.copy(index.code + "\n" + strings.map(_.code).mkString("\n") + s""" + int $indexVal = ${index.value} - 1; + UTF8String[] $stringArray = {$stringValues}; --- End diff -- no, expressions can always codegen, by default we just pass the expression reference into generated java code and call its `eval` method. So it's only about performance for the expression. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14277#discussion_r71646763 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -204,6 +204,29 @@ case class Elt(children: Seq[Expression]) } } } + + override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val index = children.head.map(_.genCode(ctx))(0) +val strings = children.tail.map(_.genCode(ctx)) +val stringValues = strings.map { eval => + s"${eval.isNull} ? null : ${eval.value}" +}.mkString(", ") +val indexVal = ctx.freshName("index") +val stringArray = ctx.freshName("strings"); + +ev.copy(index.code + "\n" + strings.map(_.code).mkString("\n") + s""" + int $indexVal = ${index.value} - 1; + UTF8String[] $stringArray = {$stringValues}; --- End diff -- My main concern is not performance improvement for elt function. As you know, if any expression in a plan doesn't have codegen, the plan can't be wholestage codegen. So I think it is better to have as many codegen expressions as we can. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14277#discussion_r71646550 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -204,6 +204,29 @@ case class Elt(children: Seq[Expression]) } } } + + override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val index = children.head.map(_.genCode(ctx))(0) +val strings = children.tail.map(_.genCode(ctx)) +val stringValues = strings.map { eval => + s"${eval.isNull} ? null : ${eval.value}" +}.mkString(", ") +val indexVal = ctx.freshName("index") +val stringArray = ctx.freshName("strings"); + +ev.copy(index.code + "\n" + strings.map(_.code).mkString("\n") + s""" + int $indexVal = ${index.value} - 1; + UTF8String[] $stringArray = {$stringValues}; --- End diff -- a `switch` or `if` should work, or we can just fallback to `eval`. Can you benchmark how much speedup we can get by codegen for this expression? If it's only a little, I think it's ok to not have codegen here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14277#discussion_r71638204 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -204,6 +204,29 @@ case class Elt(children: Seq[Expression]) } } } + + override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val index = children.head.map(_.genCode(ctx))(0) +val strings = children.tail.map(_.genCode(ctx)) +val stringValues = strings.map { eval => + s"${eval.isNull} ? null : ${eval.value}" +}.mkString(", ") +val indexVal = ctx.freshName("index") +val stringArray = ctx.freshName("strings"); + +ev.copy(index.code + "\n" + strings.map(_.code).mkString("\n") + s""" + int $indexVal = ${index.value} - 1; + UTF8String[] $stringArray = {$stringValues}; --- End diff -- Because the index is only evaluated runtime, I think we can' just evaluate the string we need? Or we can wrap these string expressions in an if/else if or switch block, then we can only evaluate the string expression specified by the index. What do you think? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14277#discussion_r71512431 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -204,6 +204,29 @@ case class Elt(children: Seq[Expression]) } } } + + override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val index = children.head.map(_.genCode(ctx))(0) +val strings = children.tail.map(_.genCode(ctx)) +val stringValues = strings.map { eval => + s"${eval.isNull} ? null : ${eval.value}" +}.mkString(", ") +val indexVal = ctx.freshName("index") +val stringArray = ctx.freshName("strings"); + +ev.copy(index.code + "\n" + strings.map(_.code).mkString("\n") + s""" + int $indexVal = ${index.value} - 1; + UTF8String[] $stringArray = {$stringValues}; --- End diff -- so we will evaluate all string expressions? In `eval` we only evaluate the string expression we need. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14277#discussion_r71511742 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -204,6 +204,29 @@ case class Elt(children: Seq[Expression]) } } } + + override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val index = children.head.map(_.genCode(ctx))(0) +val strings = children.tail.map(_.genCode(ctx)) --- End diff -- `stringExprs.map(_.genCode(ctx))`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14277#discussion_r71511689 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -204,6 +204,29 @@ case class Elt(children: Seq[Expression]) } } } + + override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = { +val index = children.head.map(_.genCode(ctx))(0) --- End diff -- `indexExpr.genCode(ctx)`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14277: [SPARK-16640][SQL] Add codegen for Elt function
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/14277 [SPARK-16640][SQL] Add codegen for Elt function ## What changes were proposed in this pull request? Elt function doesn't support codegen execution now. We should add the support. ## How was this patch tested? Jenkins tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 elt-codegen Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14277.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14277 commit c517addc2a00fca7578b4fcb1f47a7ef6f337e5c Author: Liang-Chi HsiehDate: 2016-07-20T05:06:27Z Add codegen for Elt function. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org