[GitHub] spark pull request #18418: [SPARK-19104][SQL] Lambda variables should work w...

2017-06-27 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18418#discussion_r124223602
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -342,19 +343,38 @@ case class CreateNamedStruct(children: Seq[Expression]) extends CreateNamedStruc
     val values = ctx.freshName("values")
     ctx.addMutableState("Object[]", values, s"$values = null;")
 
+    // `splitExpressions` might split codes to multiple functions. The local variables of
+    // `LambdaVariable` can't be accessed in the functions. We need to add them into the parameters
+    // of the functions.
+    val (valExprCodes, valExprParams) = valExprs.map { expr =>
+      val exprCode = expr.genCode(ctx)
+      val lambdaVars = expr.collect {
--- End diff --

After rethinking this and running more experiments, this approach doesn't work well.

A simpler approach is to make those lambda variables global, as `MapObjects` and `CollectObjectsToMap` do.
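
For reference, making a lambda variable "global" here means registering its value/isNull slots as mutable state on the `CodegenContext`, so they become fields of the generated class rather than locals of `apply()`, and functions produced by `splitExpressions` can still reach them. A minimal sketch of that idea, reusing only the `freshName`/`addMutableState` calls already visible in the quoted diff; the variable names and the element type string are illustrative, not the exact `MapObjects` code:

```scala
// Sketch: declare the lambda/loop variable slots as fields of the generated
// class instead of locals inside apply(), so split-out functions can see them.
val loopValue  = ctx.freshName("loopValue")
val loopIsNull = ctx.freshName("loopIsNull")
val elementJavaType = "UTF8String"  // assumed element type, for illustration only

ctx.addMutableState("boolean", loopIsNull, s"$loopIsNull = true;")
ctx.addMutableState(elementJavaType, loopValue, s"$loopValue = null;")
```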


[GitHub] spark pull request #18418: [SPARK-19104][SQL] Lambda variables should work w...

2017-06-26 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18418#discussion_r124177003
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -342,19 +343,38 @@ case class CreateNamedStruct(children: Seq[Expression]) extends CreateNamedStruc
     val values = ctx.freshName("values")
     ctx.addMutableState("Object[]", values, s"$values = null;")
 
+    // `splitExpressions` might split codes to multiple functions. The local variables of
+    // `LambdaVariable` can't be accessed in the functions. We need to add them into the parameters
+    // of the functions.
+    val (valExprCodes, valExprParams) = valExprs.map { expr =>
+      val exprCode = expr.genCode(ctx)
+      val lambdaVars = expr.collect {
--- End diff --

I'm not very sure about the other places. I've moved the collection of lambda variables into `splitExpressions` and let all the call sites that may need it perform this check.
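
Roughly, the collection step amounts to walking each expression tree for `LambdaVariable` nodes and turning them into extra `(javaType, name)` parameters for the split functions. The sketch below only illustrates that idea in the context of the quoted `CreateNamedStruct` diff (so `ctx`, `valExprs`, and `valExprCodes` come from there); `expr.collect` and the old `splitExpressions(expressions, funcName, arguments)` signature appear in the quoted diffs, while the `value`/`isNull`/`dataType` fields of `LambdaVariable`, `ctx.javaType`, and `ctx.INPUT_ROW` are assumptions about the Catalyst internals of that era:

```scala
// Gather the LambdaVariables referenced by the value expressions...
val lambdaVars = valExprs.flatMap { expr =>
  expr.collect { case lv: LambdaVariable => lv }
}.distinct

// ...and expose each one as a pair of extra parameters (value slot + isNull slot).
val lambdaParams = lambdaVars.flatMap { lv =>
  Seq((ctx.javaType(lv.dataType), lv.value), ("boolean", lv.isNull))
}

// Old-style splitExpressions call with the extra parameters appended,
// so the split functions receive the lambda variables explicitly.
ctx.splitExpressions(valExprCodes, "apply", ("InternalRow", ctx.INPUT_ROW) +: lambdaParams)
```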


[GitHub] spark pull request #18418: [SPARK-19104][SQL] Lambda variables should work w...

2017-06-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18418#discussion_r124168400
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -342,19 +343,38 @@ case class CreateNamedStruct(children: Seq[Expression]) extends CreateNamedStruc
     val values = ctx.freshName("values")
     ctx.addMutableState("Object[]", values, s"$values = null;")
 
+    // `splitExpressions` might split codes to multiple functions. The local variables of
+    // `LambdaVariable` can't be accessed in the functions. We need to add them into the parameters
+    // of the functions.
+    val (valExprCodes, valExprParams) = valExprs.map { expr =>
+      val exprCode = expr.genCode(ctx)
+      val lambdaVars = expr.collect {
--- End diff --

IIRC there are a lot of places where we use `splitExpressions`; shall we check all of them?


[GitHub] spark pull request #18418: [SPARK-19104][SQL] Lambda variables should work w...

2017-06-26 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18418#discussion_r124165135
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -342,19 +343,38 @@ case class CreateNamedStruct(children: Seq[Expression]) extends CreateNamedStruc
     val values = ctx.freshName("values")
     ctx.addMutableState("Object[]", values, s"$values = null;")
 
+    // `splitExpressions` might split codes to multiple functions. The local variables of
+    // `LambdaVariable` can't be accessed in the functions. We need to add them into the parameters
+    // of the functions.
+    val (valExprCodes, valExprParams) = valExprs.map { expr =>
+      val exprCode = expr.genCode(ctx)
+      val lambdaVars = expr.collect {
--- End diff --

For example, `CreateArray` is another expression that uses `splitExpressions`. However, if the external type is an array, `MapObjects` will be used to construct the internal array. I think `CreateMap` is the same.


[GitHub] spark pull request #18418: [SPARK-19104][SQL] Lambda variables should work w...

2017-06-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18418#discussion_r124163076
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala ---
@@ -342,19 +343,38 @@ case class CreateNamedStruct(children: Seq[Expression]) extends CreateNamedStruc
     val values = ctx.freshName("values")
     ctx.addMutableState("Object[]", values, s"$values = null;")
 
+    // `splitExpressions` might split codes to multiple functions. The local variables of
+    // `LambdaVariable` can't be accessed in the functions. We need to add them into the parameters
+    // of the functions.
+    val (valExprCodes, valExprParams) = valExprs.map { expr =>
+      val exprCode = expr.genCode(ctx)
+      val lambdaVars = expr.collect {
--- End diff --

This looks hacky. Also, why do you only do it for `CreateNamedStruct`?


[GitHub] spark pull request #18418: [SPARK-19104][SQL] Lambda variables should work w...

2017-06-26 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18418#discussion_r124152021
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -736,7 +736,27 @@ class CodegenContext {
       // Cannot split these expressions because they are not created from a row object.
       return expressions.mkString("\n")
     }
-    splitExpressions(expressions, "apply", ("InternalRow", row) :: Nil)
+    splitExpressions(row, expressions, Seq.empty)
--- End diff --

Removed, thanks.


[GitHub] spark pull request #18418: [SPARK-19104][SQL] Lambda variables should work w...

2017-06-26 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/18418#discussion_r124081617
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala ---
@@ -736,7 +736,27 @@ class CodegenContext {
       // Cannot split these expressions because they are not created from a row object.
      return expressions.mkString("\n")
     }
-    splitExpressions(expressions, "apply", ("InternalRow", row) :: Nil)
+    splitExpressions(row, expressions, Seq.empty)
--- End diff --

Do we still need the check at lines 735-738? We now do the same check at lines 754-757, too.
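
For context, the guard being referred to is the early return quoted in the diff above. From memory of the 2.2-era `CodegenContext` it looks roughly like the sketch below (the exact condition is an assumption; only the comment and the return appear in the quoted diff). If the three-argument overload called right after it now performs the same test, the copy at lines 735-738 would indeed be redundant:

```scala
// Sketch of the guard in question (condition recalled, not quoted from the patch):
if (row == null || currentVars != null) {
  // Cannot split these expressions because they are not created from a row object.
  return expressions.mkString("\n")
}
```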


[GitHub] spark pull request #18418: [SPARK-19104][SQL] Lambda variables should work w...

2017-06-25 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/18418

[SPARK-19104][SQL] Lambda variables should work when parent expression splits generated codes

## What changes were proposed in this pull request?

When an expression that uses lambda variables splits its generated code into multiple functions, the local variables generated for those lambda variables can't be accessed from the split-out functions. This patch fixes the issue by adding the lambda variables to the functions' parameter lists.
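
To illustrate the failure mode and the shape of the fix: `splitExpressions` emits private methods and calls them, passing only the declared `(type, name)` arguments, so a lambda variable that exists only as a local in the caller is invisible to the split-out body and the generated code fails to compile; adding the variable to the argument list threads it through. The snippet below is a self-contained illustration of that argument-threading shape, not the actual Catalyst code; the name `MapObjects_loopValue0` and its type are made up for the example.

```scala
// How extra (type, name) arguments flow into a split-out function and its call site.
val arguments: Seq[(String, String)] =
  Seq(("InternalRow", "i"), ("UTF8String", "MapObjects_loopValue0"))

val paramList = arguments.map { case (t, n) => s"$t $n" }.mkString(", ")
val callArgs  = arguments.map(_._2).mkString(", ")

// Without the second argument, the body below would reference a variable
// that only exists in the caller and fail to compile.
val splitFunc =
  s"""private void apply_1($paramList) {
     |  // ... split-out code that reads MapObjects_loopValue0 ...
     |}""".stripMargin

val call = s"apply_1($callArgs);"
```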

## How was this patch tested?

Jenkins tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 SPARK-19104

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18418.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18418


commit c417e229f4563a6ee857d7ee55582e3b6ca2ed6b
Author: Liang-Chi Hsieh 
Date:   2017-06-26T04:26:00Z

Add lambda variables into the parameters of functions generated by 
splitExpressions.



