spark git commit: [SPARK-20534][SQL] Make outer generate exec return empty rows

2017-05-01 Thread lixiao
Repository: spark
Updated Branches:
  refs/heads/branch-2.2 c890e938c -> 813abd2db


[SPARK-20534][SQL] Make outer generate exec return empty rows

## What changes were proposed in this pull request?
Generate exec does not produce `null` values if the generator output for the
input row is empty and the generate operates in outer mode without join. This
happens because the `join=false` code path is different from the `join=true`
code path, and the `join=false` code path did not deal with outer properly.
This PR addresses this issue.
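
The intended semantics can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: it is plain Scala with no Spark dependency, the names `GenerateModel`, `InputRow`, `OutputRow` and `generate` are hypothetical and do not appear in the patch, and `None` stands in for SQL `null`. It does not reproduce the actual `GenerateExec` implementation.

```scala
// Toy model of outer/join semantics for a generator; NOT Spark's GenerateExec.
// All names below are hypothetical and exist only for illustration.
object GenerateModel {
  // An input row: a key column plus the array that the generator explodes.
  final case class InputRow(key: String, values: Seq[Int])

  // An output row: the joined input key (present only when join = true) and
  // the generated value, where None models SQL NULL.
  final case class OutputRow(key: Option[String], value: Option[Int])

  def generate(rows: Seq[InputRow], join: Boolean, outer: Boolean): Seq[OutputRow] =
    rows.flatMap { row =>
      val generated: Seq[Option[Int]] = row.values.map(Some(_))
      // In outer mode an empty generator result still yields one null row,
      // whether or not the input row is joined to the generated columns.
      val padded = if (outer && generated.isEmpty) Seq(None) else generated
      padded.map(v => OutputRow(if (join) Some(row.key) else None, v))
    }

  def main(args: Array[String]): Unit = {
    val input = Seq(InputRow("a", Seq(1, 2)), InputRow("b", Seq.empty))
    generate(input, join = false, outer = true).foreach(println)
  }
}
```

In this model, `join = false, outer = true` yields three rows for the sample input: two for `"a"` and one null-valued row for `"b"`. With `outer = false` the row for `"b"` is dropped.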

## How was this patch tested?
Updated `outer*` tests in `GeneratorFunctionSuite`.

Author: Herman van Hovell 

Closes #17810 from hvanhovell/SPARK-20534.

(cherry picked from commit 6b44c4d63ab14162e338c5f1ac77333956870a90)
Signed-off-by: gatorsmile 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/813abd2d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/813abd2d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/813abd2d

Branch: refs/heads/branch-2.2
Commit: 813abd2db6140c4a294cdbeca2303dbfb7903107
Parents: c890e93
Author: Herman van Hovell 
Authored: Mon May 1 09:46:35 2017 -0700
Committer: gatorsmile 
Committed: Mon May 1 09:46:44 2017 -0700

--
 .../sql/catalyst/optimizer/Optimizer.scala  |  3 +-
 .../plans/logical/basicLogicalOperators.scala   |  2 +-
 .../spark/sql/execution/GenerateExec.scala  | 33 +++-
 .../spark/sql/GeneratorFunctionSuite.scala  | 12 +++
 4 files changed, 26 insertions(+), 24 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/813abd2d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
index dd768d1..f2b9764 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -441,8 +441,7 @@ object ColumnPruning extends Rule[LogicalPlan] {
   g.copy(child = prunedChild(g.child, g.references))
 
 // Turn off `join` for Generate if no column from it's child is used
-case p @ Project(_, g: Generate)
-if g.join && !g.outer && p.references.subsetOf(g.generatedSet) =>
+case p @ Project(_, g: Generate) if g.join && p.references.subsetOf(g.generatedSet) =>
   p.copy(child = g.copy(join = false))
 
 // Eliminate unneeded attributes from right side of a Left Existence Join.

http://git-wip-us.apache.org/repos/asf/spark/blob/813abd2d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
index 3ad757e..f663d7b 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
@@ -83,7 +83,7 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan) extend
  * @param join  when true, each output row is implicitly joined with the input tuple that produced
  *  it.
  * @param outer when true, each input row will be output at least once, even if the output of the
- *  given `generator` is empty. `outer` has no effect when `join` is false.
+ *  given `generator` is empty.
  * @param qualifier Qualifier for the attributes of generator(UDTF)
  * @param generatorOutput The output schema of the Generator.
  * @param child Children logical plan node

http://git-wip-us.apache.org/repos/asf/spark/blob/813abd2d/sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala
index f87d058..1812a11 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala
@@ -32,7 +32,7 @@ import org.apache.spark.sql.types.{ArrayType, DataType, MapType, StructType}
 private[execution] sealed case 

spark git commit: [SPARK-20534][SQL] Make outer generate exec return empty rows

2017-05-01 Thread lixiao
Repository: spark
Updated Branches:
  refs/heads/master f0169a1c6 -> 6b44c4d63


[SPARK-20534][SQL] Make outer generate exec return empty rows

## What changes were proposed in this pull request?
Generate exec does not produce `null` values if the generator output for the
input row is empty and the generate operates in outer mode without join. This
happens because the `join=false` code path is different from the `join=true`
code path, and the `join=false` code path did not deal with outer properly.
This PR addresses this issue.

## How was this patch tested?
Updated `outer*` tests in `GeneratorFunctionSuite`.

Author: Herman van Hovell 

Closes #17810 from hvanhovell/SPARK-20534.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6b44c4d6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6b44c4d6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6b44c4d6

Branch: refs/heads/master
Commit: 6b44c4d63ab14162e338c5f1ac77333956870a90
Parents: f0169a1
Author: Herman van Hovell 
Authored: Mon May 1 09:46:35 2017 -0700
Committer: gatorsmile 
Committed: Mon May 1 09:46:35 2017 -0700

--
 .../sql/catalyst/optimizer/Optimizer.scala  |  3 +-
 .../plans/logical/basicLogicalOperators.scala   |  2 +-
 .../spark/sql/execution/GenerateExec.scala  | 33 +++-
 .../spark/sql/GeneratorFunctionSuite.scala  | 12 +++
 4 files changed, 26 insertions(+), 24 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6b44c4d6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
index dd768d1..f2b9764 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -441,8 +441,7 @@ object ColumnPruning extends Rule[LogicalPlan] {
   g.copy(child = prunedChild(g.child, g.references))
 
 // Turn off `join` for Generate if no column from it's child is used
-case p @ Project(_, g: Generate)
-if g.join && !g.outer && p.references.subsetOf(g.generatedSet) =>
+case p @ Project(_, g: Generate) if g.join && p.references.subsetOf(g.generatedSet) =>
   p.copy(child = g.copy(join = false))
 
 // Eliminate unneeded attributes from right side of a Left Existence Join.

http://git-wip-us.apache.org/repos/asf/spark/blob/6b44c4d6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
--
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
index 3ad757e..f663d7b 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
@@ -83,7 +83,7 @@ case class Project(projectList: Seq[NamedExpression], child: LogicalPlan) extend
  * @param join  when true, each output row is implicitly joined with the input tuple that produced
  *  it.
  * @param outer when true, each input row will be output at least once, even if the output of the
- *  given `generator` is empty. `outer` has no effect when `join` is false.
+ *  given `generator` is empty.
  * @param qualifier Qualifier for the attributes of generator(UDTF)
  * @param generatorOutput The output schema of the Generator.
  * @param child Children logical plan node

http://git-wip-us.apache.org/repos/asf/spark/blob/6b44c4d6/sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala
index f87d058..1812a11 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/GenerateExec.scala
@@ -32,7 +32,7 @@ import org.apache.spark.sql.types.{ArrayType, DataType, MapType, StructType}
 private[execution] sealed case class LazyIterator(func: () => TraversableOnce[InternalRow])
   extends Iterator[InternalRow] {
 
-  lazy val results =