spark git commit: [SPARK-16291][SQL] CheckAnalysis should capture nested aggregate functions that reference no input attributes

2016-06-29 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/branch-2.0 904122335 -> 1b4d63f6f


[SPARK-16291][SQL] CheckAnalysis should capture nested aggregate functions that 
reference no input attributes

## What changes were proposed in this pull request?

`MAX(COUNT(*))` is invalid since aggregate expression can't be nested within 
another aggregate expression. This case should be captured at analysis phase, 
but somehow sneaks off to runtime.

The reason is that when checking aggregate expressions in `CheckAnalysis`, a 
checking branch treats all expressions that reference no input attributes as 
valid ones. However, `MAX(COUNT(*))` is translated into `MAX(COUNT(1))` at 
analysis phase and also references no input attribute.

This PR fixes this issue by removing the aforementioned branch.

## How was this patch tested?

New test case added in `AnalysisErrorSuite`.

Author: Cheng Lian 

Closes #13968 from liancheng/spark-16291-nested-agg-functions.

(cherry picked from commit d1e8108854deba3de8e2d87eb4389d11fb17ee57)
Signed-off-by: Wenchen Fan 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1b4d63f6
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1b4d63f6
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1b4d63f6

Branch: refs/heads/branch-2.0
Commit: 1b4d63f6f1e9f5aa819a149e1f5e45bba7d865bb
Parents: 9041223
Author: Cheng Lian 
Authored: Wed Jun 29 19:08:36 2016 +0800
Committer: Wenchen Fan 
Committed: Wed Jun 29 19:09:00 2016 +0800

--
 .../spark/sql/catalyst/analysis/CheckAnalysis.scala |  1 -
 .../sql/catalyst/analysis/AnalysisErrorSuite.scala  | 12 +++-
 .../test/scala/org/apache/spark/sql/SQLQuerySuite.scala |  4 +---
 3 files changed, 12 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/1b4d63f6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
index ac9693e..7b30fcc 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
@@ -206,7 +206,6 @@ trait CheckAnalysis extends PredicateHelper {
 "Add to group by or wrap in first() (or first_value) if 
you don't care " +
 "which value you get.")
   case e if groupingExprs.exists(_.semanticEquals(e)) => // OK
-  case e if e.references.isEmpty => // OK
   case e => e.children.foreach(checkValidAggregateExpression)
 }
 

http://git-wip-us.apache.org/repos/asf/spark/blob/1b4d63f6/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
--
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
index a41383f..a9cde1e 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
@@ -23,7 +23,7 @@ import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.catalyst.dsl.plans._
 import org.apache.spark.sql.catalyst.expressions._
-import 
org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, 
Complete, Count}
+import 
org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, 
Complete, Count, Max}
 import org.apache.spark.sql.catalyst.plans.{Inner, LeftOuter, RightOuter}
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, 
GenericArrayData, MapData}
@@ -163,6 +163,16 @@ class AnalysisErrorSuite extends AnalysisTest {
 "Distinct window functions are not supported" :: Nil)
 
   errorTest(
+"nested aggregate functions",
+testRelation.groupBy('a)(
+  AggregateExpression(
+Max(AggregateExpression(Count(Literal(1)), Complete, isDistinct = 
false)),
+Complete,
+isDistinct = false)),
+"not allowed to use an aggregate function in the argument of another 
aggregate function." :: Nil
+  )
+
+  errorTest(
 "offset window function",
 testRelation2.select(
   WindowExpression(


spark git commit: [SPARK-16291][SQL] CheckAnalysis should capture nested aggregate functions that reference no input attributes

2016-06-29 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/master 757dc2c09 -> d1e810885


[SPARK-16291][SQL] CheckAnalysis should capture nested aggregate functions that 
reference no input attributes

## What changes were proposed in this pull request?

`MAX(COUNT(*))` is invalid since aggregate expression can't be nested within 
another aggregate expression. This case should be captured at analysis phase, 
but somehow sneaks off to runtime.

The reason is that when checking aggregate expressions in `CheckAnalysis`, a 
checking branch treats all expressions that reference no input attributes as 
valid ones. However, `MAX(COUNT(*))` is translated into `MAX(COUNT(1))` at 
analysis phase and also references no input attribute.

This PR fixes this issue by removing the aforementioned branch.

## How was this patch tested?

New test case added in `AnalysisErrorSuite`.

Author: Cheng Lian 

Closes #13968 from liancheng/spark-16291-nested-agg-functions.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d1e81088
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d1e81088
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d1e81088

Branch: refs/heads/master
Commit: d1e8108854deba3de8e2d87eb4389d11fb17ee57
Parents: 757dc2c
Author: Cheng Lian 
Authored: Wed Jun 29 19:08:36 2016 +0800
Committer: Wenchen Fan 
Committed: Wed Jun 29 19:08:36 2016 +0800

--
 .../spark/sql/catalyst/analysis/CheckAnalysis.scala |  1 -
 .../sql/catalyst/analysis/AnalysisErrorSuite.scala  | 12 +++-
 .../test/scala/org/apache/spark/sql/SQLQuerySuite.scala |  4 +---
 3 files changed, 12 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d1e81088/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
index ac9693e..7b30fcc 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala
@@ -206,7 +206,6 @@ trait CheckAnalysis extends PredicateHelper {
 "Add to group by or wrap in first() (or first_value) if 
you don't care " +
 "which value you get.")
   case e if groupingExprs.exists(_.semanticEquals(e)) => // OK
-  case e if e.references.isEmpty => // OK
   case e => e.children.foreach(checkValidAggregateExpression)
 }
 

http://git-wip-us.apache.org/repos/asf/spark/blob/d1e81088/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
--
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
index a41383f..a9cde1e 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisErrorSuite.scala
@@ -23,7 +23,7 @@ import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.catalyst.dsl.expressions._
 import org.apache.spark.sql.catalyst.dsl.plans._
 import org.apache.spark.sql.catalyst.expressions._
-import 
org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, 
Complete, Count}
+import 
org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateExpression, 
Complete, Count, Max}
 import org.apache.spark.sql.catalyst.plans.{Inner, LeftOuter, RightOuter}
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, 
GenericArrayData, MapData}
@@ -163,6 +163,16 @@ class AnalysisErrorSuite extends AnalysisTest {
 "Distinct window functions are not supported" :: Nil)
 
   errorTest(
+"nested aggregate functions",
+testRelation.groupBy('a)(
+  AggregateExpression(
+Max(AggregateExpression(Count(Literal(1)), Complete, isDistinct = 
false)),
+Complete,
+isDistinct = false)),
+"not allowed to use an aggregate function in the argument of another 
aggregate function." :: Nil
+  )
+
+  errorTest(
 "offset window function",
 testRelation2.select(
   WindowExpression(

http://git-wip-us.apache.org/repos/asf/spark/blob/d1e81088/sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala