dilipbiswal opened a new pull request #24209: [SPARK-27255] Aggregate functions 
should not be allowed in WHERE
URL: https://github.com/apache/spark/pull/24209
 
 
   ## What changes were proposed in this pull request?
   In the PR, we raise an AnalysisError when we detect the presense of 
aggregate expressions in where clause. Here is the problem description from the 
JIRA.
   
   Aggregate functions should not be allowed in WHERE clause. But Spark SQL 
throws an exception when generating codes. It is supposed to throw an exception 
during parsing or analyzing.
   
   Here is an example:
   ```
   val df = spark.sql("select * from t where sum(ta) > 0")
   df.explain(true)
   df.show()
   ```
   Resulting exception: 
   ```
   Exception in thread "main" java.lang.UnsupportedOperationException: Cannot 
generate code for expression: sum(cast(input[0, int, false] as bigint))
        at 
org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode(Expression.scala:291)
        at 
org.apache.spark.sql.catalyst.expressions.Unevaluable.doGenCode$(Expression.scala:290)
        at 
org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression.doGenCode(interfaces.scala:87)
        at 
org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:138)
        at scala.Option.getOrElse(Option.scala:138)
   ```
   Checked the behaviour of other database and all of them return an exception:
   **Postgress**
   ```
   select * from foo where max(c1) > 0;
   Error
   ERROR: aggregate functions are not allowed in WHERE Position: 25
   ```
   **DB2**
   ```
   db2 => select * from foo where max(c1) > 0;
   SQL0120N  Invalid use of an aggregate function or OLAP function.  
   ```
   **Oracle**
   ```
   select * from foo where max(c1) > 0;
   ORA-00934: group function is not allowed here
   ```
   **MySql**
   ```
   select * from foo where max(c1) > 0;
   Invalid use of group function
   ```
   
   ## How was this patch tested?
   Added tests in AnalysisErrorSuite and group-by.sql

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to