Tien-Dung LE created SPARK-13932:
------------------------------------

             Summary: CUBE Query with filter (HAVING) and condition (IF) raises an AnalysisException
                 Key: SPARK-13932
                 URL: https://issues.apache.org/jira/browse/SPARK-13932
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.6.1, 1.6.0
            Reporter: Tien-Dung LE

An aggregate query that combines a condition (IF) inside an aggregate function with GROUP BY ... GROUPING SETS and a HAVING clause raises an AnalysisException. Here is a typical error message:

{code}
org.apache.spark.sql.AnalysisException: Reference 'b' is ambiguous, could be: b#55, b#124.; line 1 pos 178
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:287)
{code}

Here is a code snippet to reproduce the error in a spark-shell session:

{code}
import sqlContext.implicits._

// Every field defaults to a random value, so Toto() yields a random row.
case class Toto(
  a: String = f"${(math.random*1e6).toLong}%06.0f",
  b: Int = (math.random*1e3).toInt,
  n: Int = (math.random*1e3).toInt,
  m: Double = (math.random*1e3))

// One million random rows, registered as the temporary table "toto".
val data = sc.parallelize(1 to 1e6.toInt).map(i => Toto())
val df: org.apache.spark.sql.DataFrame = sqlContext.createDataFrame( data )
df.registerTempTable( "toto" )

// sqlSelect1 uses plain aggregates; sqlSelect2 adds IF inside COUNT.
val sqlSelect1 = "SELECT a, b, COUNT(1) AS k1, COUNT(1) AS k2, SUM(m) AS k3, GROUPING__ID"
val sqlSelect2 = "SELECT a, b, COUNT(1) AS k1, COUNT(IF(n > 500,1,0)) AS k2, SUM(m) AS k3, GROUPING__ID"

val sqlGroupBy = "FROM toto GROUP BY a, b GROUPING SETS ((a,b),(a),(b))"
val sqlHaving = "HAVING ((GROUPING__ID & 1) == 1) AND (b > 500)"

sqlContext.sql( s"$sqlSelect1 $sqlGroupBy $sqlHaving" ) // OK
sqlContext.sql( s"$sqlSelect2 $sqlGroupBy" ) // OK
sqlContext.sql( s"$sqlSelect2 $sqlGroupBy $sqlHaving" ) // ERROR
{code}
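
A side note on the reproducer, separate from the AnalysisException itself: SQL COUNT counts non-NULL values, so COUNT(IF(n > 500,1,0)) returns the same number as COUNT(1). If a conditional count was intended, SUM(IF(n > 500,1,0)) or COUNT(IF(n > 500,1,NULL)) would express it. The sketch below models this in plain Scala without Spark. The names CountIfSketch, sqlCount and sqlSum and the sample values are hypothetical, introduced only for illustration; Option[Int] stands in for a nullable SQL INT.

```scala
// Plain Scala model of SQL COUNT/SUM NULL handling; no Spark required.
// Option[Int] stands in for a nullable SQL INT (None = NULL).
object CountIfSketch {
  // SQL COUNT(expr) counts the rows where expr is not NULL.
  def sqlCount(values: Seq[Option[Int]]): Int = values.count(_.isDefined)

  // SQL SUM(expr) adds up the non-NULL values.
  def sqlSum(values: Seq[Option[Int]]): Int = values.flatten.sum

  def main(args: Array[String]): Unit = {
    val n = Seq(100, 600, 700, 200, 501) // hypothetical sample values of column n

    // IF(n > 500, 1, 0): never NULL, so COUNT sees every row.
    val ifZero: Seq[Option[Int]] = n.map(x => Option(if (x > 500) 1 else 0))
    // IF(n > 500, 1, NULL): NULL for rows that fail the condition.
    val ifNull: Seq[Option[Int]] = n.map(x => if (x > 500) Some(1) else None)

    println(sqlCount(ifZero)) // 5: every row is counted
    println(sqlSum(ifZero))   // 3: only rows with n > 500 contribute
    println(sqlCount(ifNull)) // 3: NULLs are skipped
  }
}
```

Swapping COUNT for SUM in sqlSelect2 changes what k2 measures. Whether it also affects the ambiguous-reference error has not been checked here.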