[GitHub] [spark] dtenedor commented on a diff in pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-21 Thread via GitHub


dtenedor commented on code in PR #40496:
URL: https://github.com/apache/spark/pull/40496#discussion_r1144040124


##
sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala:
##
@@ -59,10 +61,39 @@ trait SQLQueryTestHelper extends Logging {
   */
  protected def getNormalizedQueryAnalysisResult(
      session: SparkSession, sql: String): (String, Seq[String]) = {
+    // Note that creating the following DataFrame includes eager execution for commands that
+    // create objects such as views. Therefore any following queries that reference these
+    // objects should find them in the catalog.
     val df = session.sql(sql)
     val schema = df.schema.catalogString
-    // Get the answer, but also get rid of the #1234 expression IDs that show up in analyzer plans.
-    (schema, Seq(replaceNotIncludedMsg(df.queryExecution.analyzed.toString)))
+    val analyzed = df.queryExecution.analyzed
+    // Determine if the analyzed plan contains any nondeterministic expressions.
+    var deterministic = true
+    analyzed.transformAllExpressionsWithSubqueries {
+      case expr: CurrentDate =>
+        deterministic = false
+        expr
+      case expr: CurrentTimestampLike =>
+        deterministic = false
+        expr
+      case expr: CurrentUser =>
+        deterministic = false
+        expr
+      case expr: Literal if expr.dataType == DateType || expr.dataType == TimestampType =>
+        deterministic = false
+        expr
+      case expr if !expr.deterministic =>
+        deterministic = false
+        expr
+    }
+    if (deterministic) {
+      // Perform query analysis, but also get rid of the #1234 expression IDs that show up in the
+      // resolved plans.
+      (schema, Seq(replaceNotIncludedMsg(analyzed.toString)))
+    } else {
+      // The analyzed plan is nondeterministic so elide it from the result to keep tests reliable.
+      (schema, Seq("[Analyzer test output redacted due to nondeterminism]"))
+    }

Review Comment:
   Oh yeah, it's only used in the SQL golden test runner. It's in `SQLQueryTestHelper.scala` :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dtenedor commented on a diff in pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-21 Thread via GitHub


dtenedor commented on code in PR #40496:
URL: https://github.com/apache/spark/pull/40496#discussion_r1144023834


##
sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestHelper.scala:
##

Review Comment:
   Hi @amaliujia, I am a bit confused by your question; would you mind clarifying? This redaction applies only to the generated analyzer test result files, in order to prevent the tests from becoming flaky due to things like `current_date()` changing each day.
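   To make the intent concrete, here is a minimal stand-alone sketch of the same redaction rule, using hypothetical toy case classes (`CurrentDate`, `Literal`, `Add`) and a `GoldenFileNormalizer` object invented for illustration — not Spark's real `Expression` API:

```scala
// Toy expression tree mirroring the detection logic in getNormalizedQueryAnalysisResult.
sealed trait Expr { def children: Seq[Expr] }
case class CurrentDate() extends Expr { val children: Seq[Expr] = Nil }
case class Literal(value: Any) extends Expr { val children: Seq[Expr] = Nil }
case class Add(left: Expr, right: Expr) extends Expr {
  val children: Seq[Expr] = Seq(left, right)
}

object GoldenFileNormalizer {
  // True if any node in the tree is a kind we treat as nondeterministic.
  def nondeterministic(e: Expr): Boolean = e match {
    case CurrentDate() => true
    case _             => e.children.exists(nondeterministic)
  }

  // Emit the plan string only when it is deterministic; otherwise redact it,
  // as the PR does for generated analyzer test result files.
  def normalize(plan: Expr): String =
    if (nondeterministic(plan)) "[Analyzer test output redacted due to nondeterminism]"
    else plan.toString
}
```

   With this sketch, `GoldenFileNormalizer.normalize(Add(Literal(1), CurrentDate()))` yields the redaction marker, while a plan built only from literals is printed as-is — which is exactly why reruns of the golden tests stay stable across days.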





[GitHub] [spark] dtenedor commented on a diff in pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-21 Thread via GitHub


dtenedor commented on code in PR #40496:
URL: https://github.com/apache/spark/pull/40496#discussion_r1144021973


##
sql/core/src/test/resources/sql-tests/analyzer-results/ansi/cast.sql.out:
##
@@ -0,0 +1,881 @@
+-- Automatically generated by SQLQueryTestSuite
+-- !query
+SELECT CAST('1.23' AS int)
+-- !query analysis
+Project [cast(1.23 as int) AS CAST(1.23 AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('1.23' AS long)
+-- !query analysis
+Project [cast(1.23 as bigint) AS CAST(1.23 AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('-4.56' AS int)
+-- !query analysis
+Project [cast(-4.56 as int) AS CAST(-4.56 AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('-4.56' AS long)
+-- !query analysis
+Project [cast(-4.56 as bigint) AS CAST(-4.56 AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('abc' AS int)
+-- !query analysis
+Project [cast(abc as int) AS CAST(abc AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('abc' AS long)
+-- !query analysis
+Project [cast(abc as bigint) AS CAST(abc AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('abc' AS float)
+-- !query analysis
+Project [cast(abc as float) AS CAST(abc AS FLOAT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('abc' AS double)
+-- !query analysis
+Project [cast(abc as double) AS CAST(abc AS DOUBLE)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('1234567890123' AS int)
+-- !query analysis
+Project [cast(1234567890123 as int) AS CAST(1234567890123 AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('12345678901234567890123' AS long)
+-- !query analysis
+Project [cast(12345678901234567890123 as bigint) AS CAST(12345678901234567890123 AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('' AS int)
+-- !query analysis
+Project [cast( as int) AS CAST( AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('' AS long)
+-- !query analysis
+Project [cast( as bigint) AS CAST( AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('' AS float)
+-- !query analysis
+Project [cast( as float) AS CAST( AS FLOAT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('' AS double)
+-- !query analysis
+Project [cast( as double) AS CAST( AS DOUBLE)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST(NULL AS int)
+-- !query analysis
+Project [cast(null as int) AS CAST(NULL AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST(NULL AS long)
+-- !query analysis
+Project [cast(null as bigint) AS CAST(NULL AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('123.a' AS int)
+-- !query analysis
+Project [cast(123.a as int) AS CAST(123.a AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('123.a' AS long)
+-- !query analysis
+Project [cast(123.a as bigint) AS CAST(123.a AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('123.a' AS float)
+-- !query analysis
+Project [cast(123.a as float) AS CAST(123.a AS FLOAT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('123.a' AS double)
+-- !query analysis
+Project [cast(123.a as double) AS CAST(123.a AS DOUBLE)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('-2147483648' AS int)
+-- !query analysis
+Project [cast(-2147483648 as int) AS CAST(-2147483648 AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('-2147483649' AS int)
+-- !query analysis
+Project [cast(-2147483649 as int) AS CAST(-2147483649 AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('2147483647' AS int)
+-- !query analysis
+Project [cast(2147483647 as int) AS CAST(2147483647 AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('2147483648' AS int)
+-- !query analysis
+Project [cast(2147483648 as int) AS CAST(2147483648 AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('-9223372036854775808' AS long)
+-- !query analysis
+Project [cast(-9223372036854775808 as bigint) AS CAST(-9223372036854775808 AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('-9223372036854775809' AS long)
+-- !query analysis
+Project [cast(-9223372036854775809 as bigint) AS CAST(-9223372036854775809 AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('9223372036854775807' AS long)
+-- !query analysis
+Project [cast(9223372036854775807 as bigint) AS CAST(9223372036854775807 AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('9223372036854775808' AS long)
+-- !query analysis
+Project [cast(9223372036854775808 as bigint) AS CAST(9223372036854775808 AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT HEX(CAST('abc' AS binary))
+-- !query analysis
+Project [hex(cast(abc as binary)) AS hex(CAST(abc AS BINARY))#x]
++- OneRowRelation
+
+
+-- !query
+SELECT HEX(CAST(CAST(123 AS byte) AS binary))
+-- !query analysis
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "DATATYPE_MISMATCH.CAST_WITH_CONF_SUGGESTION",
+  "sqlState" : "42K09",
+  "messageParameters" : {
+"config" : "\"spark.sql.ansi.enabled\"",
+"configVal" : "'false'",
+"sqlExpr" : "\"CAST(CAST(123 AS TINYINT) AS 

[GitHub] [spark] dtenedor commented on a diff in pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-20 Thread via GitHub


dtenedor commented on code in PR #40496:
URL: https://github.com/apache/spark/pull/40496#discussion_r1142891754


##
sql/core/src/test/resources/sql-tests/analyzer-results/ansi/cast.sql.out:
##
@@ -0,0 +1,881 @@
+-- Automatically generated by SQLQueryTestSuite
+-- !query
+SELECT CAST('1.23' AS int)
+-- !query analysis
+Project [cast(1.23 as int) AS CAST(1.23 AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('1.23' AS long)
+-- !query analysis
+Project [cast(1.23 as bigint) AS CAST(1.23 AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('-4.56' AS int)
+-- !query analysis
+Project [cast(-4.56 as int) AS CAST(-4.56 AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('-4.56' AS long)
+-- !query analysis
+Project [cast(-4.56 as bigint) AS CAST(-4.56 AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('abc' AS int)
+-- !query analysis
+Project [cast(abc as int) AS CAST(abc AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('abc' AS long)
+-- !query analysis
+Project [cast(abc as bigint) AS CAST(abc AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('abc' AS float)
+-- !query analysis
+Project [cast(abc as float) AS CAST(abc AS FLOAT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('abc' AS double)
+-- !query analysis
+Project [cast(abc as double) AS CAST(abc AS DOUBLE)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('1234567890123' AS int)
+-- !query analysis
+Project [cast(1234567890123 as int) AS CAST(1234567890123 AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('12345678901234567890123' AS long)
+-- !query analysis
+Project [cast(12345678901234567890123 as bigint) AS 
CAST(12345678901234567890123 AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('' AS int)
+-- !query analysis
+Project [cast( as int) AS CAST( AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('' AS long)
+-- !query analysis
+Project [cast( as bigint) AS CAST( AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('' AS float)
+-- !query analysis
+Project [cast( as float) AS CAST( AS FLOAT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('' AS double)
+-- !query analysis
+Project [cast( as double) AS CAST( AS DOUBLE)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST(NULL AS int)
+-- !query analysis
+Project [cast(null as int) AS CAST(NULL AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST(NULL AS long)
+-- !query analysis
+Project [cast(null as bigint) AS CAST(NULL AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('123.a' AS int)
+-- !query analysis
+Project [cast(123.a as int) AS CAST(123.a AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('123.a' AS long)
+-- !query analysis
+Project [cast(123.a as bigint) AS CAST(123.a AS BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('123.a' AS float)
+-- !query analysis
+Project [cast(123.a as float) AS CAST(123.a AS FLOAT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('123.a' AS double)
+-- !query analysis
+Project [cast(123.a as double) AS CAST(123.a AS DOUBLE)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('-2147483648' AS int)
+-- !query analysis
+Project [cast(-2147483648 as int) AS CAST(-2147483648 AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('-2147483649' AS int)
+-- !query analysis
+Project [cast(-2147483649 as int) AS CAST(-2147483649 AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('2147483647' AS int)
+-- !query analysis
+Project [cast(2147483647 as int) AS CAST(2147483647 AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('2147483648' AS int)
+-- !query analysis
+Project [cast(2147483648 as int) AS CAST(2147483648 AS INT)#x]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('-9223372036854775808' AS long)
+-- !query analysis
+Project [cast(-9223372036854775808 as bigint) AS CAST(-9223372036854775808 AS 
BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('-9223372036854775809' AS long)
+-- !query analysis
+Project [cast(-9223372036854775809 as bigint) AS CAST(-9223372036854775809 AS 
BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('9223372036854775807' AS long)
+-- !query analysis
+Project [cast(9223372036854775807 as bigint) AS CAST(9223372036854775807 AS 
BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT CAST('9223372036854775808' AS long)
+-- !query analysis
+Project [cast(9223372036854775808 as bigint) AS CAST(9223372036854775808 AS 
BIGINT)#xL]
++- OneRowRelation
+
+
+-- !query
+SELECT HEX(CAST('abc' AS binary))
+-- !query analysis
+Project [hex(cast(abc as binary)) AS hex(CAST(abc AS BINARY))#x]
++- OneRowRelation
+
+
+-- !query
+SELECT HEX(CAST(CAST(123 AS byte) AS binary))
+-- !query analysis
+org.apache.spark.sql.AnalysisException
+{
+  "errorClass" : "DATATYPE_MISMATCH.CAST_WITH_CONF_SUGGESTION",
+  "sqlState" : "42K09",
+  "messageParameters" : {
+"config" : "\"spark.sql.ansi.enabled\"",
+"configVal" : "'false'",
+"sqlExpr" : "\"CAST(CAST(123 AS TINYINT) AS