skambha commented on a change in pull request #24593: [SPARK-27692][SQL] Add
new optimizer rule to evaluate the deterministic scala udf only once if all
inputs are literals
URL: https://github.com/apache/spark/pull/24593#discussion_r283573248
##########
File path:
external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
##########
@@ -892,7 +893,7 @@ class AvroSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
 assert(msg.contains("Cannot save interval data type into external storage."))
 msg = intercept[AnalysisException] {
- spark.udf.register("testType", () => new IntervalData())
+ spark.udf.register("testType", udf(() => new IntervalData()).asNondeterministic())
Review comment:
Thanks for the question. The test was changed for the following reason.
```
msg = intercept[AnalysisException] {
  spark.udf.register("testType", () => new IntervalData())
  sql("select testType()").write.format("avro").mode("overwrite").save(tempDir)
}.getMessage
assert(msg.toLowerCase(Locale.ROOT)
  .contains(s"avro data source does not support calendarinterval data type."))
```
This is the **original** test case. It exercises an error code path in the
data source, which it triggers by calling a UDF that returns `IntervalData`.
However, `IntervalData` and its corresponding UDT do not implement the
serialize or deserialize methods.
With the new optimization rule in this PR, a UDF is evaluated during the
optimization phase if it is deterministic and all of its inputs are literals.
Both conditions hold here (the UDF takes no arguments), so the optimizer tries
to evaluate it. Because the serialize methods are not implemented for this UDT,
that evaluation fails, and the test gets a different error from the one it was
written to check.
To keep the test exercising the **original error code path**, I have made the
UDF non-deterministic, so the new rule does not evaluate it during
optimization. This is done by this line:
`spark.udf.register("testType", udf(() => new IntervalData()).asNondeterministic())`
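
To make the behaviour described above concrete, here is a minimal toy sketch in plain Scala. It is not the rule from this PR and does not use Spark's Catalyst classes: `Expr`, `Literal`, `Udf`, and `FoldLiteralUdf` are hypothetical names invented for illustration. It only models the two conditions (deterministic, all inputs literal) and shows why flipping the deterministic flag defers the failure.

```scala
// Toy model only: these types are hypothetical and are NOT Spark's Catalyst classes.
sealed trait Expr
final case class Literal(value: Any) extends Expr
final case class Udf(
    name: String,
    deterministic: Boolean,
    children: Seq[Expr],
    body: Seq[Any] => Any) extends Expr

object FoldLiteralUdf {
  // Evaluate a UDF at "optimization time" only if it is deterministic and
  // every input is already a literal; otherwise return it unchanged.
  // (Top-level expression only; a real rule would also walk the tree.)
  def apply(expr: Expr): Expr = expr match {
    case Udf(_, true, children, body) if children.forall(_.isInstanceOf[Literal]) =>
      Literal(body(children.collect { case Literal(v) => v }))
    case other => other
  }
}

object Demo {
  // Stand-in for a UDF whose evaluation fails, as with IntervalData in the test.
  private val failing: Seq[Any] => Any =
    _ => throw new UnsupportedOperationException("serialize not implemented")

  def main(args: Array[String]): Unit = {
    val deterministicUdf = Udf("testType", deterministic = true, Nil, failing)
    val nondeterministicUdf = deterministicUdf.copy(deterministic = false)

    // Deterministic with no inputs: the rule evaluates it, so the failure
    // surfaces during "optimization".
    println(scala.util.Try(FoldLiteralUdf(deterministicUdf)).isFailure)

    // Non-deterministic: the rule leaves it alone, so evaluation is deferred.
    println(FoldLiteralUdf(nondeterministicUdf) == nondeterministicUdf)
  }
}
```

In this model, a UDF with zero arguments trivially satisfies the "all inputs are literals" condition, which mirrors why `testType()` was picked up by the rule in the test.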
Hope this helps.