skambha commented on a change in pull request #24593: [SPARK-27692][SQL] Add new optimizer rule to evaluate the deterministic scala udf only once if all inputs are literals
URL: https://github.com/apache/spark/pull/24593#discussion_r283573248
 
 

 ##########
 File path: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala
 ##########
 @@ -892,7 +893,7 @@ class AvroSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
       assert(msg.contains("Cannot save interval data type into external storage."))
 
       msg = intercept[AnalysisException] {
-        spark.udf.register("testType", () => new IntervalData())
+        spark.udf.register("testType", udf(() => new IntervalData()).asNondeterministic())
 Review comment:
   Thanks for the question. This test was changed for the following reason.
   
   ```
   msg = intercept[AnalysisException] {
     spark.udf.register("testType", () => new IntervalData())
     sql("select testType()").write.format("avro").mode("overwrite").save(tempDir)
   }.getMessage
   assert(msg.toLowerCase(Locale.ROOT)
     .contains(s"avro data source does not support calendarinterval data type."))
   ```
   
   
   This is the **original** test case. It tests an error codepath in the datasource, and it triggers that codepath by calling a UDF that returns an IntervalData. Note that IntervalData and its corresponding UDT do not implement the serialize or deserialize methods.
   
   With the new optimizer rule in this PR, a UDF is evaluated once during the optimization phase if it is deterministic and all of its inputs are literals. Here both conditions hold, so the optimizer tries to evaluate the UDF; because serialize is not implemented for this UDT, that evaluation fails, and we get a different error than the one this test case was meant to exercise.
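   To make the mechanics concrete, below is a minimal, self-contained sketch of the kind of rule described here. This is not Spark's actual Catalyst API: the `Expr`, `Literal`, and `UdfCall` types and the `foldLiteralUdf` helper are hypothetical stand-ins used purely to illustrate the fold.
   
   ```scala
   // Hypothetical, simplified expression model -- not Spark's Catalyst classes.
   sealed trait Expr
   case class Literal(value: Any) extends Expr
   case class UdfCall(
       f: Seq[Any] => Any,        // the user-defined function
       children: Seq[Expr],       // its input expressions
       deterministic: Boolean) extends Expr
   
   // The rule in miniature: a deterministic UDF whose inputs are all literals
   // is evaluated once, at optimization time, and replaced by its result.
   def foldLiteralUdf(e: Expr): Expr = e match {
     case UdfCall(f, children, true) if children.forall(_.isInstanceOf[Literal]) =>
       Literal(f(children.collect { case Literal(v) => v }))
     case other => other  // nondeterministic, or non-literal inputs: leave as-is
   }
   ```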
   
   In order to keep the test case exercising the **original error codepath**, I have changed the UDF to be nondeterministic. That is done by this line:
   
   `spark.udf.register("testType", udf(() => new IntervalData()).asNondeterministic())`
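   With the sketch above, the effect of this change is easy to see: marking the UDF nondeterministic means the fold no longer applies, so the call survives optimization and the datasource's own error path fires at write time, as before. (The values below are illustrative only.)
   
   ```scala
   val call = UdfCall(_ => "interval", Seq.empty, deterministic = true)
   
   foldLiteralUdf(call)                              // folded to Literal("interval")
   foldLiteralUdf(call.copy(deterministic = false))  // left intact; evaluated at runtime
   ```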
   
   Hope this helps.  
