[GitHub] [spark] viirya opened a new pull request #25204: [SPARK-28441][SQL][Python] Fix error when PythonUDF is used in correlated scalar subquery

GitBox Fri, 19 Jul 2019 07:36:56 -0700

viirya opened a new pull request #25204: [SPARK-28441][SQL][Python] Fix error 
when PythonUDF is used in correlated scalar subquery
URL: https://github.com/apache/spark/pull/25204
 
 
   ## What changes were proposed in this pull request?
   
   In SPARK-15370, we checked the expression at the root of the correlated 
subquery in order to fix the count bug. If a `PythonUDF` is in the checking path, 
evaluating it fails because a `PythonUDF` cannot be statically evaluated. 
The Python UDF test added in SPARK-28277 exposes this issue.
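
For readers unfamiliar with the count bug, here is a minimal sketch of it outside Spark, using Python's standard `sqlite3` module. The tables, rows, and the `COALESCE` repair are assumptions made for illustration; Spark's own rewrite rule is more involved than this.

```python
import sqlite3

# Hypothetical toy tables; only the names t1/t2 and columns k/v echo the PR's query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (k TEXT, v INTEGER);
    CREATE TABLE t2 (k TEXT, v INTEGER);
    INSERT INTO t1 VALUES ('one', 1), ('two', 0);
    INSERT INTO t2 VALUES ('one', 10);
""")

# Reference semantics: a correlated scalar subquery.
# count(*) over zero matching rows is 0, so 'two' gets 0.
correlated = conn.execute("""
    SELECT t1.k, (SELECT count(*) FROM t2 WHERE t2.k = t1.k) AS c
    FROM t1 ORDER BY t1.k
""").fetchall()

# Naive decorrelation: aggregate first, then left outer join.
# 'two' has no matching group, so the join pads c with NULL instead of 0.
naive = conn.execute("""
    SELECT t1.k, sub.c
    FROM t1
    LEFT OUTER JOIN (SELECT k, count(*) AS c FROM t2 GROUP BY k) sub
      ON sub.k = t1.k
    ORDER BY t1.k
""").fetchall()

# Repair: replace the join's NULL with the value the aggregate
# would produce on empty input (0 for count).
fixed = conn.execute("""
    SELECT t1.k, COALESCE(sub.c, 0)
    FROM t1
    LEFT OUTER JOIN (SELECT k, count(*) AS c FROM t2 GROUP BY k) sub
      ON sub.k = t1.k
    ORDER BY t1.k
""").fetchall()

print(correlated, naive, fixed)
```

The naive join disagrees with the correlated form only on the unmatched row, which is exactly the row whose NULL has to be intercepted.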
   
   If we can statically evaluate the expression, we intercept NULL values 
coming from the outer join and replace them with the value that the subquery's 
expression would return on empty input, as before. If we cannot, we replace 
them with the `PythonUDF` expression, with its parameters statically evaluated.
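
The fallback described above can be sketched with a toy expression model. This is not Spark's code: `Literal`, `AggResult`, `ToyPythonUDF`, `eval_on_empty`, and `null_replacement` are hypothetical names invented for this illustration, and values are limited to optional integers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


class NotStaticallyEvaluable(Exception):
    """Raised when an expression cannot be evaluated at planning time."""


@dataclass(frozen=True)
class Literal:
    value: Optional[int]


@dataclass(frozen=True)
class AggResult:
    """Stand-in for an aggregate output; on_empty is its value over zero rows."""
    name: str
    on_empty: Optional[int]


@dataclass(frozen=True)
class ToyPythonUDF:
    """Stand-in for a Python UDF call; it can only run at execution time."""
    name: str
    children: Tuple["Expr", ...]


Expr = Union[Literal, AggResult, ToyPythonUDF]


def eval_on_empty(expr: Expr) -> Optional[int]:
    """Statically evaluate expr as if the subquery saw no rows."""
    if isinstance(expr, Literal):
        return expr.value
    if isinstance(expr, AggResult):
        return expr.on_empty
    # A UDF has no value until it runs, so static evaluation gives up here.
    raise NotStaticallyEvaluable(expr.name)


def null_replacement(expr: Expr) -> Expr:
    """Pick the expression that replaces NULLs coming from the outer join."""
    try:
        # Old path: the whole root expression folds to a constant.
        return Literal(eval_on_empty(expr))
    except NotStaticallyEvaluable:
        # New path: keep the UDF call, but fold each of its parameters.
        assert isinstance(expr, ToyPythonUDF)
        folded = tuple(null_replacement(child) for child in expr.children)
        return ToyPythonUDF(expr.name, folded)


# count(...) on empty input folds to the literal 0.
count_case = null_replacement(AggResult("count", 0))
# udf(max(...)) keeps the UDF and folds max(...) to a NULL literal.
udf_case = null_replacement(ToyPythonUDF("udf", (AggResult("max", None),)))
print(count_case, udf_case)
```

In this sketch the UDF itself is deferred to execution time, while everything beneath it that can be decided at planning time is folded to a literal.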
   
   After this change, the last query in `udf-except.sql`, which previously threw 
`java.lang.UnsupportedOperationException`, can be run:
   
   ```
   SELECT t1.k
   FROM   t1
   WHERE  t1.v <= (SELECT   udf(max(udf(t2.v)))
                   FROM     t2
                   WHERE    udf(t2.k) = udf(t1.k))
   MINUS
   SELECT t1.k
   FROM   t1
   WHERE  udf(t1.v) >= (SELECT   min(udf(t2.v))
                   FROM     t2
                   WHERE    t2.k = t1.k)
   -- !query 2 schema
   struct<k:string>
   -- !query 2 output
   two
   ```
   
   ## How was this patch tested?
   
   Added tests.


