Ohad Raviv created SPARK-37752:
----------------------------------

             Summary: Python UDF fails when it should not get evaluated
                 Key: SPARK-37752
                 URL: https://issues.apache.org/jira/browse/SPARK-37752
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.4
            Reporter: Ohad Raviv
Haven't checked on newer versions yet. If I define in Python:

{code:python}
def udf1(col1):
    print(col1[2])
    return "blah"

spark.udf.register("udf1", udf1)
{code}

and then use it in SQL:

{code:sql}
select case when length(c)>2 then udf1(c) end from (
    select explode(array("123","234","12")) as c
)
{code}

it fails with:

{noformat}
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 253, in main
    process()
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 248, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 155, in <lambda>
    func = lambda _, it: map(mapper, it)
  File "<string>", line 1, in <lambda>
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 76, in <lambda>
    return lambda *a: f(*a)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/util.py", line 55, in wrapper
    return f(*args, **kwargs)
  File "<stdin>", line 3, in udf1
IndexError: string index out of range
{noformat}

On the out-of-range row ("12") the UDF should not get evaluated at all, since the CASE WHEN only passes through strings longer than 2 characters. The same scenario works fine when we define a Scala UDF instead. Will check now whether it also happens on newer versions.
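For reference, a possible workaround until this is resolved: make the UDF itself defensive. This is a minimal sketch ({{udf1_safe}} is a hypothetical name, not part of the report), and it assumes the behavior described above, namely that Spark may evaluate the Python UDF on rows the surrounding CASE WHEN would filter out:

{code:python}
# Assumes an active SparkSession bound to `spark`, as in the report above.
def udf1_safe(col1):
    # Guard against rows the CASE WHEN was supposed to filter out:
    # the UDF may still be invoked on them, so check the length here.
    if col1 is None or len(col1) <= 2:
        return None  # the CASE WHEN discards these rows anyway
    print(col1[2])
    return "blah"

spark.udf.register("udf1_safe", udf1_safe)

spark.sql("""
    select case when length(c)>2 then udf1_safe(c) end
    from (select explode(array("123","234","12")) as c)
""").show()
{code}

Guarding inside the UDF keeps the query correct regardless of where the optimizer places the Python evaluation relative to the CASE WHEN.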