HyukjinKwon opened a new pull request, #43778:
URL: https://github.com/apache/spark/pull/43778
### What changes were proposed in this pull request?
This PR improves the Python UDF error messages to be more actionable.
### Why are the changes needed?
Suppose you face a segfault error:
```python
from pyspark.sql.functions import udf
import ctypes
spark.range(1).select(udf(lambda x: ctypes.string_at(0))("id")).collect()
```
The current error message is not actionable:
```
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
...
get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling
o82.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 15
in stage 1.0 failed 1 times, most recent failure: Lost task 15.0 in stage 1.0
(TID 31) (192.168.123.102 executor driver): org.apache.spark.SparkException:
Python worker exited unexpectedly (crashed)
```
After this PR, the error message becomes:
```
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
...
get_return_value
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling
o59.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 15
in stage 0.0 failed 1 times, most recent failure: Lost task 15.0 in stage 0.0
(TID 15) (192.168.123.102 executor driver): org.apache.spark.SparkException:
Python worker exited unexpectedly (crashed). Consider setting
'spark.sql.execution.pyspark.udf.faulthandler.enabled'
or 'spark.python.worker.faulthandler.enabled' configuration to 'true' for the
better Python traceback.
```
So you can enable the configuration and try it out:
```python
from pyspark.sql.functions import udf
import ctypes
spark.conf.set("spark.sql.execution.pyspark.udf.faulthandler.enabled",
"true")
spark.range(1).select(udf(lambda x: ctypes.string_at(0))("id")).collect()
```
which now shows where the segfault happens:
```
Caused by: org.apache.spark.SparkException: Python worker exited
unexpectedly (crashed): Fatal Python error: Segmentation fault
Current thread 0x00007ff84ae4b700 (most recent call first):
File "/.../envs/python3.9/lib/python3.9/ctypes/__init__.py", line 525 in
string_at
File "<stdin>", line 1 in <lambda>
File "/.../lib/pyspark.zip/pyspark/util.py", line 88 in wrapper
File "/.../lib/pyspark.zip/pyspark/worker.py", line 99 in <lambda>
File "/.../lib/pyspark.zip/pyspark/worker.py", line 1403 in <genexpr>
File "/.../lib/pyspark.zip/pyspark/worker.py", line 1403 in mapper
```
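For context, the improved traceback above presumably builds on Python's standard `faulthandler` module (as the configuration names suggest). A minimal standalone sketch, independent of Spark, showing the same effect: a child process enables `faulthandler`, triggers a segfault, and the fatal-error traceback appears on its stderr.

```python
import subprocess
import sys

# Child script: enable faulthandler, then dereference NULL to segfault.
# This mirrors the udf(lambda x: ctypes.string_at(0)) example above.
child_code = """
import ctypes
import faulthandler

faulthandler.enable()   # install handlers for SIGSEGV, SIGFPE, etc.
ctypes.string_at(0)     # NULL dereference -> segmentation fault
"""

proc = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
)

# With faulthandler enabled, stderr carries a "Fatal Python error"
# traceback pointing at the crashing frame instead of a silent exit.
print(proc.stderr)
```

Without the `faulthandler.enable()` call, the child simply dies with a signal and no Python-level traceback, which matches the unactionable "Python worker exited unexpectedly (crashed)" behavior before this PR.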
### Does this PR introduce _any_ user-facing change?
Yes, it makes the error message more actionable.
### How was this patch tested?
Manually tested as above.
### Was this patch authored or co-authored using generative AI tooling?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]