ueshin opened a new pull request, #52689:
URL: https://github.com/apache/spark/pull/52689
### What changes were proposed in this pull request?
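Adds basic logging support for Python workers.
Messages emitted through Python's standard `logging` module, as well as output written to `stdout` and `stderr` (for example via `print`),
are collected and exposed in the `system.session.python_worker_logs` view.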
* `spark.sql.pyspark.worker.logging.enabled` (`False` by default)
  When set to `true`, this configuration enables log capture
  within the Python worker processes that execute User-Defined Functions (UDFs),
  User-Defined Table Functions (UDTFs), and other Python-based operations in
  Spark SQL.
For example:
```py
>>> from pyspark.sql.functions import *
>>> import logging
>>> import sys
>>>
>>> @udf
... def f(x):
...     logger = logging.getLogger("test")
...     logger.setLevel(logging.INFO)
...     logger.info(f"INFO level message: {x}")
...     print(f"PRINT(STDOUT): {x}")
...     print(f"PRINT(STDERR): {x} あいうえお", file=sys.stderr)
...     try:
...         1 / x
...     except:
...         logger.exception(f"1 / {x}")
...     return str(x)
...
>>> spark.conf.set("spark.sql.pyspark.worker.logging.enabled", True)
>>>
>>> spark.range(2).select(f("id")).show()
+-----+
|f(id)|
+-----+
|    0|
|    1|
+-----+
>>> spark.table("system.session.python_worker_logs").show(truncate=False)
+--------------------------+-----+---------------------------+----------------+--------------------------------------------------------------+------+
|ts                        |level|msg                        |context         |exception                                                     |logger|
+--------------------------+-----+---------------------------+----------------+--------------------------------------------------------------+------+
|2025-10-21 16:50:36.204272|INFO |INFO level message: 1      |{func_name -> f}|NULL                                                          |test  |
|2025-10-21 16:50:36.206179|INFO |PRINT(STDOUT): 1           |{func_name -> f}|NULL                                                          |stdout|
|2025-10-21 16:50:36.208806|ERROR|PRINT(STDERR): 1 あいうえお|{func_name -> f}|NULL                                                          |stderr|
|2025-10-21 16:50:36.199595|INFO |INFO level message: 0      |{func_name -> f}|NULL                                                          |test  |
|2025-10-21 16:50:36.201635|INFO |PRINT(STDOUT): 0           |{func_name -> f}|NULL                                                          |stdout|
|2025-10-21 16:50:36.204332|ERROR|PRINT(STDERR): 0 あいうえお|{func_name -> f}|NULL                                                          |stderr|
|2025-10-21 16:50:36.206082|ERROR|1 / 0                      |{func_name -> f}|{ZeroDivisionError, division by zero, [{NULL, f, <stdin>, 9}]}|test  |
+--------------------------+-----+---------------------------+----------------+--------------------------------------------------------------+------+
```
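To illustrate the idea behind the output above, here is a minimal standalone sketch in plain Python (no Spark) of how logger records and `print` output could be captured as structured rows. It is not the implementation in this PR. The names `ListHandler`, `StreamToLogger`, and `run_with_captured_logs`, and the keys inside the `exception` dict, are hypothetical. Only the `level`, `msg`, `logger`, and `exception` column names and the `stdout` to `INFO` / `stderr` to `ERROR` mapping are taken from the example output.

```python
import contextlib
import io
import logging
import sys
import traceback
from datetime import datetime


class ListHandler(logging.Handler):
    """Hypothetical handler: stores each record as a dict shaped like a view row."""

    def __init__(self, rows):
        super().__init__()
        self.rows = rows

    def emit(self, record):
        exc = None
        if record.exc_info:
            exc_type, exc_value, tb = record.exc_info
            # Key names here are assumptions, not the view's actual schema.
            exc = {
                "class": exc_type.__name__,
                "msg": str(exc_value),
                "frames": [(fr.name, fr.lineno) for fr in traceback.extract_tb(tb)],
            }
        self.rows.append({
            "ts": datetime.fromtimestamp(record.created),
            "level": record.levelname,
            "msg": record.getMessage(),
            "exception": exc,
            "logger": record.name,
        })


class StreamToLogger(io.TextIOBase):
    """Hypothetical file-like object that forwards each written line to a logger."""

    def __init__(self, logger, level):
        self.logger = logger
        self.level = level

    def write(self, text):
        # print() writes the message and the newline separately; skip empty pieces.
        for line in text.splitlines():
            if line:
                self.logger.log(self.level, line)
        return len(text)


def run_with_captured_logs(func, *args):
    """Run func(*args), returning (result, rows) with captured log rows."""
    rows = []
    handler = ListHandler(rows)
    root = logging.getLogger()
    old_level = root.level
    root.addHandler(handler)
    root.setLevel(logging.INFO)
    out = StreamToLogger(logging.getLogger("stdout"), logging.INFO)
    err = StreamToLogger(logging.getLogger("stderr"), logging.ERROR)
    try:
        with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
            result = func(*args)
    finally:
        # Restore the root logger so the capture does not leak past this call.
        root.removeHandler(handler)
        root.setLevel(old_level)
    return result, rows


def f(x):
    # Same body as the UDF above, minus the @udf decorator.
    logger = logging.getLogger("test")
    logger.setLevel(logging.INFO)
    logger.info(f"INFO level message: {x}")
    print(f"PRINT(STDOUT): {x}")
    print(f"PRINT(STDERR): {x}", file=sys.stderr)
    try:
        1 / x
    except ZeroDivisionError:
        logger.exception(f"1 / {x}")
    return str(x)


result, rows = run_with_captured_logs(f, 0)
```

In this sketch, `rows` ends up holding four entries for `f(0)`: one from the `test` logger, one each from `stdout` and `stderr`, and a final `ERROR` entry carrying the `ZeroDivisionError` details.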
### Why are the changes needed?
Logs emitted inside a UDF are difficult to collect because they go to each
executor's `stderr` file.
With many executors, the `stderr` files have to be checked one by one.
### Does this PR introduce _any_ user-facing change?
Yes. Logs from Python UDFs can now be captured and queried through a system view.
### How was this patch tested?
Added the related tests.
### Was this patch authored or co-authored using generative AI tooling?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]