ueshin opened a new pull request, #49818:
URL: https://github.com/apache/spark/pull/49818
### What changes were proposed in this pull request?
Adds logs when the Python worker looks stuck.
- `spark.python.worker.idleTimeoutSeconds` (default `0`, meaning no timeout)

  The time (in seconds) Spark will wait for activity (e.g., data transfer or
  communication) from a Python worker before considering it potentially idle
  or unresponsive. When the timeout is triggered, Spark will log the
  network-related status for debugging purposes.
For example, setting the corresponding SQL conf for Python UDF execution:
```py
import time

from pyspark.sql import functions as sf

spark.conf.set('spark.sql.execution.pyspark.udf.idleTimeoutSeconds', '1s')

@sf.udf
def f(x):
    time.sleep(2)
    return str(x)

spark.range(1).select(f("id")).show()
```
will show a warning message:
```
... WARN PythonUDFWithNamedArgumentsRunner: Idle timeout reached for Python
worker (timeout: 1 seconds). No data received from the worker process:
handle.map(_.isAlive) = Some(true), channel.isConnected = true,
channel.isBlocking = false, selector.isOpen = true, selectionKey.isValid =
true, selectionKey.interestOps = 1, hasInputs = false
```
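For completeness, the worker-level conf can also be set when the session is created. The sketch below is illustrative only: the conf name is the one introduced by this PR, but the value format (`"60s"`) mirrors the SQL conf example above and is an assumption, as is applying it session-wide rather than per query.

```py
from pyspark.sql import SparkSession

# Minimal sketch: enable idle-timeout logging for all Python workers in
# this session. The "60s" value format is assumed to be accepted here,
# following the '1s' usage in the example above.
spark = (
    SparkSession.builder
    .config("spark.python.worker.idleTimeoutSeconds", "60s")
    .getOrCreate()
)
```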
### Why are the changes needed?
To help monitor Python workers that appear to be stuck.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Manually checked the logs.
### Was this patch authored or co-authored using generative AI tooling?
No.