GitHub user icexelloss opened a pull request:
https://github.com/apache/spark/pull/19852
[SPARK-22655][PySpark] Throw exception rather than exit silently in
PythonRunner when Spark session is stopped
## What changes were proposed in this pull request?
We have observed in our production environment that during Spark shutdown,
active tasks sometimes complete with incorrect results. We tracked the issue
down to PythonRunner, which returns a partial result instead of throwing an
exception when Spark is stopped mid-task.
I think the better way to handle this is to have these tasks fail rather
than complete with partial results (completing with partial results is
always bad IMHO).
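Roughly, the idea is for the reader side of PythonRunner to check whether
Spark has been stopped before treating end-of-stream as a normal completion,
and to throw instead of returning quietly. Below is a minimal sketch of that
pattern; the class name, the `isStopped` callback, and the framing protocol
are simplified assumptions for illustration, not Spark's actual internals.

```scala
import java.io.DataInputStream
import org.apache.spark.SparkException

// Hypothetical, simplified reader iterator in the style of PythonRunner's
// output reader. Each record is framed as a length-prefixed byte array; a
// negative length marks end-of-stream.
class PartialResultGuardIterator(
    stream: DataInputStream,
    isStopped: () => Boolean) extends Iterator[Array[Byte]] {

  private var nextObj: Array[Byte] = _
  private var eos = false

  override def hasNext: Boolean = {
    if (nextObj == null && !eos) {
      nextObj = read()
    }
    nextObj != null
  }

  override def next(): Array[Byte] = {
    if (!hasNext) throw new NoSuchElementException("end of stream")
    val obj = nextObj
    nextObj = null
    obj
  }

  private def read(): Array[Byte] = {
    val len = stream.readInt()
    if (len >= 0) {
      val obj = new Array[Byte](len)
      stream.readFully(obj)
      obj
    } else {
      // The worker signalled end-of-stream. If Spark is shutting down,
      // this is a premature exit: fail the task loudly rather than
      // silently completing with whatever rows were read so far.
      if (isStopped()) {
        throw new SparkException(
          "Spark session stopped while the Python worker was still " +
          "producing output; failing the task instead of returning a " +
          "partial result.")
      }
      eos = true
      null
    }
  }
}
```

With a guard like this, a shutdown during an active task surfaces as a task
failure (and a clear exception in the driver logs) rather than a job that
appears to succeed with a truncated result set.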
## How was this patch tested?
Existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/icexelloss/spark python-runner-shutdown
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19852.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19852
----
commit 75c8191ff51c0f7802ce592fdecca6f551a60687
Author: Li Jin <[email protected]>
Date: 2017-11-29T22:00:42Z
Throw exception rather than exit silently in PythonRunner when Spark
session is stopped
----