Github user kxepal commented on the issue:
https://github.com/apache/spark/pull/15961
@holdenk
Thanks for the warning message text. Nice one!
> I indicated above this swallows all of the Py4J errors and there are a
host of things which could cause the Py4J bridge to break down.
Surprisingly, as you can see in the [issue's
traceback](https://issues.apache.org/jira/browse/SPARK-18523), it's py4j itself that
raises an overly general exception for this kind of problem. I also expected to see
a Py4JNetworkError there, since it's a network communication issue, but that didn't
happen. The really useful exceptions get swallowed somewhere in the middle
and are only printed to stderr via logging, and I'm not sure how to re-raise
them or how many things that would break.
> It seems like the correct action for the user to take when the Py4J
bridge breaks is starting over from scratch, either by exiting and re-running
their notebook or otherwise re-submitting there job.
Yes, that's what happens now: on failure we have to shut down the
notebook, start it again and re-run all the cells. If we're not running in a
notebook, the whole script crashes. This raises two issues:
1. Usability. If you make a mistake or a Spark job eventually fails, you
wouldn't restart the whole notebook; you'd run a cell with `sc.stop` and other
cleanup, then re-run your Spark cells. That's a simple procedure. But when
stopping the Spark context fails, you have to fall back to plan B: restart everything and
re-run all the cells. That can be quite tedious, and in practice it is.
2. Correctness. SparkContext is a global shared mutable object, and if we
cannot correctly reset its state to the default in order to start over, that feels like
something is really wrong here. Otherwise, should we run all the code that uses SparkContext
in subprocesses just to be able to implement retry logic?
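
To make the subprocess idea concrete, here is a rough sketch of what such retry logic might look like. It is not something PySpark provides: `run_with_retries` is a hypothetical helper, and it assumes the Spark code lives in a separate script, so that every attempt starts in a fresh interpreter with no leftover SparkContext state.

```python
import subprocess
import sys


def run_with_retries(cmd, retries=2):
    """Run cmd in a fresh process, retrying on a non-zero exit code.

    Hypothetical helper, not part of PySpark. Returns the CompletedProcess
    of the first successful attempt and raises RuntimeError if every
    attempt fails.
    """
    last = None
    for attempt in range(1, retries + 2):
        # Each attempt is a brand-new process, so nothing from a previously
        # broken context can leak into the next attempt.
        last = subprocess.run(cmd, capture_output=True, text=True)
        if last.returncode == 0:
            return last
        print(
            f"attempt {attempt} failed with exit code {last.returncode}",
            file=sys.stderr,
        )
    raise RuntimeError(
        f"{cmd!r} failed {retries + 1} times; last stderr:\n{last.stderr}"
    )
```

It could be called as `run_with_retries([sys.executable, "spark_job.py"])`, where `spark_job.py` is a placeholder name for the script that creates and uses the SparkContext. The price is exactly what I'm complaining about: all state has to be rebuilt on every attempt.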