Github user kxepal commented on the issue:

    https://github.com/apache/spark/pull/15961
  
    @holdenk 
    Thanks for the warning message text. Nice one!
    
    > I indicated above this swallows all of the Py4J errors and there are a 
host of things which could cause the Py4J bridge to break down. 
    
    Surprisingly, as you can see in the [issue's 
traceback](https://issues.apache.org/jira/browse/SPARK-18523), it is py4j that 
raises an overly general exception for this kind of problem. I also expected to 
see a Py4JNetworkError there, since this is a network communication issue, but 
that didn't happen. The genuinely useful exceptions get swallowed somewhere in 
the middle and are only printed to stderr via logging, and I'm not sure how to 
re-raise them or how many things that would break.
    
    > It seems like the correct action for the user to take when the Py4J 
bridge breaks is starting over from scratch, either by exiting and re-running 
their notebook or otherwise re-submitting there job.
    
    Yes, that's what happens now: on failure we have to shut down the 
notebook, start it again, and re-run all the cells. If we're not running in a 
notebook, the whole script crashes. This raises two issues:
    
    1. Usability. If you make a mistake or a Spark job eventually fails, you 
wouldn't restart the whole notebook; you would run a cell with `sc.stop()` and 
other cleanup code, then re-run your Spark cells. That's a simple procedure. 
But when stopping the Spark context fails, you have to follow plan B: restart 
everything and re-run all the cells. That is quite tedious.
    
    2. Correctness. SparkContext is a global, shared, mutable object, and if we 
cannot correctly reset its state to the default in order to start over, that 
feels like something is really wrong. Otherwise, should we run all the code 
that uses SparkContext in subprocesses just to be able to implement retry 
logic?
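
    To make the subprocess idea from point 2 concrete, here is a minimal 
sketch of what such retry logic might look like. It is only an illustration: 
`run_isolated` is a hypothetical helper name, not part of PySpark or py4j, and 
the job is passed as a string of Python source purely to keep the example 
self-contained. A real job would create and use its own SparkContext inside 
that source.

```python
import subprocess
import sys


def run_isolated(source, retries=3, timeout=60):
    """Run Python `source` in a fresh interpreter, retrying on failure.

    Hypothetical helper: each attempt gets its own process, so global state
    left behind by a failed attempt (for example a SparkContext that could
    not be stopped) cannot leak into the next attempt.

    Returns (attempt_number, stdout) for the first successful attempt.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            result = subprocess.run(
                [sys.executable, "-c", source],
                capture_output=True,  # collect stdout/stderr instead of printing
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired as exc:
            last_error = f"timed out after {exc.timeout}s"
            continue
        if result.returncode == 0:
            return attempt, result.stdout
        # Keep the child's stderr so the final error is not swallowed.
        last_error = result.stderr.strip()
    raise RuntimeError(f"job failed after {retries} attempts: {last_error}")


if __name__ == "__main__":
    # Stand-in job; a real one would build a SparkContext in the child.
    attempt, output = run_isolated("print('job finished')")
    print(attempt, output.strip())
```

    The cost is obvious: every attempt pays for a new interpreter and a new 
JVM gateway, and results have to be passed back through stdout or files, which 
is why it feels like a workaround rather than a fix.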

