Github user kxepal commented on the issue:
https://github.com/apache/spark/pull/17671
Hm... I'd read about broadcast variables, but never tried to use them.
However, after a quick look and a try, I found that this won't change things
much.
Yes, you will be able to pass the client instance to all the executors, but
you'll still have to modify every UDF and every other function shipped to the
executors so it captures exceptions with the Sentry client, wrapping each body
in `try: ... except: raven_client.captureException()`. And if we have lambdas,
we'll have to rewrite them completely.
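For illustration, this is roughly what every shipped function would start to
look like under that approach. It's a minimal sketch only: the DSN is a
placeholder, `parse_age` is a made-up example, and it assumes the raven
`Client` can actually be pickled into a broadcast variable, which may not hold
in practice.

```python
from pyspark import SparkContext
from raven import Client

sc = SparkContext.getOrCreate()
# Broadcast the Sentry client so every executor reuses one instance
# (assumes the Client instance survives serialization to the executors).
raven_client = sc.broadcast(Client("https://<public>:<secret>@sentry.example.com/1"))

def parse_age(value):
    # Every function body shipped to the executors has to be wrapped by hand.
    try:
        return int(value)
    except Exception:
        raven_client.value.captureException()
        raise

ages = sc.parallelize(["31", "oops", "27"]).map(parse_age)
```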
In the best case, this could be reduced to a decorator that takes care of all
these routines (see the sketch below), but you'd still have to remember to
apply it everywhere. Also, you can easily hit the same issue I did with the
default threaded Sentry client transport: in some cases it isn't able to send
the exception to the service before pyspark.worker calls `sys.exit(1)`. Such
gotchas are quite hard to catch.
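A decorator along these lines could hide the boilerplate, but you'd still have
to remember to use it on every function. Again just a sketch: `sentry_captured`
is a made-up name, and it reuses the hypothetical broadcast `raven_client` from
the previous snippet.

```python
import functools

def sentry_captured(client_broadcast):
    """Report any exception to Sentry via the broadcast client, then re-raise."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                client_broadcast.value.captureException()
                raise
        return wrapper
    return decorator

@sentry_captured(raven_client)
def parse_age(value):
    return int(value)
```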
This approach may be good from, say, a design point of view, but it would not
reach the goal of simplifying the PySpark development experience. Well, at
least, we can do better.