Github user kxepal commented on the issue:
https://github.com/apache/spark/pull/17671
Hm... I'd read about broadcast variables, but never tried to use them.
However, after a quick look and a try, I found that this won't change things
much.
Yes, you will be able to pass the client instance to all the executors, but
you'll still have to modify every UDF and every other function shipped to the
executors so it captures exceptions with the Sentry client, wrapping each body
in `try: ... except: raven_client.captureException()`. And if we have lambdas,
we'll have to rewrite them completely.
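For illustration, this is roughly what every shipped function would start to
look like under that approach. It's a minimal sketch only: the DSN is a
placeholder, `parse_age` is a made-up example, and it assumes the raven
`Client` can actually be pickled into a broadcast variable, which may not hold
in practice.

```python
from pyspark import SparkContext
from raven import Client

sc = SparkContext.getOrCreate()
# Broadcast the Sentry client so every executor reuses one instance
# (assumes the Client instance survives serialization to the executors).
raven_client = sc.broadcast(Client("https://<public>:<secret>@sentry.example.com/1"))

def parse_age(value):
    # Every function body shipped to the executors has to be wrapped by hand.
    try:
        return int(value)
    except Exception:
        raven_client.value.captureException()
        raise

ages = sc.parallelize(["31", "oops", "27"]).map(parse_age)
```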
In the best case, this could be reduced to a decorator that takes care of all
these routines (see the sketch below), but you'd still have to remember to
apply it everywhere. Also, you can easily hit the same issue I did with the
default threaded Sentry client transport: in some cases it isn't able to send
the exception to the service before pyspark.worker calls `sys.exit(1)`. Such
gotchas are quite hard to catch.
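A decorator along these lines could hide the boilerplate, but you'd still have
to remember to use it on every function. Again just a sketch: `sentry_captured`
is a made-up name, and it reuses the hypothetical broadcast `raven_client` from
the previous snippet.

```python
import functools

def sentry_captured(client_broadcast):
    """Report any exception to Sentry via the broadcast client, then re-raise."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                client_broadcast.value.captureException()
                raise
        return wrapper
    return decorator

@sentry_captured(raven_client)
def parse_age(value):
    return int(value)
```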
This approach may be good from, say, a design point of view, but it would not
reach the goal of simplifying the PySpark development experience. Well, at
least, we can do better.