Github user kxepal commented on the issue:
https://github.com/apache/spark/pull/17671
    > If this is the reason to add the support of thirdparty library, it sounds
    > not quite compelling. I think you can even just simply monkey-patch udf or
    > UserDefinedFunction. It wouldn't be too difficult.
    No, the main reason is to greatly improve the debugging experience for PySpark
    UDFs without a lot of code changes. The PySpark worker is the perfect place to
    handle those errors.
    I don't think monkey-patching is a good way to go. It's basically hackery:
    unstable and likely to break eventually. And you would have to copy-paste it
    from project to project just to get good error reporting.
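    To make the trade-off concrete, here is a rough sketch of what such a
    monkey-patch would look like. The DSN is a placeholder, raven would still
    need to be installed on every worker, and this is exactly the boilerplate
    that would have to travel from project to project:

    ```python
    import functools

    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    _original_udf = F.udf

    def _reporting(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            try:
                return f(*args, **kwargs)
            except Exception:
                # raven must already be importable on the worker; the DSN
                # below is a placeholder, not a real project key.
                from raven import Client
                Client("https://public:[email protected]/1").captureException()
                raise
        return wrapper

    def _patched_udf(f, returnType=StringType()):
        # Wrap the user's function before handing it to the real udf().
        return _original_udf(_reporting(f), returnType)

    F.udf = _patched_udf  # the actual monkey-patch
    ```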
    Compare this with simply installing the error-reporting package on the worker
    side (raven for this PR) and passing at least one configuration option via
    SparkConf - that's enough to have all your errors caught.
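    For comparison, the user-facing side of that approach might look roughly
    like this; the configuration key name is my invention for illustration, not
    the one from this PR:

    ```python
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .set("spark.python.worker.raven.dsn",  # key name assumed, not from this PR
                 "https://public:[email protected]/1"))
    sc = SparkContext(conf=conf)

    # With raven installed on the workers, any exception raised inside a UDF
    # would be captured and reported before being re-raised as usual.
    ```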
    > I wonder if we could maybe make a mechanism for this that would be useful
    > beyond just sentry but also things like connecting Python debuggers
    That would be great, but I'm not familiar with error management systems
    other than Sentry. We can start with a few now (Sentry will cover most
    Python users) and figure out the rest later - for instance, plugins via
    entry points as provided by setuptools / pkg_resources. In this case each
    backend could live in its own package and register a handler without any
    changes to PySpark itself.
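    As a rough sketch of the entry-points idea (the group name and the setup.py
    names below are invented for illustration):

    ```python
    import pkg_resources

    def load_error_handlers():
        # Collect callables registered by third-party packages under a shared
        # entry-point group; the group name here is hypothetical.
        return [ep.load() for ep in
                pkg_resources.iter_entry_points("pyspark.error_handlers")]

    # A backend package would then register itself in its own setup.py:
    #
    # setup(
    #     name="pyspark-sentry",
    #     entry_points={
    #         "pyspark.error_handlers": [
    #             "sentry = pyspark_sentry:capture_exception",
    #         ],
    #     },
    # )
    ```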