Github user kxepal commented on the issue:
https://github.com/apache/spark/pull/17671
    > If this is the reason to add the support of thirdparty library, it sounds
    > not quite compelling. I think you can even just simply monkey-patch udf or
    > UserDefinedFunction. It wouldn't be too difficult.
    No, the main reason is to greatly improve the debugging experience for PySpark
    UDFs without a lot of code changes. The PySpark worker is the perfect place to
    handle those errors.
    I don't think monkey-patching is a good way to go. It's basically hackery:
    unstable and likely to break eventually. And you would have to copy-paste it
    from project to project just to get good error reporting.
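    To make the trade-off concrete, here is a rough sketch of what such a
    monkey-patch would look like. The DSN is a placeholder, raven would still
    need to be installed on every worker, and this is exactly the boilerplate
    that would have to travel from project to project:

    ```python
    import functools

    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    _original_udf = F.udf

    def _reporting(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            try:
                return f(*args, **kwargs)
            except Exception:
                # raven must already be importable on the worker; the DSN
                # below is a placeholder, not a real project key.
                from raven import Client
                Client("https://public:[email protected]/1").captureException()
                raise
        return wrapper

    def _patched_udf(f, returnType=StringType()):
        # Wrap the user's function before handing it to the real udf().
        return _original_udf(_reporting(f), returnType)

    F.udf = _patched_udf  # the actual monkey-patch
    ```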
    Compare this with simply installing the error-reporting package on the worker
    side (raven for this PR) and passing at least one configuration option via
    SparkConf - that's enough to have all your errors caught.
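    For comparison, the user-facing side of that approach might look roughly
    like this; the configuration key name is my invention for illustration, not
    the one from this PR:

    ```python
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .set("spark.python.worker.raven.dsn",  # key name assumed, not from this PR
                 "https://public:[email protected]/1"))
    sc = SparkContext(conf=conf)

    # With raven installed on the workers, any exception raised inside a UDF
    # would be captured and reported before being re-raised as usual.
    ```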
    > I wonder if we could maybe make a mechanism for this that would be useful
    > beyond just sentry but also things like connecting Python debuggers
    That would be great, but I'm not familiar with error management systems
    other than Sentry. We can start with a few now (Sentry will cover most
    Python users) and figure out the rest later - for instance, plugins via
    entry points as provided by setuptools / pkg_resources. In this case each
    backend could live in its own package and register a handler without any
    changes to PySpark itself.
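    As a rough sketch of the entry-points idea (the group name and the setup.py
    names below are invented for illustration):

    ```python
    import pkg_resources

    def load_error_handlers():
        # Collect callables registered by third-party packages under a shared
        # entry-point group; the group name here is hypothetical.
        return [ep.load() for ep in
                pkg_resources.iter_entry_points("pyspark.error_handlers")]

    # A backend package would then register itself in its own setup.py:
    #
    # setup(
    #     name="pyspark-sentry",
    #     entry_points={
    #         "pyspark.error_handlers": [
    #             "sentry = pyspark_sentry:capture_exception",
    #         ],
    #     },
    # )
    ```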