HyukjinKwon commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's
URL: https://github.com/apache/spark/pull/24898#issuecomment-529158091

@squito, given the JIRA description in SPARK-29017, it seems the analysis matches the one here. I also agree with:

> I think the right way to fix this is to keep a python thread-local tracking these properties, and then sending them through to the JVM on calls to submitJob. This is going to be a headache to get right, though; we've also got to handle implicit calls, eg. rdd.collect(), rdd.forEach(), etc. And of course users may have defined their own functions, which will be broken until they fix it to use the same thread-locals.

My impression was that, to do this, we would basically need to land some fixes in Py4J to store and set local properties for every command interaction. I haven't looked into that closely yet, because I thought the approach in this PR is easier and cleaner, with minimal changes.

So it needs some discussion and agreement on which approach we take.
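The quoted suggestion (Python-side thread-locals sent to the JVM on each job submission) can be sketched in plain Python. This is a hypothetical simulation, not PySpark or Py4J code. `FakeJvm`, `PythonSideContext`, `set_local_property` and `submit_job` are made-up names standing in for the JVM and the SparkContext. Only `spark.scheduler.pool` is a real Spark local property key, and it appears here purely as an example. Each Python thread keeps its own properties in a `threading.local`. A snapshot of those properties travels with every submission, so it no longer matters which JVM thread serves the call.

```python
import threading


class FakeJvm:
    """Hypothetical stand-in for the JVM side. It records the properties
    that arrive with each job submission instead of relying on JVM
    thread-locals, which would not line up with Python threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self.submitted = []  # list of (job_name, properties) tuples

    def submit_job(self, job_name, properties):
        # Properties travel with the call, so it does not matter which
        # JVM thread happens to serve this request.
        with self._lock:
            self.submitted.append((job_name, dict(properties)))


class PythonSideContext:
    """Hypothetical context that tracks local properties per Python thread."""

    def __init__(self, jvm):
        self._jvm = jvm
        self._local = threading.local()

    def _props(self):
        # Lazily create this thread's property dict on first access.
        if not hasattr(self._local, "props"):
            self._local.props = {}
        return self._local.props

    def set_local_property(self, key, value):
        self._props()[key] = value

    def submit_job(self, job_name):
        # Every job-triggering path (explicit, or implicit like a collect())
        # must go through here, otherwise the properties are silently lost.
        self._jvm.submit_job(job_name, self._props())


jvm = FakeJvm()
ctx = PythonSideContext(jvm)


def worker(pool):
    # Each thread sets its own pool; other threads never see it.
    ctx.set_local_property("spark.scheduler.pool", pool)
    ctx.submit_job("job-" + pool)


threads = [threading.Thread(target=worker, args=(p,)) for p in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

for name, props in sorted(jvm.submitted):
    print(name, props)
```

The catch raised in the quote shows up in `submit_job`. Every job-triggering path, including implicit ones such as `rdd.collect()` and users' own helper functions, would have to route through it. This is part of why the comment prefers the approach in this PR, which needs fewer changes.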
