WeichenXu123 commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's
URL: https://github.com/apache/spark/pull/24898#issuecomment-514085398

@holdenk
> (Because if it's just fixing the job group ID we might be able to find a simpler way to track that information in the Python thread and pass it through each time)?

+1. Pinning each Python-side thread to a JVM thread should be the simplest way. Otherwise, multiple threads in PySpark cause confusion over thread-local information, and we would need to switch the JVM-side thread-local information for each Python thread.

The current master code is in a buggy state: if the Python side has multiple threads and their function calls do not overlap in time, they reuse the same JVM thread. If one Python thread's function call is occupying that JVM thread, a new call from another Python thread uses a new JVM thread. This leaves us no way to control thread-local information in PySpark, and any API that relies on thread-local information won't work correctly. So we have to fix this.
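
To make the thread-reuse problem in the comment above concrete, here is a minimal pure-Python analogy. It is not Spark or Py4J code: `SharedBackend`, `PinnedBackend`, `set_group`, and `get_group` are hypothetical names, a single-worker `ThreadPoolExecutor` stands in for the JVM thread that serves non-overlapping calls, and `threading.local` stands in for JVM-side thread-local properties such as the job group ID. Only the non-overlapping case is modeled; the case where a busy JVM thread forces a new one is not.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-in for JVM-side thread-local state such as a job group ID.
backend_local = threading.local()


def set_group(group):
    backend_local.group = group


def get_group():
    return getattr(backend_local, "group", None)


class SharedBackend:
    """Hypothetical unpinned mode: non-overlapping calls share one worker."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)

    def call(self, fn, *args):
        return self._pool.submit(fn, *args).result()


class PinnedBackend:
    """Hypothetical pinned mode: one dedicated worker per calling thread."""

    def __init__(self):
        # Caller-side thread-local: each Python thread keeps its own pool.
        self._caller_local = threading.local()

    def call(self, fn, *args):
        pool = getattr(self._caller_local, "pool", None)
        if pool is None:
            pool = ThreadPoolExecutor(max_workers=1)
            self._caller_local.pool = pool
        return pool.submit(fn, *args).result()


def run_in_thread(fn):
    """Run fn in a fresh Python thread and return its result."""
    out = []
    t = threading.Thread(target=lambda: out.append(fn()))
    t.start()
    t.join()
    return out[0]


def demo(backend):
    # Python thread A sets a group and exits.
    run_in_thread(lambda: backend.call(set_group, "group-A"))
    # Python thread B starts afterwards and never sets a group.
    return run_in_thread(lambda: backend.call(get_group))


shared_result = demo(SharedBackend())
pinned_result = demo(PinnedBackend())
print(shared_result)  # group-A: thread B sees thread A's setting
print(pinned_result)  # None: thread B has its own worker
```

In the shared case, the second Python thread inherits a setting it never made, which is the kind of confusion the comment describes. Pinning gives each Python thread a worker of its own, so its thread-local state stays isolated.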
