HyukjinKwon commented on a change in pull request #24898: [SPARK-22340][PYTHON]
Add a mode to pin Python thread into JVM's
URL: https://github.com/apache/spark/pull/24898#discussion_r338841999
##########
File path: python/pyspark/context.py
##########
@@ -1010,13 +1010,42 @@ def setJobGroup(self, groupId, description,
interruptOnCancel=False):
ensure that the tasks are actually stopped in a timely manner, but is
off by default due
to HDFS-1208, where HDFS may respond to Thread.interrupt() by marking
nodes as dead.
"""
+ warnings.warn(
Review comment:
My worries are the following.
First, people use it even though it is buggy, because it more or less works in
a single thread without pin-thread mode. Even then, it still seems possible
that another thread is launched and the local properties are reset.
Second, even with pin-thread mode, it does not work properly for inherited
threads, as you said.
As for warning vs. info: PySpark currently does not have a proper logging
system, so we have to rely on either manual printing or warnings. Manual
printing is hard for users to control. With warnings, users can control the
behavior themselves, for instance whether the warning is printed only on the
initial call or on every call.
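To illustrate the point about user control, here is a minimal sketch using only the standard `warnings` module. The function `set_job_group` is a hypothetical stand-in, not the actual PySpark method, and the warning text is made up for the example. The helper `count_warnings` is also hypothetical; it just counts how many warnings get through under a given filter action.

```python
import warnings


def set_job_group(group_id):
    # Hypothetical stand-in for a method that warns on every call.
    # The message text is an assumption, not the wording used in the PR.
    warnings.warn(
        "local properties may not be applied as expected across threads",
        UserWarning,
    )
    return group_id


def count_warnings(action, calls=3):
    """Call set_job_group `calls` times under the given filter action and
    return how many warnings were actually emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # The user picks the policy: "once", "always", "ignore", ...
        warnings.simplefilter(action)
        for _ in range(calls):
            set_job_group("group-a")
    return len(caught)
```

With the "always" action every call emits the warning, with "ignore" none do, and "once" emits it only the first time that message is seen. A plain print() in the library would give users none of these choices.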