HyukjinKwon commented on a change in pull request #24898: [SPARK-22340][PYTHON]
Add a mode to pin Python thread into JVM's
URL: https://github.com/apache/spark/pull/24898#discussion_r338841999
##########
File path: python/pyspark/context.py
##########
@@ -1010,13 +1010,42 @@ def setJobGroup(self, groupId, description, interruptOnCancel=False):
ensure that the tasks are actually stopped in a timely manner, but is off by default due
to HDFS-1208, where HDFS may respond to Thread.interrupt() by marking nodes as dead.
"""
+ warnings.warn(
Review comment:
What I am worried about is the following.
First, people use it even though it's buggy, because it more or less works in
a single thread without pin-thread mode. Even then, it still seems possible
that another thread is launched and the local properties are reset.
Second, even with pin-thread mode, it does not work properly for inherited
threads, as you said.
As for warning vs. info: PySpark currently does not have a proper logging
system, so we have to rely on either printing manually or the `warnings`
module (which could be integrated with a logging system later if we happen to
add one to PySpark).
If we print manually, it's tricky for users to control the output. With
warnings, they can control it, for instance choosing whether the warning is
printed only on the first call or on every call.
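
To illustrate the point about user control, here is a minimal sketch using
only the standard `warnings` module. `set_job_group` and its message text are
hypothetical stand-ins, not the actual PySpark code in this PR, and
`count_warnings` is a helper invented for the example.

```python
import warnings


def set_job_group(group_id):
    """Hypothetical stand-in for an API that warns on every call."""
    warnings.warn(
        "local properties may not be isolated between Python threads",
        UserWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return group_id


def count_warnings(action, calls=3):
    """Call set_job_group `calls` times under the given filter action
    and return how many warnings were actually emitted."""
    # catch_warnings restores the previous filters on exit, so each
    # experiment starts from a clean filter state.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action)
        for _ in range(calls):
            set_job_group("my-group")
    return len(caught)


once_count = count_warnings("once")      # user wants only the first occurrence
always_count = count_warnings("always")  # user wants every occurrence
ignore_count = count_warnings("ignore")  # user silences the warning entirely
```

The library code stays the same in all three cases; only the filter chosen by
the user changes, which is the kind of control a plain `print` would not give.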