HyukjinKwon commented on a change in pull request #26588:
[SPARK-22340][PYTHON][FOLLOW-UP] Add a better message and improve documentation for pinned thread mode
URL: https://github.com/apache/spark/pull/26588#discussion_r348283454
##########
File path: python/pyspark/context.py
##########
@@ -1008,60 +1009,61 @@ def setJobGroup(self, groupId, description, interruptOnCancel=False):
ensure that the tasks are actually stopped in a timely manner, but is off by default due
to HDFS-1208, where HDFS may respond to Thread.interrupt() by marking nodes as dead.
- .. note:: Currently, setting a group ID (set to local properties) with a thread does
- not properly work. Internally threads on PVM and JVM are not synced, and JVM thread
- can be reused for multiple threads on PVM, which fails to isolate local properties
- for each thread on PVM. To work around this, you can set `PYSPARK_PIN_THREAD` to
+ .. note:: Currently, setting a group ID (set to local properties) with multiple threads
+ does not properly work. Internally threads on PVM and JVM are not synced, and JVM
+ thread can be reused for multiple threads on PVM, which fails to isolate local
+ properties for each thread on PVM.
+
+ To work around this, you can set `PYSPARK_PIN_THREAD` to
`'true'` (see SPARK-22340). However, note that it cannot inherit the local properties
from the parent thread although it isolates each thread on PVM and JVM with its own
- local properties. To work around this, you should manually copy and set the local
+ local properties.
+
+ To work around this, you should manually copy and set the local
properties from the parent thread to the child thread when you create another thread.
- """
- warnings.warn(
- "Currently, setting a group ID (set to local properties) with a thread does "
- "not properly work. "
- "\n"
- "Internally threads on PVM and JVM are not synced, and JVM thread can be reused "
- "for multiple threads on PVM, which fails to isolate local properties for each "
- "thread on PVM. "
- "\n"
- "To work around this, you can set PYSPARK_PIN_THREAD to true (see SPARK-22340). "
- "However, note that it cannot inherit the local properties from the parent thread "
- "although it isolates each thread on PVM and JVM with its own local properties. "
- "\n"
- "To work around this, you should manually copy and set the local properties from "
- "the parent thread to the child thread when you create another thread.",
- UserWarning)
+ Workaround class can be written as below (unofficial way):
+
+ >>> class CustomThread(threading.Thread):
+ >>> def __init__(self, sc, target, *args, **kwargs):
+ >>> properties = sc._jsc.sc().getLocalProperties()
+ >>> def copy_local_properties(*a, **k):
+ >>> sc._jsc.sc().setLocalProperties(properties)
Review comment:
Yeah, but not sure how we can get all the keys...
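The manual workaround in the note (copy the parent's local properties when the child thread is created, then set them in the child before its target runs) can be illustrated without Spark. The sketch below is a pure-Python analogue, not PySpark code: a `threading.local` object stands in for the JVM's per-thread local properties, and `InheritingThread`, `get_local_properties`, `set_local_properties` and `demo` are hypothetical names. On the question of getting all the keys, copying the whole mapping sidesteps it; here that is a `dict(...)` snapshot. For the `java.util.Properties` object returned through Py4J, calling its `clone()` method (inherited from `Hashtable`) would be one assumed way to copy it wholesale, not something verified against this PR.

```python
import threading

# Hypothetical stand-in for the JVM's per-thread local properties.
# This is NOT a PySpark API; each thread sees its own `props` attribute.
_local_props = threading.local()


def get_local_properties():
    """Return the calling thread's properties, creating an empty dict if unset."""
    if not hasattr(_local_props, "props"):
        _local_props.props = {}
    return _local_props.props


def set_local_properties(props):
    """Replace the calling thread's properties with `props`."""
    _local_props.props = props


class InheritingThread(threading.Thread):
    """Hypothetical thread that inherits its creator's local properties.

    __init__ runs in the parent thread, so the snapshot captures the
    parent's properties; the wrapper runs in the child thread and
    installs that snapshot before calling the real target.
    """

    def __init__(self, target, **kwargs):
        # Copy the whole mapping so no key needs to be known up front,
        # and later changes in either thread stay isolated.
        properties = dict(get_local_properties())

        def copy_local_properties(*a, **k):
            set_local_properties(properties)
            return target(*a, **k)

        super().__init__(target=copy_local_properties, **kwargs)


def demo():
    # Key chosen to resemble Spark's job-group property; this is only a dict.
    set_local_properties({"spark.jobGroup.id": "parent-group"})
    seen = {}

    def child_work():
        seen["inheriting"] = get_local_properties().get("spark.jobGroup.id")
        # Changing the child's copy must not affect the parent.
        get_local_properties()["spark.jobGroup.id"] = "child-group"

    def plain_work():
        seen["plain"] = get_local_properties().get("spark.jobGroup.id")

    t = InheritingThread(target=child_work)
    t.start()
    t.join()

    p = threading.Thread(target=plain_work)
    p.start()
    p.join()

    return seen["inheriting"], seen["plain"], get_local_properties()["spark.jobGroup.id"]


if __name__ == "__main__":
    print(demo())
```

Taking the snapshot in `__init__` rather than in `run()` is the key design choice: `__init__` executes in the parent thread, while `run()` executes in the child, which has no properties of its own yet. A plain `threading.Thread` started the same way sees no inherited value at all.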
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
With regards,
Apache Git Services