HyukjinKwon commented on a change in pull request #26588: [SPARK-22340][PYTHON][FOLLOW-UP] Add a better message and improve documentation for pinned thread mode
URL: https://github.com/apache/spark/pull/26588#discussion_r348283454
 
 

 ##########
 File path: python/pyspark/context.py
 ##########
 @@ -1008,60 +1009,61 @@ def setJobGroup(self, groupId, description, interruptOnCancel=False):
         ensure that the tasks are actually stopped in a timely manner, but is off by default due
         to HDFS-1208, where HDFS may respond to Thread.interrupt() by marking nodes as dead.
 
-        .. note:: Currently, setting a group ID (set to local properties) with a thread does
-            not properly work. Internally threads on PVM and JVM are not synced, and JVM thread
-            can be reused for multiple threads on PVM, which fails to isolate local properties
-            for each thread on PVM. To work around this, you can set `PYSPARK_PIN_THREAD` to
+        .. note:: Currently, setting a group ID (set to local properties) with multiple threads
+            does not properly work. Internally threads on PVM and JVM are not synced, and JVM
+            thread can be reused for multiple threads on PVM, which fails to isolate local
+            properties for each thread on PVM.
+
+            To work around this, you can set `PYSPARK_PIN_THREAD` to
             `'true'` (see SPARK-22340). However, note that it cannot inherit the local properties
             from the parent thread although it isolates each thread on PVM and JVM with its own
-            local properties. To work around this, you should manually copy and set the local
+            local properties.
+
+            To work around this, you should manually copy and set the local
             properties from the parent thread to the child thread when you create another thread.
-        """
-        warnings.warn(
-            "Currently, setting a group ID (set to local properties) with a thread does "
-            "not properly work. "
-            "\n"
-            "Internally threads on PVM and JVM are not synced, and JVM thread can be reused "
-            "for multiple threads on PVM, which fails to isolate local properties for each "
-            "thread on PVM. "
-            "\n"
-            "To work around this, you can set PYSPARK_PIN_THREAD to true (see SPARK-22340). "
-            "However, note that it cannot inherit the local properties from the parent thread "
-            "although it isolates each thread on PVM and JVM with its own local properties. "
-            "\n"
-            "To work around this, you should manually copy and set the local properties from "
-            "the parent thread to the child thread when you create another thread.",
-            UserWarning)
+            Workaround class can be written as below (unofficial way):
+
+            >>> class CustomThread(threading.Thread):
+            >>>     def __init__(self, sc, target, *args, **kwargs):
+            >>>         properties = sc._jsc.sc().getLocalProperties()
+            >>>         def copy_local_properties(*a, **k):
+            >>>             sc._jsc.sc().setLocalProperties(properties)
 
 Review comment:
   Yeah, but I'm not sure how we can get all the keys...
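
   For illustration only, here is a minimal pure-Python sketch of the "copy the parent's local properties into the child thread" pattern that the docstring describes. It does not use PySpark: the `_local` store, the helper functions, and the `PropertyCopyingThread` class are all hypothetical stand-ins, and the property key is used only as sample data. In PySpark the snapshot and restore would presumably go through the JVM SparkContext, as the diff above does.

```python
import threading

# Hypothetical stand-in for per-thread local properties (NOT a PySpark API).
_local = threading.local()


def set_local_property(key, value):
    """Set a property that is visible only to the calling thread."""
    if not hasattr(_local, "props"):
        _local.props = {}
    _local.props[key] = value


def get_local_property(key):
    """Return the calling thread's value for key, or None if unset."""
    return getattr(_local, "props", {}).get(key)


def snapshot_local_properties():
    """Return a copy of all of the calling thread's properties."""
    return dict(getattr(_local, "props", {}))


class PropertyCopyingThread(threading.Thread):
    """Thread that starts with a copy of its creator's local properties."""

    def __init__(self, target, args=(), kwargs=None):
        # This runs in the parent thread, so the snapshot is the parent's.
        inherited = snapshot_local_properties()
        call_kwargs = dict(kwargs or {})

        def run_with_inherited():
            # This runs in the child thread: install the copy, then run target.
            _local.props = dict(inherited)
            target(*args, **call_kwargs)

        super().__init__(target=run_with_inherited)


# Usage: the child sees the parent's value, and its changes stay in the child.
seen = {}


def work():
    seen["child"] = get_local_property("spark.jobGroup.id")
    set_local_property("spark.jobGroup.id", "changed-in-child")


set_local_property("spark.jobGroup.id", "group-a")
worker = PropertyCopyingThread(target=work)
worker.start()
worker.join()
```

   Taking the snapshot in `__init__` rather than in `run` matters here, because `__init__` executes in the parent thread while `run` executes in the child. Copying the whole mapping at once also sidesteps the need to enumerate individual keys.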

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
