WeichenXu123 commented on issue #24898: [SPARK-22340][PYTHON] Add a mode to pin Python thread into JVM's
URL: https://github.com/apache/spark/pull/24898#issuecomment-503634199

@HyukjinKwon I suggest testing pinned threads with the following code:

```
import random
import threading
import time

# Assumes `spark` is an active SparkSession (e.g. in the pyspark shell).
num_threads = 100
loop_cnt_each_thread = 10

jvm_thread_id_list = [None] * num_threads
is_thread_check_pass = [True] * num_threads


def test_jvm_thread_id(thread_idx):
    time.sleep(random.random())
    # JVM thread id seen by the first call from this Python thread.
    tid = spark._jvm.java.lang.Thread.currentThread().getId()
    jvm_thread_id_list[thread_idx] = tid
    for i in range(loop_cnt_each_thread):
        # Every later call must land on the same JVM thread.
        cid = spark._jvm.java.lang.Thread.currentThread().getId()
        if cid != tid:
            is_thread_check_pass[thread_idx] = False
            break
        time.sleep(random.random())


threads = [None] * num_threads
for i in range(num_threads):
    threads[i] = threading.Thread(target=test_jvm_thread_id, args=(i,))

for t in threads:
    t.start()

for t in threads:
    t.join()

# Each Python thread stayed on one JVM thread ...
assert all(is_thread_check_pass)
# ... and no two Python threads shared a JVM thread.
assert num_threads == len(set(jvm_thread_id_list))
```

The above code checks that:
* Within one Python thread, if we call JVM methods multiple times, the underlying JVM code always runs in the same JVM thread.
* If there are multiple Python threads, each Python thread corresponds to a different underlying JVM thread.

My test code uses `spark._jvm.java.lang.Thread.currentThread().getId()` so that we can check the JVM thread id directly.
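
For reference, the same two checks can be sketched without a running JVM. This is only an illustration, not part of the PR: the helper name `check_thread_affinity` and its parameters are hypothetical, and `threading.get_ident` stands in for the JVM thread id lookup. A barrier keeps every worker alive until all have reported, since Python may reuse the ident of a finished thread.

```python
import threading


def check_thread_affinity(get_id, num_threads=8, calls_per_thread=5):
    """Return (stable, distinct) for the ids that get_id reports per thread.

    Hypothetical helper for illustration only.
    stable:   every thread saw the same id on all of its calls.
    distinct: no two threads saw the same id.
    """
    first_ids = [None] * num_threads
    stable = [True] * num_threads
    # Hold all workers until every one has finished its checks, so an id
    # cannot be reused by a thread that starts after another has exited.
    barrier = threading.Barrier(num_threads)

    def worker(idx):
        try:
            first_ids[idx] = get_id()
            for _ in range(calls_per_thread):
                if get_id() != first_ids[idx]:
                    stable[idx] = False
                    break
        finally:
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(stable), len(set(first_ids)) == num_threads


# Stand-in id source: the Python thread ident instead of the JVM thread id.
stable, distinct = check_thread_affinity(threading.get_ident)
```

Against a real session, the id source would presumably be `lambda: spark._jvm.java.lang.Thread.currentThread().getId()`, with `spark` being an active SparkSession as in the snippet above.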
