HyukjinKwon opened a new pull request #24958: [SPARK-28153][PYTHON] Use 
AtomicReference at InputFileBlockHolder (to support input_file_name with Python 
UDF)
URL: https://github.com/apache/spark/pull/24958
 
 
   ## What changes were proposed in this pull request?
   
   This PR proposes to use `AtomicReference` so that parent and child threads 
can access the same file block holder.
   
   Python UDF expressions are turned into a plan, which then launches a separate 
thread to consume the input iterator.
   
   1. If this separate child thread happens to call 
`InputFileBlockHolder.set` before the parent's thread local has been initialized 
(which happens when `ThreadLocal.get()` is first called), the child thread 
appears to call its own `initialValue` to initialize it.
   
   2. After that, the parent calls its own `initialValue` to initialize on the 
first call of `ThreadLocal.get()`.
   
   3. The two threads now hold two different references, so an update made in 
the child is not reflected in the parent.
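
The three steps above can be reproduced outside Spark with a plain `ThreadLocal`. The following is a minimal sketch, not the actual `InputFileBlockHolder` code: the class name, the `String` payload, and the file name are hypothetical stand-ins chosen for illustration.

```java
// Hypothetical stand-in for the file block holder; not Spark's actual class.
class ThreadLocalDivergence {
    // Each thread lazily gets its own value from the initializer on first get().
    static final ThreadLocal<String> holder = ThreadLocal.withInitial(() -> "");

    // Returns what the parent thread reads after the child has called set().
    static String parentViewAfterChildSet() throws InterruptedException {
        // Step 1: the child sets a value; this only touches the child's own slot.
        Thread child = new Thread(() -> holder.set("part-00000.parquet"));
        child.start();
        child.join();
        // Steps 2 and 3: the parent's first get() runs the initializer for the
        // parent, so the child's update is not visible here.
        return holder.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("parent sees: '" + parentViewAfterChildSet() + "'");
    }
}
```

In this sketch the parent reads the empty initial value rather than the file name written by the child, which mirrors the symptom described for `input_file_name` with Python UDFs.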
   
   This PR fixes it by holding a single `AtomicReference` in the thread local, 
so that the parent and child threads in each task share the same reference.
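
One way to picture the fix is the sketch below. It assumes an `InheritableThreadLocal` whose value is an `AtomicReference`, and it assumes the parent initializes the thread local before the child thread is created. Both are assumptions made for illustration and are not taken from the patch itself. The class name and the `String` payload are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of sharing one mutable holder between parent and child.
class SharedHolderSketch {
    // The thread local stores a mutable box; a child thread created after the
    // parent has initialized it inherits the same AtomicReference instance.
    static final InheritableThreadLocal<AtomicReference<String>> holder =
        new InheritableThreadLocal<AtomicReference<String>>() {
            @Override
            protected AtomicReference<String> initialValue() {
                return new AtomicReference<>("");
            }
        };

    // Returns what the parent thread reads after the child has updated the box.
    static String parentViewAfterChildSet() throws InterruptedException {
        // Assumption: the parent initializes first, before the child is created.
        holder.get();
        // The child mutates the shared box instead of replacing its own slot.
        Thread child = new Thread(() -> holder.get().set("part-00000.parquet"));
        child.start();
        child.join();
        return holder.get().get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("parent sees: '" + parentViewAfterChildSet() + "'");
    }
}
```

Because both threads dereference the same `AtomicReference`, the child's update is visible to the parent. The design choice here is to mutate the contents of a shared box rather than replace the thread-local value.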
   
   ## How was this patch tested?
   
   Tested manually, and a unit test was added.
   

