HyukjinKwon commented on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
URL: https://github.com/apache/spark/pull/24958#issuecomment-509637062

@brkyvz, we're not rushing. We're not ignoring any issues or holes that were actually found, and we're not merging this without discussion.

Also, using a thread local isn't a horrible approach, although it may be less preferable depending on the case: it lets us avoid having a single place that multiple tasks access, and run the tasks in parallel separately.

I agree the suggestion makes sense too, but adding a new mechanism isn't necessarily safe; new holes can always pop up. The conservative approach is usually to keep the code with fewer changes (for instance, for bug compatibility).

In addition, this is a general design issue rather than one specific to this code path. For instance, the code below:

https://github.com/apache/spark/blob/5264164a67df498b73facae207eda12ee133be7d/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/EpochTracker.scala#L26-L31

has almost the same issue as SPARK-28153: the parent thread updates the current epoch, but the child thread (the Python write thread) cannot read it. Ideally, we should identify the other places that need fixing as well.
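To make the parent/child visibility problem concrete, here is a minimal, self-contained sketch in plain Java. It is not Spark's code: the class name, field names, and file name are hypothetical stand-ins chosen only for illustration. It shows that a value set in a plain thread local by a parent thread is not seen by a child thread, while a holder shared by reference (here an AtomicReference) is.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; it does not mirror InputFileBlockHolder or EpochTracker.
public class ThreadLocalVisibilityDemo {
    // Per-thread state: each thread gets its own copy, initialised to "".
    private static final ThreadLocal<String> THREAD_LOCAL_FILE =
        ThreadLocal.withInitial(() -> "");

    // Shared state: every thread that holds this reference sees the same value.
    private static final AtomicReference<String> SHARED_FILE =
        new AtomicReference<>("");

    // Runs the given reader on a freshly started child thread and returns what it saw.
    private static String readOnChildThread(java.util.function.Supplier<String> reader) {
        String[] seen = new String[1];
        Thread child = new Thread(() -> seen[0] = reader.get());
        child.start();
        try {
            child.join(); // join() makes the child's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return seen[0];
    }

    // Parent sets the thread local; the child reads its own, still-initial copy.
    static String readThreadLocalFromChild(String value) {
        THREAD_LOCAL_FILE.set(value);
        return readOnChildThread(THREAD_LOCAL_FILE::get);
    }

    // Parent updates the shared holder; the child reads the updated value.
    static String readSharedFromChild(String value) {
        SHARED_FILE.set(value);
        return readOnChildThread(SHARED_FILE::get);
    }

    public static void main(String[] args) {
        System.out.println("thread local seen by child: '"
            + readThreadLocalFromChild("part-0000.parquet") + "'");
        System.out.println("shared holder seen by child: '"
            + readSharedFromChild("part-0000.parquet") + "'");
    }
}
```

The trade-off discussed in the comment shows up here as well: the shared holder makes the update visible to the child thread, but it also becomes a single place that several threads can touch, so it has to be scoped carefully (for example, one holder per task) to keep parallel tasks from seeing each other's values.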
