HyukjinKwon commented on issue #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
URL: https://github.com/apache/spark/pull/24958#issuecomment-505657735

I think Py4J is only used on the driver side, so we are safe on that point. `InputFileBlockHolder.getXXX` is executed within an expression, and `set` happens in the iterator:

```
core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala: InputFileBlockHolder.set(fs.getPath.toString, fs.getStart, fs.getLength)
core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala: InputFileBlockHolder.set(fs.getPath.toString, fs.getStart, fs.getLength)
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileScanRDD.scala: InputFileBlockHolder.set(currentFile.filePath, currentFile.start, currentFile.length)
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FilePartitionReader.scala: InputFileBlockHolder.set(file.filePath, file.start, file.length)
```

For HadoopRDD, NewHadoopRDD, DSv1 and DSv2, it is set on the executor side. Even if there are some spots I missed, Py4J reuses the same threads for different tasks, but job execution calls happen one at a time due to the GIL, and Py4J launches another thread only if an existing one is busy on the JVM side. So it won't happen that one JVM thread somehow launches multiple jobs at the same time. Moreover, I opened a PR to pin threads between the PVM and the JVM, https://github.com/apache/spark/pull/24898, which might be the more correct behaviour (?). If we could switch to that mode, it would permanently get rid of this concern.
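For context on the `AtomicReference` change the PR title refers to, here is a minimal, hypothetical sketch (not the actual Spark source; `FileBlockHolderSketch`, `FileBlock` and the demo names are made up for illustration) of the pattern: an `InheritableThreadLocal` whose per-thread value is a shared `AtomicReference`, so a child thread spawned for a task, e.g. one serving a Python UDF, still observes `set` calls made afterwards by the parent thread.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical holder, illustrating the technique only.
object FileBlockHolderSketch {
  // Immutable snapshot of the block currently being read.
  final case class FileBlock(filePath: String = "", start: Long = -1L, length: Long = -1L)

  // InheritableThreadLocal copies the parent's value into a child thread at creation time.
  // Because that value is a shared AtomicReference, updates the parent makes afterwards
  // are still visible to the child through the same reference.
  private val inputBlock = new InheritableThreadLocal[AtomicReference[FileBlock]] {
    override protected def initialValue(): AtomicReference[FileBlock] =
      new AtomicReference(FileBlock())
  }

  def set(filePath: String, start: Long, length: Long): Unit =
    inputBlock.get().set(FileBlock(filePath, start, length))

  def getInputFilePath: String = inputBlock.get().get().filePath
}

object Demo extends App {
  // Touch the holder in the parent first, so the child inherits the parent's AtomicReference
  // (an InheritableThreadLocal only copies values that exist when the child thread is created).
  FileBlockHolderSketch.set("initial", 0L, 0L)

  val child = new Thread(() => println(FileBlockHolderSketch.getInputFilePath))

  // Updated after the child was created: still visible, because both threads share the same
  // AtomicReference and its set/get provide volatile visibility guarantees.
  FileBlockHolderSketch.set("hdfs://host/path/part-00000.parquet", 0L, 128L)

  child.start()
  child.join() // prints the updated path, not "initial"
}
```

The point of the extra indirection: a bare `InheritableThreadLocal` of an immutable value would only snapshot the state at child-thread creation, while the shared `AtomicReference` lets the child see whatever the parent task thread has set most recently.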
