viirya commented on a change in pull request #24958: [SPARK-28153][PYTHON] Use AtomicReference at InputFileBlockHolder (to support input_file_name with Python UDF)
URL: https://github.com/apache/spark/pull/24958#discussion_r297198229
##########
File path: core/src/main/scala/org/apache/spark/rdd/InputFileBlockHolder.scala
##########
@@ -68,11 +71,17 @@ private[spark] object InputFileBlockHolder {
require(filePath != null, "filePath cannot be null")
require(startOffset >= 0, s"startOffset ($startOffset) cannot be negative")
require(length >= 0, s"length ($length) cannot be negative")
- inputBlock.set(new FileBlock(UTF8String.fromString(filePath), startOffset, length))
+ inputBlock.get().set(new FileBlock(UTF8String.fromString(filePath), startOffset, length))
}
/**
* Clears the input file block to default value.
*/
def unset(): Unit = inputBlock.remove()
+
+ /**
+ * Initializes thread local by explicitly getting the value. It triggers ThreadLocal's
+ * initialValue in the parent thread.
+ */
+ def initialize(): Unit = inputBlock.get()
Review comment:
Do we need this? The parent thread should always create the thread local variable before the child thread, right?
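To see why the diff wraps the block in an `AtomicReference`, here is a minimal Scala sketch under stated assumptions: `Block`, `SharedHolderDemo` and its methods are hypothetical names, and the `plain` holder is an assumed model of the pre-change shape in which the thread-local slot stores the block directly. `InheritableThreadLocal` copies the parent's value into a child thread only when the child `Thread` object is constructed, so a later `set` in the parent replaces only the parent's slot and the child keeps the old value. When the slot instead holds an `AtomicReference` that both threads share, an `AtomicReference.set` in the parent is visible to the child.

```scala
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicReference

// Hypothetical stand-in for Spark's FileBlock; only the file path is modeled.
final case class Block(path: String)

object SharedHolderDemo {
  // Assumed pre-change shape: the thread-local slot holds the block itself.
  private val plain: InheritableThreadLocal[Block] =
    new InheritableThreadLocal[Block] {
      override protected def initialValue(): Block = Block("")
    }

  // Shape from the diff: the slot holds a mutable AtomicReference.
  private val wrapped: InheritableThreadLocal[AtomicReference[Block]] =
    new InheritableThreadLocal[AtomicReference[Block]] {
      override protected def initialValue(): AtomicReference[Block] =
        new AtomicReference(Block(""))
    }

  // Creates a child thread (inheritance happens here, at construction),
  // runs `update` in the calling thread, then returns what the child reads.
  private def observeFromChild(read: () => String)(update: () => Unit): String = {
    val seen = new AtomicReference[String]("")
    val updated = new CountDownLatch(1)
    val child = new Thread(() => {
      updated.await() // read only after the parent has updated its value
      seen.set(read())
    })
    child.start()
    update()
    updated.countDown()
    child.join()
    seen.get()
  }

  // Parent replaces its own slot after the child exists: the child keeps Block("").
  def childSeesPlainSet(): String = {
    plain.get() // parent has a value before the child is created
    observeFromChild(() => plain.get().path)(() => plain.set(Block("part-00000")))
  }

  // Parent mutates the shared AtomicReference: the child sees the new block.
  def childSeesWrappedSet(): String = {
    wrapped.get() // parent has a value before the child is created
    observeFromChild(() => wrapped.get().get().path)(
      () => wrapped.get().set(Block("part-00000")))
  }
}
```

In this model `childSeesPlainSet()` returns `""` and `childSeesWrappedSet()` returns `"part-00000"`: the only difference is whether the parent replaces its slot or mutates an object that the child's slot also points to.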
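The question above turns on ordering. Because `InheritableThreadLocal` copies values only when the child `Thread` is constructed, the child shares the parent's `AtomicReference` only if the parent has already materialized its value (through `get()` or `set()`) before creating the child; otherwise each thread lazily builds its own reference through `initialValue`. The sketch below models this with hypothetical names (`InitOrderDemo`, `childSees`), where `initializeFirst = true` stands in for a call like the diff's `initialize()`. It is a toy model and does not show where Spark actually creates its threads.

```scala
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicReference

object InitOrderDemo {
  // Hypothetical holder modeled on the diff: InheritableThreadLocal over AtomicReference.
  private val holder: InheritableThreadLocal[AtomicReference[String]] =
    new InheritableThreadLocal[AtomicReference[String]] {
      override protected def initialValue(): AtomicReference[String] =
        new AtomicReference("")
    }

  // Runs a fresh "task" thread that creates a child thread and then records a
  // path; returns what the child observes through its own thread-local slot.
  def childSees(initializeFirst: Boolean): String = {
    val result = new AtomicReference[String]("")
    val task = new Thread(() => {
      if (initializeFirst) {
        holder.get() // analogous to initialize(): materialize before the child exists
      }
      val parentUpdated = new CountDownLatch(1)
      val child = new Thread(() => {
        parentUpdated.await()
        result.set(holder.get().get())
      })
      child.start()
      holder.get().set("part-00000") // parent records the current file
      parentUpdated.countDown()
      child.join()
    })
    task.start()
    task.join()
    result.get()
  }
}
```

Here `childSees(initializeFirst = true)` returns `"part-00000"`, while `childSees(initializeFirst = false)` returns `""`, because without the early `get()` the child is constructed before the task thread has any value to pass down.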
----------------------------------------------------------------
This is an automated message from the Apache Git Service.