mwlon commented on a change in pull request #24085: [SPARK-26555][SQL] make
ScalaReflection subtype checking thread safe
URL: https://github.com/apache/spark/pull/24085#discussion_r265754419
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
##########
@@ -66,19 +66,25 @@ object ScalaReflection extends ScalaReflection {
*/
def dataTypeFor[T : TypeTag]: DataType = dataTypeFor(localTypeOf[T])
+ private[catalyst] def isSubtype(tpe1: `Type`, tpe2: `Type`): Boolean = {
+ this.synchronized {
+ tpe1 <:< tpe2
Review comment:
It is important to work around this in Spark, since the unsynchronized subtype
check breaks thread safety in many important functions like `createDataset`. I
can give examples of where it is necessary to use multithreading.
With 5 concurrent threads, the lock seems to slow down subtype checking by
~4%, which I wouldn't worry about since subtype checking is never the
bottleneck in a Spark application. I'm happy to switch to larger synchronized
blocks instead of a helper function, but it's important that we resolve this
issue.
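
To make the helper-function approach concrete, here is a minimal standalone
sketch, assuming Scala 2 runtime reflection. `SubtypeCheckSketch` and
`checkConcurrently` are hypothetical names for illustration, not part of Spark;
only the body of `isSubtype` mirrors the diff above. Note that the lock guards
only calls that go through this helper, so other reflection calls remain
unprotected.

```scala
import java.util.concurrent.{Callable, Executors}
import scala.reflect.runtime.universe._

// Hypothetical stand-in for the object in the diff; not Spark's actual code.
object SubtypeCheckSketch {

  // Every subtype check takes the same lock, so at most one thread at a
  // time runs `<:<` through this helper.
  def isSubtype(tpe1: Type, tpe2: Type): Boolean = this.synchronized {
    tpe1 <:< tpe2
  }

  // Runs the same check from `threads` worker threads and collects the
  // results. The types are resolved by the caller beforehand, so the
  // workers only perform the guarded subtype check.
  def checkConcurrently(threads: Int, tpe1: Type, tpe2: Type): Seq[Boolean] = {
    val pool = Executors.newFixedThreadPool(threads)
    try {
      val futures = (1 to threads).map { _ =>
        pool.submit(new Callable[Boolean] {
          override def call(): Boolean = isSubtype(tpe1, tpe2)
        })
      }
      futures.map(_.get())
    } finally {
      pool.shutdown()
    }
  }
}
```

The alternative mentioned above, larger blocks of locking, would instead wrap
whole callers in `synchronized`, trading finer-grained locking for fewer lock
acquisitions.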