JoshRosen commented on a change in pull request #24916: [SPARK-28112] [TEST] Fix Kryo exception perf. bottleneck in tests due to absence of ML/MLlib classes
URL: https://github.com/apache/spark/pull/24916#discussion_r295557440
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala
 ##########
 @@ -88,6 +88,50 @@ class KryoSerializer(conf: SparkConf)
   private val useUnsafe = conf.get(KRYO_USE_UNSAFE)
   private val usePool = conf.get(KRYO_USE_POOL)
 
+  // classForName() is expensive in case the class is not found, so we filter the list of
+  // SQL / ML / MLlib classes once and then re-use that filtered list in newInstance() calls.
+  private lazy val loadableClasses: Seq[Class[_]] = {
 
 Review comment:
   Could this be moved into a `private[serializer]` field in an
`object KryoSerializer` companion? Now that I look at this again, I'm worried
that it'll be serialized as part of `KryoSerializer` itself, since I think the
serializer is serialized as part of `ShuffleDependency`. I don't think that's a
_huge_ deal, but we could probably shave off some additional work with that
extra step.
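
   For illustration, a minimal sketch of the shape suggested above. The names
`SketchSerializer` and `candidateNames` are hypothetical stand-ins, not the PR's
code or Spark's actual class list, and visibility is plain `private` here only
because the sketch has no enclosing package. The point it shows: a `lazy val` on
the companion object is evaluated at most once per JVM and is not part of any
instance's serialized state.

```scala
import scala.util.Try

// Hypothetical stand-in for the companion of KryoSerializer (assumption).
object SketchSerializer {
  // Hypothetical candidate list; the last name is deliberately not loadable.
  private val candidateNames: Seq[String] = Seq(
    "java.lang.String",
    "java.util.ArrayList",
    "not.a.real.ClassName"
  )

  // Filtered once per JVM. Because the field lives on the object, it is not
  // written out when an instance of the class below is serialized.
  private lazy val loadableClasses: Seq[Class[_]] =
    candidateNames.flatMap { name =>
      // Class.forName throws ClassNotFoundException for missing classes;
      // Try turns that into None so the name is simply dropped.
      Try[Class[_]](Class.forName(name)).toOption
    }
}

// Hypothetical stand-in for KryoSerializer itself (assumption).
class SketchSerializer extends Serializable {
  // Instances hold no copy of the list; they read the shared, filtered one.
  def registeredClasses: Seq[Class[_]] = SketchSerializer.loadableClasses
}
```

   With this layout, each deserialized copy of the serializer reuses the list
already computed in that JVM instead of carrying it along or recomputing it.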

