Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6663#discussion_r31790035
  
    --- Diff: core/src/main/scala/org/apache/spark/SerializableWritable.scala ---
    @@ -41,7 +40,6 @@ class SerializableWritable[T <: Writable](@transient var t: T) extends Serializa
       private def readObject(in: ObjectInputStream): Unit = Utils.tryOrIOException {
         in.defaultReadObject()
         val ow = new ObjectWritable()
    -    ow.setConf(new Configuration())
    --- End diff ---
    
I think that this, and SerializableWritable more generally, may be a huge source of performance bottlenecks for short tasks. A common use of SerializableWritable is serializing Hadoop Configurations, but it seems kind of crazy to create and discard a new Configuration just to be able to deserialize the driver-provided conf. Maybe we can make a substitute for SerializableWritable which only deals with Configuration subclasses and just calls `writeFields()` and `readFields()` directly. This would sidestep a lot of the performance penalties involved in creating Configuration objects and having them spend tons of time loading defaults. A rough sketch of the idea follows.
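
To make that concrete, here is a minimal sketch of what such a Configuration-only wrapper could look like. The class name and details are illustrative, not a commitment; it relies on the fact that `Configuration` itself implements `Writable`, so its `write()`/`readFields()` methods can stream just the key/value entries, and `new Configuration(false)` avoids loading the default resources on deserialization:

```scala
import java.io.{ObjectInputStream, ObjectOutputStream}

import org.apache.hadoop.conf.Configuration

// Hypothetical Configuration-only replacement for SerializableWritable.
// Configuration implements Writable, so we can stream its entries directly
// instead of going through ObjectWritable.
class SerializableConfiguration(@transient var value: Configuration)
  extends Serializable {

  private def writeObject(out: ObjectOutputStream): Unit = {
    out.defaultWriteObject()
    // Writes only the Configuration's key/value entries.
    value.write(out)
  }

  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    // `loadDefaults = false` skips reading core-default.xml / core-site.xml,
    // which is where most of the per-task Configuration cost comes from.
    value = new Configuration(false)
    value.readFields(in)
  }
}
```

Call sites that currently do `new SerializableWritable(conf)` for a Hadoop conf could then switch to a wrapper like this without touching the generic SerializableWritable path.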

