Github user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/6663#discussion_r31790035
--- Diff: core/src/main/scala/org/apache/spark/SerializableWritable.scala ---
@@ -41,7 +40,6 @@ class SerializableWritable[T <: Writable](@transient var t: T) extends Serializa
   private def readObject(in: ObjectInputStream): Unit = Utils.tryOrIOException {
     in.defaultReadObject()
     val ow = new ObjectWritable()
-    ow.setConf(new Configuration())
--- End diff ---
I think that this, and SerializableWritable more generally, may be a huge
source of performance bottlenecks for short tasks. A common use of
SerializableWritable is serializing Hadoop Configurations, but it seems kind
of crazy to create and discard a new Configuration just to be able to
deserialize the driver-provided conf. Maybe we can make a substitute for
SerializableWritable which only deals with Configuration subclasses and just
calls `write()` and `readFields()` directly. This would sidestep a lot of the
performance penalties involved in creating Configuration objects and having
them spend tons of time loading defaults.
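
For illustration, a minimal sketch of what such a Configuration-only wrapper
could look like (the class name `SerializableConfiguration` is hypothetical
here, and a real version would presumably wrap the read/write paths in
`Utils.tryOrIOException` like the existing code):

```scala
import java.io.{ObjectInputStream, ObjectOutputStream}

import org.apache.hadoop.conf.Configuration

/**
 * Wraps a Hadoop Configuration so it can be Java-serialized by writing its
 * entries directly via Configuration.write()/readFields(), instead of going
 * through ObjectWritable and constructing a throwaway Configuration (which
 * loads all the default resources) on the deserialization side.
 */
class SerializableConfiguration(@transient var value: Configuration) extends Serializable {

  private def writeObject(out: ObjectOutputStream): Unit = {
    out.defaultWriteObject()
    value.write(out) // write only the conf's entries to the stream
  }

  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    // loadDefaults = false: skip reading core-default.xml etc., since the
    // driver-provided entries are about to be read back from the stream.
    value = new Configuration(false)
    value.readFields(in)
  }
}
```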
---