GitHub user JoshRosen commented on a diff in the pull request:
https://github.com/apache/spark/pull/12248#discussion_r58966245
--- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
@@ -206,6 +210,11 @@ private[spark] object Task {
dataOut.writeLong(timestamp)
}
+ // Write the task properties separately so it is available before full task deserialization.
--- End diff --
Since the properties aren't transient in `Task`, I guess this means that
we'll write them out twice. If we want to avoid this, we can make
`localProperties` into a `@transient` `var` which is `private[spark]` then
re-set the field after deserializing the task. Tasks are sent to executors
using broadcast variables, so the extra space only makes a difference for the
first task from a stage that's run on an executor.
As a result, if we think that these serialized properties will typically be
small then the space savings probably aren't a huge deal, but if we want
to heavily optimize then we can do the `var` trick.
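
For illustration, here is a rough sketch of what the `var` trick could look like. `DemoTask`, `serialize`, and `deserialize` are hypothetical names made up for this example and are not Spark's actual `Task` code; the sketch also uses plain Java serialization and leaves out the `private[spark]` modifier so that it compiles outside the Spark package.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.Properties

// Hypothetical stand-in for Task; not the real Spark class.
class DemoTask(val stageId: Int) extends Serializable {
  // @transient: Java serialization skips this field, so the properties
  // are not written a second time as part of the task itself.
  // (In Spark this would also be private[spark].)
  @transient var localProperties: Properties = new Properties()
}

object DemoTask {
  def serialize(task: DemoTask): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    // Write the properties first so they can be read before the task.
    out.writeObject(task.localProperties)
    // Write the task; the transient field is omitted here.
    out.writeObject(task)
    out.close()
    bytes.toByteArray
  }

  def deserialize(data: Array[Byte]): DemoTask = {
    val in = new ObjectInputStream(new ByteArrayInputStream(data))
    val props = in.readObject().asInstanceOf[Properties]
    val task = in.readObject().asInstanceOf[DemoTask]
    in.close()
    // The transient field is null after deserialization; re-set it.
    task.localProperties = props
    task
  }
}
```

With this layout the properties appear once in the serialized bytes, at the cost of having to remember to re-set the field on every deserialization path.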