Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12248#discussion_r58966245
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
    @@ -206,6 +210,11 @@ private[spark] object Task {
           dataOut.writeLong(timestamp)
         }
     
    +    // Write the task properties separately so it is available before full 
task deserialization.
    --- End diff --
    
    Since the properties aren't transient in `Task`, I guess this means that 
we'll write them out twice. If we want to avoid this, we can make 
`localProperties` a `private[spark]` `@transient` `var` and then re-set the 
field after deserializing the task. Tasks are sent to executors using 
broadcast variables, so the extra space only makes a difference for the 
first task from a stage that's run on an executor.
    
    As a result, if we think that these serialized properties will typically be 
small, then the space savings probably aren't a huge deal, but if we want to 
optimize heavily then we can do the `var` trick.
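
A rough sketch of the `var` trick, in case it helps the discussion. This is not the actual `Task` code: `DemoTask`, `TransientPropsDemo`, `javaSerialize`, `serializeWithProps` and `deserializeWithProps` are hypothetical names, plain Java serialization stands in for Spark's serializer, and the `private[spark]` modifier is left off because the sketch lives outside the `spark` package.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream,
  DataOutputStream, ObjectInputStream, ObjectOutputStream}
import java.util.Properties

// Hypothetical stand-in for Task; only the field under discussion is modelled.
class DemoTask(val stageId: Int) extends Serializable {
  // @transient: Java serialization skips this field, so it is written only once
  // (in the header below) and must be re-set after deserialization.
  @transient var localProperties: Properties = new Properties
}

object TransientPropsDemo {
  // Serialize any object with plain Java serialization.
  def javaSerialize(obj: AnyRef): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(obj)
    out.close()
    bytes.toByteArray
  }

  // Write the properties as a length-prefixed header, then the task itself.
  def serializeWithProps(task: DemoTask): Array[Byte] = {
    val propBytes = javaSerialize(task.localProperties)
    val taskBytes = javaSerialize(task)
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    out.writeInt(propBytes.length)
    out.write(propBytes)
    out.write(taskBytes)
    out.flush()
    bytes.toByteArray
  }

  // Read the header first, then the task, then re-set the transient field.
  def deserializeWithProps(data: Array[Byte]): DemoTask = {
    val in = new DataInputStream(new ByteArrayInputStream(data))
    val propBytes = new Array[Byte](in.readInt())
    in.readFully(propBytes)
    val props = new ObjectInputStream(new ByteArrayInputStream(propBytes))
      .readObject().asInstanceOf[Properties]
    val task = new ObjectInputStream(in).readObject().asInstanceOf[DemoTask]
    task.localProperties = props
    task
  }
}
```

With this layout the properties can be read from the header before the task body is deserialized, and a task deserialized without the header step comes back with `localProperties == null`, which is why the field has to be a `var` that the deserializer re-sets.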

