Github user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/6293#issuecomment-104142646
  
    @sryza, I considered taking that approach but decided against it for a few 
different reasons:
    
    - Calling `reset()` between creating serialization streams should be pretty 
cheap: the operation itself is relatively inexpensive and we only call it once 
per stream.
    - Calling `reset()` once per `serialize()` call is slightly more expensive 
from a relative cost perspective, but we have relatively few calls to 
`serialize()` (I think that we mainly / only use `Serializer.serialize` for 
serializing closures).
    - If we guard the reset call with a check to see whether auto-reset is 
enabled, then we probably need to create a boolean field to track whether 
auto-reset is enabled rather than calling `getAutoReset()` each time.  It 
might be fine to convert `getAutoReset` into a `lazy val` or to otherwise store 
its return value if there aren't any ordering / initialization concerns here.
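
    For reference, the guarded variant from the last bullet could look roughly 
like the sketch below. `FakeBackend` and `GuardedInstance` are hypothetical 
stand-ins invented for illustration (they are not the Kryo or Spark classes), 
and the sketch assumes the flag lookup is costly enough to be worth caching and 
that there are no ordering / initialization concerns around the `lazy val`.

```scala
// Hypothetical stand-in for a serializer backend; not the real Kryo or Spark API.
final class FakeBackend(autoReset: Boolean) {
  var flagLookups = 0   // how often the (assumed costly) flag lookup ran
  var resets = 0        // how often reset() ran

  def getAutoReset(): Boolean = { flagLookups += 1; autoReset }
  def reset(): Unit = resets += 1
}

// Sketch of a serializer instance that guards reset() behind a cached flag.
final class GuardedInstance(backend: FakeBackend) {
  // Evaluated at most once, on first use, then reused for every later call.
  private lazy val autoResetEnabled: Boolean = backend.getAutoReset()

  def newStream(): Unit = {
    // Only reset manually when the backend will not do it for us.
    if (!autoResetEnabled) backend.reset()
  }
}

object GuardedResetDemo {
  def main(args: Array[String]): Unit = {
    val backend = new FakeBackend(autoReset = false)
    val instance = new GuardedInstance(backend)
    (1 to 3).foreach(_ => instance.newStream())
    println(s"flag lookups = ${backend.flagLookups}, resets = ${backend.resets}")
  }
}
```

    In this toy setup the flag is looked up once and `reset()` runs three 
times, which is the extra field plus branch that the unconditional approach 
avoids.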
    
    As far as I know, auto-reset calls `reset()` after each object in the stream, 
whereas here we're only calling it a handful of times, so I don't expect this 
to carry a performance penalty.  Therefore, for simplicity's sake we might 
want to avoid the extra logic / checks if they're not necessary.
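
    To put rough numbers on that, here is a toy model that only counts 
`reset()` calls. `CountingBackend` and `ResetCountDemo` are made-up names, the 
model assumes auto-reset means one reset per object written (as described 
above), and it says nothing about the actual cost of each call.

```scala
// Toy model only: counts reset() calls; it does not use the real Kryo API.
final class CountingBackend(autoReset: Boolean) {
  var resets = 0
  def reset(): Unit = resets += 1
  // With auto-reset on, this model resets after every object written.
  def writeObject(obj: Any): Unit = if (autoReset) reset()
}

object ResetCountDemo {
  // Writes `objectsPerStream` objects to each of `streams` streams,
  // calling reset() once before each stream when `resetPerStream` is set.
  def run(autoReset: Boolean, resetPerStream: Boolean,
          streams: Int, objectsPerStream: Int): Int = {
    val backend = new CountingBackend(autoReset)
    for (_ <- 1 to streams) {
      if (resetPerStream) backend.reset()
      for (i <- 1 to objectsPerStream) backend.writeObject(i)
    }
    backend.resets
  }

  def main(args: Array[String]): Unit = {
    // Auto-reset alone: one reset per object.
    println(run(autoReset = true, resetPerStream = false, streams = 2, objectsPerStream = 1000))
    // Auto-reset plus the unconditional per-stream reset: only `streams` extra calls.
    println(run(autoReset = true, resetPerStream = true, streams = 2, objectsPerStream = 1000))
  }
}
```

    With two streams of 1000 objects each, the model counts 2000 resets with 
auto-reset alone and 2002 with the additional per-stream call, so the 
unconditional reset adds one call per stream.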


