Github user squito commented on the issue:
https://github.com/apache/spark/pull/15505
@witgo @kayousterhout where do we stand on this and
https://github.com/apache/spark/pull/16053? Both still viable alternatives?
https://github.com/apache/spark/pull/16053 is still missing performance
benchmarks, and given the entire purpose here is performance, I think we need
to wait for those metrics.
But https://github.com/apache/spark/pull/16053 is a much smaller change. I
actually think it's a little clearer overall in this version, in that
serialization all happens in one place ... but I'm also biased to go for the
smaller change if there isn't really much difference.
I also feel like we're missing a clear description of the overall flow of
serialization -- it's rather complicated, between the task binary broadcast, the
task, the task description, where it all happens, etc. (That goes for both
versions -- really it's an existing problem, this just seems like the right time
to address it.)