Github user dorx commented on the pull request:
https://github.com/apache/spark/pull/1581#issuecomment-50101155
The issue is what other things we can reasonably serialize into 8 bytes.
Not sure how other types of doubles are relevant here since the size would be
different and cause problems right away. Longs are also 8 bytes, so would some
scheme of serializing an array of 2 shorts/chars. It's a tradeoff between
efficiency and safety. We can remove the magic byte assuming no one's ever
going to serialize an RDD of Doubles into 8-byte arrays and then use a Long
deser on the 8-byte array. A compromise would be embedding the type metadata in
the RDD of bytearray itself so we don't incur the cost of per point blow-up.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---