Github user jkbradley commented on the pull request:
https://github.com/apache/spark/pull/7099#issuecomment-119365405
@dbtsai After discussing with @feynmanliang I think we have a plan. Here's
the breakdown of the issues with transient & serialization:
Serialization could be important for:
(1) Java serialization for persisting models & stats
(2) sending models & stats to workers
Marking some things as transient should fix these issues, at least partly.
Assuming we mark enough data members as transient, these two issues are
resolved as follows:
(1) Java serialization:
* Users should be able to save models as POJOs and load them back.
However, stats will not be saved.
* We can later add support for saving stats to our native model save/load.
(2) sending to workers:
* We can send models to workers for predictions.
* We will not assume stats are available on workers.
Given these decisions, it suffices to mark the summary data member stored
in LinearRegressionModel as transient and not make the results & metrics
classes Serializable.
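
To illustrate the effect of this decision, here is a minimal Java sketch.
ToyModel and ToySummary are hypothetical stand-ins invented for this
example (they are not the actual Spark classes), and the round trip through
an in-memory byte array only approximates saving and loading a model or
shipping it to a worker. It assumes the summary is the only non-serializable
member of the model.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientSummaryDemo {

    // Hypothetical stand-in for a results/metrics class.
    // Deliberately NOT Serializable.
    static class ToySummary {
        final double rmse;

        ToySummary(double rmse) {
            this.rmse = rmse;
        }
    }

    // Hypothetical stand-in for a model such as LinearRegressionModel.
    static class ToyModel implements Serializable {
        private static final long serialVersionUID = 1L;

        final double[] weights;

        // transient: Java serialization skips this field, so the model can be
        // written even though ToySummary is not Serializable.
        transient ToySummary summary;

        ToyModel(double[] weights, ToySummary summary) {
            this.weights = weights;
            this.summary = summary;
        }

        boolean hasSummary() {
            return summary != null;
        }

        // Prediction needs only the weights, never the summary.
        double predict(double[] features) {
            double sum = 0.0;
            for (int i = 0; i < weights.length; i++) {
                sum += weights[i] * features[i];
            }
            return sum;
        }
    }

    // Serialize to bytes and read back, as a save/load or a transfer would.
    static ToyModel roundTrip(ToyModel model) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(model);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ToyModel) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ToyModel trained =
            new ToyModel(new double[] {2.0, -1.0}, new ToySummary(0.25));
        ToyModel loaded = roundTrip(trained);

        System.out.println(loaded.predict(new double[] {3.0, 1.0})); // 5.0
        System.out.println(loaded.hasSummary());                     // false
    }
}
```

In this sketch the deserialized model can still predict, but its summary
comes back as null, so any code that reads the summary after a load or on a
worker would need to handle its absence.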
Later on, we could modify what is transient, or make more things
Serializable. (Changing transient modifiers does not break binary
compatibility.)
@dbtsai Does this summary seem reasonable?