srowen commented on issue #23549: [SPARK-26616][MLlib] Expose document frequency in IDFModel URL: https://github.com/apache/spark/pull/23549#issuecomment-455154617

Yeah this is confusing to me, even though I've seen these implementations for years. It seems like the `MLReader` / `MLWriter` traits are the right ones to use, as they superseded `Saveable` and `Loader`. The "V1_0" mechanism also appears to be 'old', as it seems used only in .mllib implementations. I am not sure it would work here as older models weren't written in this way. I'm also ignoring the `GeneralMLWritable` trait, as that seems to be there to support PMML.

It feels not-great to use the Spark version for versioning, but many newer implementations appear to. And it may be just fine to use the _major_ version, as we can in this case, since this would go into Spark 3.

After the email thread and follow-up comments, I now think you were closer to right originally. The `saveImpl` and `load` methods can just switch on Spark version to know what to expect. That also feels like less overkill for just a few extra fields. (In fact, I wonder why some of these implementations don't just change behavior based on the presence of fields in the Parquet file.)

Could I ask you to back out the most recent change and try the simpler implementation based on Spark version? In the end that looks more consistent with the newer .ml code. I apologize, because I think you already squashed your branch.
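A minimal, Spark-free sketch of the version switch described above, for illustration only: in the real `IDFModel` reader the Spark version would come from the metadata written alongside the model, and the fields would be read from a Parquet `DataFrame`. Every name here (`IdfLoadSketch`, `LoadedIdf`, and the assumption that `docFreq`/`numDocs` are only present in models written by Spark 3+) is hypothetical and not taken from the actual PR.

```scala
// Hedged sketch: version-gated load logic, modeled without Spark dependencies.
// In a real MLReader, `sparkVersion` would come from the JSON metadata that
// the corresponding MLWriter persisted; here it is just a string argument.
object IdfLoadSketch {

  // Parse "major.minor.patch" into (major, minor);
  // tolerate suffixes like "3.0.0-SNAPSHOT".
  def majorMinor(sparkVersion: String): (Int, Int) = {
    val parts = sparkVersion.split("[.\\-]")
    (parts(0).toInt, parts(1).toInt)
  }

  // Models the shape of the loaded data: idf always present; docFreq and
  // numDocs only written by Spark 3+ in this hypothetical layout.
  final case class LoadedIdf(idf: Array[Double], docFreq: Array[Long], numDocs: Long)

  def load(
      sparkVersion: String,
      idf: Array[Double],
      maybeDocFreq: Option[Array[Long]],
      maybeNumDocs: Option[Long]): LoadedIdf = {
    val (major, _) = majorMinor(sparkVersion)
    if (major >= 3) {
      // Models written by Spark 3+ are expected to carry the extra fields.
      LoadedIdf(idf, maybeDocFreq.get, maybeNumDocs.get)
    } else {
      // Older models lack the fields entirely; fall back to empty defaults.
      LoadedIdf(idf, Array.empty[Long], 0L)
    }
  }
}
```

The same structure could instead key off the presence of the `docFreq` column in the loaded DataFrame, as the comment's parenthetical suggests; switching on version just makes the expectation explicit.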
