srowen commented on issue #23549: [SPARK-26616][MLlib] Expose document frequency in IDFModel URL: https://github.com/apache/spark/pull/23549#issuecomment-455154617

Yeah this is confusing to me, even though I've seen these implementations for years. It seems like the `MLReader` / `MLWriter` traits are the right ones to use, as they superseded `Saveable` and `Loader`. The "V1_0" mechanism also appears to be 'old', as it seems used only in .mllib implementations. I am not sure it would work here as older models weren't written in this way. I'm also ignoring the `GeneralMLWritable` trait, as that seems to be there to support PMML.

It feels not-great to use the Spark version for versioning, but many newer implementations appear to. And it may be just fine to use the _major_ version, as we can in this case, since this would go into Spark 3.

After the email thread and follow-up comments, I now think you were closer to right originally. The `saveImpl` and `load` methods can just switch on Spark version to know what to expect. That also feels like less overkill for just a few extra fields. (In fact, I wonder why some of these implementations don't just change behavior based on the presence of fields in the Parquet file.)

Could I ask you to back out the most recent change and try the simpler implementation based on Spark version? In the end that looks more consistent with the newer .ml code. I apologize, because I think you already squashed your branch.
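A minimal, Spark-free sketch of the version switch described above, for illustration only: in the real `IDFModel` reader the Spark version would come from the metadata written alongside the model, and the fields would be read from a Parquet `DataFrame`. Every name here (`IdfLoadSketch`, `LoadedIdf`, and the assumption that `docFreq`/`numDocs` are only present in models written by Spark 3+) is hypothetical and not taken from the actual PR.

```scala
// Hedged sketch: version-gated load logic, modeled without Spark dependencies.
// In a real MLReader, `sparkVersion` would come from the JSON metadata that
// the corresponding MLWriter persisted; here it is just a string argument.
object IdfLoadSketch {

  // Parse "major.minor.patch" into (major, minor);
  // tolerate suffixes like "3.0.0-SNAPSHOT".
  def majorMinor(sparkVersion: String): (Int, Int) = {
    val parts = sparkVersion.split("[.\\-]")
    (parts(0).toInt, parts(1).toInt)
  }

  // Models the shape of the loaded data: idf always present; docFreq and
  // numDocs only written by Spark 3+ in this hypothetical layout.
  final case class LoadedIdf(idf: Array[Double], docFreq: Array[Long], numDocs: Long)

  def load(
      sparkVersion: String,
      idf: Array[Double],
      maybeDocFreq: Option[Array[Long]],
      maybeNumDocs: Option[Long]): LoadedIdf = {
    val (major, _) = majorMinor(sparkVersion)
    if (major >= 3) {
      // Models written by Spark 3+ are expected to carry the extra fields.
      LoadedIdf(idf, maybeDocFreq.get, maybeNumDocs.get)
    } else {
      // Older models lack the fields entirely; fall back to empty defaults.
      LoadedIdf(idf, Array.empty[Long], 0L)
    }
  }
}
```

The same structure could instead key off the presence of the `docFreq` column in the loaded DataFrame, as the comment's parenthetical suggests; switching on version just makes the expectation explicit.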
