imatiach-msft commented on issue #22087: [SPARK-25097][ML] Support prediction 
on single instance in KMeans/BiKMeans/GMM
URL: https://github.com/apache/spark/pull/22087#issuecomment-465847633
 
 
   @erikerlandson 
   I think this PR is good as is; we should decouple what it fixes from the
requirement to have a better base class structure.  I agree that longer-term we
need to figure out a better base class structure that brings in some parts of
predictor without the supervised learning requirements.  I suggested breaking
the class hierarchy into unsupervisedmodel/supervisedmodel that inherit from
predictionmodel above.

   Removing training params from the model would be a much more radical change
across all learners.  In Spark ML we share a lot of parameters between the
estimator and the model, even when those parameters are only used during
training, which is a little weird.  I do agree that it would be logically
cleaner to have training-related parameters only on the estimator (and the base
classes of the estimator), but it would put us in backwards compatibility hell,
as there is probably a lot of user code that relies on those parameters being
on the model.  Moreover, some models may actually use the label in some cases;
most likely not, but we should make sure.  I think we would need to do more
research into that approach.

   Even if it were done, I don't think it would make a lot of users happy,
because parameters they could previously access would suddenly be missing from
the model.  While those parameters logically don't _need_ to be there, they
still offer information about how the model was created, so in some sense you
could argue that it is good to have them on the model.  I guess it's a matter
of opinion; the important thing is to be consistent across the entire codebase.
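
   As a rough illustration of the split suggested above, here is a minimal
sketch in plain Python. It is not Spark's actual API: every class, attribute
and method name below (PredictionModel, SupervisedModel, UnsupervisedModel,
ToyKMeansModel, ToyThresholdModel, features_col, label_col) is hypothetical and
only mirrors the names used in this comment. It assumes the shared base owns
the features/prediction columns and single-instance predict, while only the
supervised branch carries a label column.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence


class PredictionModel(ABC):
    """Hypothetical shared base: only what every fitted model needs."""

    def __init__(self, features_col: str = "features",
                 prediction_col: str = "prediction") -> None:
        self.features_col = features_col
        self.prediction_col = prediction_col

    @abstractmethod
    def predict(self, features: Sequence[float]):
        """Predict for a single instance."""

    def transform(self, rows: List[Dict]) -> List[Dict]:
        # Batch prediction is expressed through the single-instance predict.
        return [
            {**row, self.prediction_col: self.predict(row[self.features_col])}
            for row in rows
        ]


class SupervisedModel(PredictionModel, ABC):
    """Hypothetical supervised branch: the only place a label column lives."""

    def __init__(self, label_col: str = "label", **kwargs) -> None:
        super().__init__(**kwargs)
        self.label_col = label_col


class UnsupervisedModel(PredictionModel, ABC):
    """Hypothetical unsupervised branch: no label column at all."""


class ToyKMeansModel(UnsupervisedModel):
    """Toy clustering model: predicts the index of the nearest center."""

    def __init__(self, centers: List[Sequence[float]], **kwargs) -> None:
        super().__init__(**kwargs)
        self.centers = centers

    def predict(self, features: Sequence[float]) -> int:
        def sq_dist(i: int) -> float:
            return sum((a - b) ** 2 for a, b in zip(features, self.centers[i]))

        return min(range(len(self.centers)), key=sq_dist)


class ToyThresholdModel(SupervisedModel):
    """Toy classifier: thresholds the first feature."""

    def __init__(self, threshold: float, **kwargs) -> None:
        super().__init__(**kwargs)
        self.threshold = threshold

    def predict(self, features: Sequence[float]) -> float:
        return 1.0 if features[0] > self.threshold else 0.0
```

   With this layout a clustering model gets predict and transform from the
shared base without ever exposing a label column, which is the property the
proposed split is after. It deliberately says nothing about where
training-only params should live.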
   
