[GitHub] spark pull request: [SPARK-3251][MLLIB]: Clarify learning interfac...

BigCrunsh Thu, 11 Sep 2014 06:14:37 -0700

Github user BigCrunsh commented on the pull request:

    https://github.com/apache/spark/pull/2137#issuecomment-55261445
  
    I have to admit that this PR may try to address too many issues at once. I 
think the major ones are the following. Ideally, 
    - the model should be immutable and stateless;
    - the output type of ``predict`` should depend neither on whether 
``threshold`` is set nor on the kind of model;
    - the model should provide access to all variables of interest (scores, 
classes, probabilities);
    - we need a distinction between multi-class and binary classification models 
that inherit from GLMs.
    
    My suggestions for models that inherit from GLMs are:
    - introduce more specific ``predict`` functions that distinguish between 
scores (inner products), probabilities (whatever the naming), and classes 
(it might be nice to have some traits for that too);
    - extend the hierarchy of models; it seems to be necessary to have a 
distinction between multi-class and binary models;
    - remove ``clearThreshold``.
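
    As a rough illustration of these three suggestions (not the code in this PR 
and not the MLlib API), a minimal Scala sketch might look like the following. 
All trait, class, and method names here (``ScoreModel``, ``predictScore``, 
``withThreshold``, and so on) are hypothetical, and the standard library's 
``Vector[Double]`` stands in for MLlib's vector type.

```scala
// Hypothetical sketch only: none of these names exist in MLlib.
// Vector is scala.collection.immutable.Vector, not MLlib's Vector.

// Raw inner-product score: w . x + b
trait ScoreModel {
  def weights: Vector[Double]
  def intercept: Double
  def predictScore(features: Vector[Double]): Double = {
    require(features.length == weights.length, "feature dimension mismatch")
    weights.zip(features).map { case (w, x) => w * x }.sum + intercept
  }
}

// Models that can turn a score into a probability.
trait ProbabilisticModel extends ScoreModel {
  def predictProbability(features: Vector[Double]): Double
}

// Binary models always return a class label in {0, 1}.
trait BinaryClassifier extends ScoreModel {
  def predictClass(features: Vector[Double]): Int
}

// Multi-class models expose one score per class and pick the largest.
trait MulticlassClassifier {
  def numClasses: Int
  def predictScores(features: Vector[Double]): Vector[Double]
  def predictClass(features: Vector[Double]): Int = {
    val scores = predictScores(features)
    scores.indexOf(scores.max)
  }
}

// Immutable binary logistic model: the threshold is a constructor value,
// so there is no setter and nothing to clear.
final case class BinaryLogisticModel(
    weights: Vector[Double],
    intercept: Double,
    threshold: Double = 0.5
) extends ProbabilisticModel with BinaryClassifier {

  def predictProbability(features: Vector[Double]): Double =
    1.0 / (1.0 + math.exp(-predictScore(features)))

  def predictClass(features: Vector[Double]): Int =
    if (predictProbability(features) >= threshold) 1 else 0

  // Returns a new model instead of mutating this one.
  def withThreshold(t: Double): BinaryLogisticModel = copy(threshold = t)
}
```

    In this sketch each ``predict*`` method has a fixed return type regardless 
of the threshold, and a different threshold is obtained with 
``model.withThreshold(0.6)``, which leaves the original model untouched.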
    
    I think it makes sense to address these issues first, before we start 
implementing new algorithms. If we can agree on some of these points, I would 
be happy to help break down this PR (and also to implement further 
algorithms such as isotonic or multiclass logistic regression).



