[GitHub] spark issue #14834: [SPARK-17163][ML][WIP] Unified LogisticRegression interf...

sethah Tue, 06 Sep 2016 20:33:02 -0700

Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/14834
  
    | numClasses        | isMultinomial| coefficientMatrix size|
    | ------------- |:-------------:| -----:|
    |3+|true|3+ x numFeatures|
    |2|true|2 x numFeatures|
    |2|false|1 x numFeatures|
    
    The current behavior is as follows:
    * If it is binary classification trained with multinomial family, then we 
store `2 x numFeatures` coefficients in a matrix. We will predict with this 
matrix (i.e. we do not convert to `1 x numFeatures`). 
    * If it is binary classification trained with binomial family, then we 
store `1 x numFeatures` (i.e. these coefficients are pivoted) and we use a 
`DenseVector` instead of a matrix for prediction.
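
To make the table and the two cases above concrete, here is a minimal sketch in plain Python (not Spark code). The function names are hypothetical, and the sigmoid/softmax formulas are the standard logistic regression forms, assumed here rather than taken from the PR. Rejecting the binomial family with more than two classes is also an assumption, since the table does not list that case.

```python
import math


def coefficient_matrix_shape(num_classes, is_multinomial, num_features):
    """Shape of the stored coefficient matrix, following the table above."""
    if num_classes < 2:
        raise ValueError("need at least two classes")
    if num_classes > 2 and not is_multinomial:
        # Assumption: this combination is not in the table, so reject it.
        raise ValueError("binomial family is only sketched for two classes")
    rows = num_classes if is_multinomial else 1
    return (rows, num_features)


def predict_binomial(coefficients, intercept, features):
    """Binomial family: a single coefficient vector, sigmoid of the margin."""
    margin = sum(c * x for c, x in zip(coefficients, features)) + intercept
    p1 = 1.0 / (1.0 + math.exp(-margin))
    return [1.0 - p1, p1]


def predict_multinomial(coefficient_rows, intercepts, features):
    """Multinomial family: one row per class, softmax over the margins."""
    margins = [
        sum(c * x for c, x in zip(row, features)) + b
        for row, b in zip(coefficient_rows, intercepts)
    ]
    top = max(margins)  # subtract the max for numerical stability
    exps = [math.exp(m - top) for m in margins]
    total = sum(exps)
    return [e / total for e in exps]
```

In this sketch, a two-row multinomial model yields the same probabilities as a one-row binomial model whose coefficients are the difference of the two rows, which is one way to read "pivoted" above.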
    
    The coefficients are always stored in a single array. There is always a 
`coefficientMatrix` backed by that array, which in some cases has only 1 
row. For the binomial family, we also have a `coefficients` vector that is 
backed by the same array as the matrix. We use that vector for prediction in 
the binomial case. 
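
The shared-storage idea can be sketched in plain Python as follows. This illustrates the concept only and is not the Spark implementation: `MatrixView` is a hypothetical helper, and the row-major indexing is an assumption (for a single row the layout makes no difference).

```python
from array import array


class MatrixView:
    """Hypothetical matrix that reads from a flat array without copying it."""

    def __init__(self, num_rows, num_cols, values):
        assert num_rows * num_cols == len(values)
        self.num_rows = num_rows
        self.num_cols = num_cols
        self.values = values  # shared reference, not a copy

    def get(self, i, j):
        # Assumption: row-major layout.
        return self.values[i * self.num_cols + j]


num_features = 3
backing = array("d", [0.5, -1.0, 2.0])  # the single coefficient array

# Always present: a matrix over the array (1 x numFeatures for binomial).
coefficient_matrix = MatrixView(1, num_features, backing)

# Binomial only: a vector over the same array, used for prediction.
coefficients = memoryview(backing)
```

Because both views read from `backing`, a change to the array shows up in the matrix and in the vector, and neither one holds a second copy of the coefficients.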
    
    Hopefully that clears it up. I don't think it's necessary to convert the 
multinomial-family, binary-classification case to `1 x numFeatures` for 
prediction: keeping `2 x numFeatures` is not a regression from past behavior, 
and users have to explicitly specify that family (hopefully knowing the 
consequences of that choice).
    
    I also vote for Option 2 in the original description. We can avoid any 
regressions with past versions and the implementation isn't too messy.


