GitHub user sethah commented on the issue:
https://github.com/apache/spark/pull/14834

| `numClasses` | `isMultinomial` | `coefficientMatrix` size |
| ------------- |:-------------:| -----:|
| 3+ | true | 3+ x numFeatures |
| 2 | true | 2 x numFeatures |
| 2 | false | 1 x numFeatures |

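The table can be read as a simple shape rule. Below is a minimal Python sketch of that rule. `coefficient_matrix_shape` is a hypothetical helper, not a Spark API, and it covers only the three rows listed. Rejecting any other combination, such as the binomial family with three or more classes, is an assumption of the sketch.

```python
def coefficient_matrix_shape(num_classes, is_multinomial, num_features):
    """Return (rows, cols) of the coefficient matrix for the table's three cases."""
    if is_multinomial and num_classes >= 2:
        # Multinomial family: one row of coefficients per class.
        return (num_classes, num_features)
    if not is_multinomial and num_classes == 2:
        # Binomial family: a single pivoted row.
        return (1, num_features)
    # Assumption: combinations not listed in the table are treated as invalid.
    raise ValueError(
        f"unsupported combination: num_classes={num_classes}, "
        f"is_multinomial={is_multinomial}")


print(coefficient_matrix_shape(3, True, 4))   # (3, 4)
print(coefficient_matrix_shape(2, True, 4))   # (2, 4)
print(coefficient_matrix_shape(2, False, 4))  # (1, 4)
```
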
The current behavior is as follows:
* If it is binary classification trained with the multinomial family, we
store `2 x numFeatures` coefficients in a matrix and predict with this
matrix (i.e., we do not convert to `1 x numFeatures`).
* If it is binary classification trained with the binomial family, we
store `1 x numFeatures` coefficients (i.e., these coefficients are pivoted) and
use a `DenseVector` instead of a matrix for prediction.
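
A standard identity connects the two cases. With two classes, a softmax over rows β0 and β1 gives P(y = 1 | x) = sigmoid((β1 − β0)·x). A pivoted `1 x numFeatures` row can therefore be formed as the difference of the two rows. The plain-Python sketch below checks this identity using made-up coefficients and no intercepts. Choosing class 0 as the pivot is an assumption of the sketch. The code only illustrates the math and does not reproduce Spark's prediction code.

```python
import math


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def predict_multinomial(matrix, x):
    """P(y = 1 | x) from a 2 x numFeatures matrix via the softmax."""
    margins = [dot(row, x) for row in matrix]
    m = max(margins)  # subtract the max margin for numerical stability
    exps = [math.exp(v - m) for v in margins]
    return exps[1] / sum(exps)


def predict_binomial(coefficients, x):
    """P(y = 1 | x) from a 1 x numFeatures (pivoted) vector via the sigmoid."""
    return 1.0 / (1.0 + math.exp(-dot(coefficients, x)))


# Hypothetical 2 x 3 multinomial coefficients (no intercepts).
matrix = [[0.5, -1.0, 2.0],
          [1.5, 0.25, -0.5]]
# Pivot on class 0: subtract row 0 from row 1.
pivoted = [b1 - b0 for b0, b1 in zip(matrix[0], matrix[1])]

x = [1.0, 2.0, 0.5]
p_multi = predict_multinomial(matrix, x)
p_bin = predict_binomial(pivoted, x)
print(p_multi, p_bin)  # the two probabilities agree up to rounding
```

Both models compute the same probability. They differ only in the amount of coefficient storage and in the prediction code path each one uses.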
Under the hood, the coefficients are always stored in a single flat array.
There is always a `coefficientMatrix` backed by that array, and in some cases it
has only one row. With the binomial family, there is also a `coefficients`
vector backed by the same array as the matrix. That vector is what we use for
prediction in the binomial case.
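
Here "backed by the same array" means the matrix and the vector are two views over one buffer. No coefficients are copied, and a write through one view is visible through the other. The plain-Python sketch below illustrates this. `FlatCoefficients`, `set_matrix_entry`, and `as_vector` are hypothetical names, and the row-major layout is an assumption. The sketch is not Spark's `DenseMatrix`/`DenseVector` implementation.

```python
class FlatCoefficients:
    """Hypothetical holder: one flat list viewed as a matrix or as a vector."""

    def __init__(self, values, num_rows, num_cols):
        if len(values) != num_rows * num_cols:
            raise ValueError("values must have num_rows * num_cols entries")
        self.values = values  # the single backing array
        self.num_rows = num_rows
        self.num_cols = num_cols

    def matrix_entry(self, i, j):
        # Row-major layout (an assumption of this sketch).
        return self.values[i * self.num_cols + j]

    def set_matrix_entry(self, i, j, v):
        self.values[i * self.num_cols + j] = v

    def as_vector(self):
        # Only meaningful in the binomial case, where the matrix has one row.
        if self.num_rows != 1:
            raise ValueError("vector view requires a 1-row matrix")
        return self.values  # the same list object, not a copy


coef = FlatCoefficients([0.1, 0.2, 0.3], num_rows=1, num_cols=3)
vec = coef.as_vector()
coef.set_matrix_entry(0, 2, 9.0)  # write through the matrix view...
print(vec[2])                     # ...and the vector sees it: 9.0
```
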
Hopefully that clears it up. I don't think it's necessary to convert a binary
classification model trained with the multinomial family to `1 x numFeatures`
for prediction. Keeping the `2 x numFeatures` matrix won't cause a regression,
because users have to explicitly specify that family (hopefully knowing the
consequences of that choice).
I also vote for Option 2 in the original description. It avoids any
regressions relative to past versions, and the implementation isn't too messy.