Github user actuaryzhang commented on the issue:
https://github.com/apache/spark/pull/16630
@yanboliang Thanks for the suggestions. I have made a new commit that
addresses your comments.
In the new version, I used an array of tuples to represent the coefficient
matrix. I used tuples because the columns have mixed types, string and double
(the feature names must be stored because they depend on whether there is an
intercept). I then wrote a `showString` function, similar to the one in the
`Dataset` class, that compiles all the summary info into a string, and defined
`show` methods to print the estimated model. The output is very similar to R's,
except that I did not show the residuals and significance levels. Please let
me know your thoughts on this update.
Below is an example of the call and the output:
```
model.summary.show()
+-----------+--------+--------+------+------+
|    Feature|Estimate|StdError|TValue|PValue|
+-----------+--------+--------+------+------+
|(Intercept)|   0.790|   4.013| 0.197| 0.862|
| features_0|   0.226|   2.115| 0.107| 0.925|
| features_1|   0.468|   0.582| 0.804| 0.506|
+-----------+--------+--------+------+------+
(Dispersion parameter for gaussian family taken to be 14.516)
Null deviance: 46.800 on 2 degrees of freedom
Residual deviance: 29.032 on 2 degrees of freedom
AIC: 30.984
```
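For readers who want a concrete picture of the approach described above, here
is a minimal sketch of how an array of tuples could be rendered as a
right-aligned table. It is not the code from the commit: the object name
`CoefficientTable`, the `Row` alias, and the fixed three-decimal formatting are
all assumptions made for illustration.

```scala
import java.util.Locale

// Hypothetical sketch; names and layout are illustrative, not the PR's code.
object CoefficientTable {
  // One row per term: (feature name, estimate, std. error, t-value, p-value).
  type Row = (String, Double, Double, Double, Double)

  def showString(rows: Array[Row]): String = {
    val header = Seq("Feature", "Estimate", "StdError", "TValue", "PValue")
    // Render every numeric cell with three decimals, independent of locale.
    val cells: Seq[Seq[String]] = rows.toSeq.map { case (name, est, se, t, p) =>
      name +: Seq(est, se, t, p).map(v => "%.3f".formatLocal(Locale.ROOT, v))
    }
    // Each column is as wide as its longest cell, header included.
    val widths = (header +: cells).transpose.map(_.map(_.length).max)
    val sep = widths.map("-" * _).mkString("+", "+", "+")
    // Right-align each cell by left-padding it to the column width.
    def line(cols: Seq[String]): String =
      cols.zip(widths).map { case (c, w) => " " * (w - c.length) + c }
        .mkString("|", "|", "|")
    (Seq(sep, line(header), sep) ++ cells.map(line) :+ sep).mkString("\n")
  }
}
```

Calling `println(CoefficientTable.showString(rows))` with one tuple per
coefficient prints a bordered table in the same style as the output above. A
tuple keeps the string name and the numeric statistics together in one row
without introducing a dedicated class.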