Yes, I think that footnote could be a lot more prominent, or pulled up right under the table.
I also think it would be fine to present the {0, 1} formulation. Log-loss is actually more recognizable in that form, I think. It's probably less recognizable for hinge loss, but consistency is more important; at worst there's an extra (2y − 1) term.

The loss here is per instance, and implicitly summed over all instances. I think that's probably not confusing for the reader: anyone reading this to double-check exactly which formulation is being used would know that. But it's worth a note. The loss is summed in the case of log-loss, not multiplied (if that's what you're saying), since taking the log of the likelihood turns the product over instances into a sum.

Those are decent improvements; feel free to open a pull request / JIRA.

On Mon, Sep 26, 2016 at 6:22 AM, Tobi Bosede <ani.to...@gmail.com> wrote:

> The loss function listed here for logistic regression is confusing. It
> seems to imply that Spark uses only -1 and +1 class labels. However, it
> uses {0, 1}, as the very inconspicuous note quoted below (under
> "Classification") says. We need to make this point more visible to avoid
> confusion.
>
> Better yet, we should replace the loss function listed with the one for
> {0, 1} labels, no matter how mathematically inconvenient, since that is
> what is actually implemented in Spark.
>
> More problematic, the loss function (even in this "convenient" form) is
> actually incorrect, because it is missing either a summation (sigma)
> inside the log or a product (pi) outside the log; the loss for logistic
> regression is the log-likelihood. So there are multiple problems with the
> documentation. Please advise on steps to fix the documentation for all
> versions, or whether there are already some in place.
>
> "Note that, in the mathematical formulation in this guide, a binary label
> y is denoted as either +1 (positive) or −1 (negative), which is convenient
> for the formulation. However, the negative label is represented by 0 in
> spark.mllib instead of −1, to be consistent with multiclass labeling."
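For what it's worth, the two label conventions give numerically identical per-instance losses. A minimal sketch in plain Python (function names are mine, not Spark's) comparing the ±1 margin form, the (2y − 1) substitution, and the textbook cross-entropy form:

```python
import math

def logloss_pm1(y, margin):
    # y in {-1, +1}; margin = w . x. The formulation in the guide.
    return math.log1p(math.exp(-y * margin))

def logloss_01(y, margin):
    # y in {0, 1}; same loss via the (2y - 1) substitution.
    return math.log1p(math.exp(-(2 * y - 1) * margin))

def cross_entropy(y, margin):
    # y in {0, 1}; textbook negative log-likelihood of a Bernoulli.
    p = 1.0 / (1.0 + math.exp(-margin))
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# All three agree for any margin and either label convention.
for m in (-2.0, 0.5, 3.0):
    for y01 in (0, 1):
        a = logloss_01(y01, m)
        assert abs(logloss_pm1(2 * y01 - 1, m) - a) < 1e-9
        assert abs(cross_entropy(y01, m) - a) < 1e-9
```

The per-instance value is then summed over the dataset; that sum is exactly the log of the product of per-instance likelihoods, which is why no sigma or pi appears inside the per-instance formula.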