GitHub user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/4978#discussion_r26296626
--- Diff: docs/mllib-guide.md ---
@@ -102,6 +102,7 @@ In the `spark.mllib` package, there were several
breaking changes. The first ch
* In `DecisionTree`, the deprecated class method `train` has been
removed. (The object/static `train` methods remain.)
* In `Strategy`, the `checkpointDir` parameter has been removed.
Checkpointing is still supported, but the checkpoint directory must be set
before calling tree and tree ensemble training.
* `PythonMLlibAPI` (the interface between Scala/Java and Python for MLlib)
was a public API but is now private, declared `private[python]`. This was
never meant for external use.
+* In linear regression (including Lasso and ridge regression), the squared
loss is now divided by 2. So in order to produce the same result as in 1.2, the
step size you choose needs to be multiplied by 2.
--- End diff --
Hm, it also occurred to me that doubling the step size affects the
regularization term too. Doesn't the regularization parameter have to be half
as large in order to get the same result? I'm probably overlooking something
about the formulation, but I didn't see the reg param updated in
https://github.com/apache/spark/commit/a96b72781ae40bb303613990b8d8b4721b84e1c3
and if the loss term is halved, leaving all else equal, the regularization
term becomes relatively twice as large, right?
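
To make the question concrete, here is a minimal sketch in plain Python. It
is not MLlib's code: it assumes a simplified update of the form
w <- w - step * (loss_gradient + reg * w) with a constant step size and an L2
penalty, and the helper name `gradient_descent`, the `loss_scale` parameter,
and the toy data are all made up for illustration. Under those assumptions,
halving the loss and doubling the step reproduces the old iterates only if
the regularization parameter is also halved.

```python
def gradient_descent(xs, ys, step, reg, loss_scale, iters=50):
    """Plain gradient descent for 1-D least squares with an L2 penalty.

    loss_scale = 1.0 models the old loss (w*x - y)^2;
    loss_scale = 0.5 models the halved loss (w*x - y)^2 / 2.
    This is an illustrative simplification, not the MLlib implementation.
    """
    w = 0.0
    n = len(xs)
    for _ in range(iters):
        # Gradient of the mean of loss_scale * (w*x - y)^2 with respect to w.
        grad = sum(2.0 * loss_scale * (w * x - y) * x
                   for x, y in zip(xs, ys)) / n
        # The L2 penalty (reg / 2) * w^2 contributes reg * w to the gradient.
        w = w - step * (grad + reg * w)
    return w


# Hypothetical toy data and hyperparameters.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
step, reg = 0.01, 0.5

# Old formulation: full squared loss.
w_old = gradient_descent(xs, ys, step, reg, loss_scale=1.0)
# Halved loss, step doubled, reg left unchanged.
w_step_only = gradient_descent(xs, ys, 2 * step, reg, loss_scale=0.5)
# Halved loss, step doubled, reg halved.
w_step_and_reg = gradient_descent(xs, ys, 2 * step, reg / 2, loss_scale=0.5)

print("old:                 ", w_old)
print("2x step, same reg:   ", w_step_only)
print("2x step, half reg:   ", w_step_and_reg)
```

In this sketch, doubling the step alone turns the update into
w <- w - step * (old_gradient + 2 * reg * w), which is the "relatively twice
as large" regularization described above. Whether the same holds for the
actual updaters depends on how they apply the step size and reg param.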