GitHub user MLnick commented on a diff in the pull request:
https://github.com/apache/spark/pull/13378#discussion_r65004142
--- Diff: docs/mllib-guide.md ---
@@ -102,32 +102,54 @@ MLlib is under active development.
The APIs marked `Experimental`/`DeveloperApi` may change in future
releases,
and the migration guide below will explain all changes between releases.
-## From 1.5 to 1.6
+## From 1.6 to 2.0
There are no breaking API changes in the `spark.mllib` or `spark.ml`
packages, but there are
deprecations and changes of behavior.
Deprecations:
-* [SPARK-11358](https://issues.apache.org/jira/browse/SPARK-11358):
- In `spark.mllib.clustering.KMeans`, the `runs` parameter has been
deprecated.
-* [SPARK-10592](https://issues.apache.org/jira/browse/SPARK-10592):
- In `spark.ml.classification.LogisticRegressionModel` and
- `spark.ml.regression.LinearRegressionModel`, the `weights` field has been
deprecated in favor of
- the new name `coefficients`. This helps disambiguate from instance (row)
"weights" given to
- algorithms.
+* [SPARK-14984](https://issues.apache.org/jira/browse/SPARK-14984):
+ In `spark.ml.regression.LinearRegressionSummary`, the `model` field has
been deprecated.
+* [SPARK-13784](https://issues.apache.org/jira/browse/SPARK-13784):
+ In `spark.ml.regression.RandomForestRegressionModel` and
`spark.ml.classification.RandomForestClassificationModel`,
+ the `numTrees` parameter has been deprecated in favor of the `getNumTrees`
method.
+* [SPARK-13761](https://issues.apache.org/jira/browse/SPARK-13761):
+ In `spark.ml.param.Params`, the `validateParams` method has been
deprecated.
+ We move all functionality in overridden methods to the corresponding
`transformSchema`.
+* [SPARK-14829](https://issues.apache.org/jira/browse/SPARK-14829):
+ In the `spark.mllib` package, `LinearRegressionWithSGD`, `LassoWithSGD`,
`RidgeRegressionWithSGD` and `LogisticRegressionWithSGD` have been deprecated.
+ We encourage users to use `spark.ml.regression.LinearRegression` and
`spark.ml.classification.LogisticRegression`.
+* [SPARK-14900](https://issues.apache.org/jira/browse/SPARK-14900):
+ In `spark.mllib.evaluation.MulticlassMetrics`, the parameters
`precision`, `recall` and `fMeasure` have been deprecated in favor of
`accuracy`.
Changes of behavior:
-* [SPARK-7770](https://issues.apache.org/jira/browse/SPARK-7770):
- `spark.mllib.tree.GradientBoostedTrees`: `validationTol` has changed
semantics in 1.6.
- Previously, it was a threshold for absolute change in error. Now, it
resembles the behavior of
- `GradientDescent`'s `convergenceTol`: For large errors, it uses relative
error (relative to the
- previous error); for small errors (`< 0.01`), it uses absolute error.
-* [SPARK-11069](https://issues.apache.org/jira/browse/SPARK-11069):
- `spark.ml.feature.RegexTokenizer`: Previously, it did not convert strings
to lowercase before
- tokenizing. Now, it converts to lowercase by default, with an option not
to. This matches the
- behavior of the simpler `Tokenizer` transformer.
+* [SPARK-7780](https://issues.apache.org/jira/browse/SPARK-7780):
+ `spark.mllib.classification.LogisticRegressionWithLBFGS` now directly calls
`spark.ml.classification.LogisticRegression` for binary classification.
+ This will introduce the following behavior changes for
`spark.mllib.classification.LogisticRegressionWithLBFGS`:
+ * The intercept will not be regularized when training a binary
classification model with the L1/L2 Updater.
+ * If users train without regularization, training with or without
feature scaling will return the same solution at the same convergence rate.
+* [SPARK-13429](https://issues.apache.org/jira/browse/SPARK-13429):
+ In order to provide better and more consistent results with
`spark.ml.classification.LogisticRegression`,
+ the default value of the `convergenceTol` parameter of
`spark.mllib.classification.LogisticRegressionWithLBFGS` has
been changed from 1E-4 to 1E-6.
+* [SPARK-12363](https://issues.apache.org/jira/browse/SPARK-12363):
+ A bug in `PowerIterationClustering` has been fixed, which will likely change its
results.
+* [SPARK-13048](https://issues.apache.org/jira/browse/SPARK-13048):
+ `LDA` using the `EM` optimizer will keep the last checkpoint by default,
if checkpointing is being used.
+* [SPARK-12153](https://issues.apache.org/jira/browse/SPARK-12153):
+ `Word2Vec` now respects sentence boundaries. Previously, it did not
handle them correctly.
+* [SPARK-10574](https://issues.apache.org/jira/browse/SPARK-10574):
+ `HashingTF` now uses `MurmurHash3` as the default hash algorithm in both
`spark.ml` and `spark.mllib`.
+* [SPARK-14768](https://issues.apache.org/jira/browse/SPARK-14768):
+ We remove the `expectedType` argument for PySpark `Param`.
+* [SPARK-14931](https://issues.apache.org/jira/browse/SPARK-14931):
+ We change some default `Param` values which were mismatched between
pipelines in Scala and Python.
--- End diff --
Some default Param values, which were ... Scala and Python, have been
changed.
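
A side note on the SPARK-14900 item in the diff: as far as I understand it, the argument-free `precision`, `recall` and `fMeasure` were micro-averaged over all labels, and for single-label multiclass predictions those micro-averaged values all coincide with accuracy, which is presumably why `accuracy` replaces them. A minimal sketch in plain Python (no Spark involved; the function name `micro_averaged_metrics` and the sample data are made up for illustration):

```python
import math
from collections import Counter


def micro_averaged_metrics(pairs):
    """Return (precision, recall, f1, accuracy), micro-averaged over labels.

    `pairs` is a non-empty list of (prediction, label) tuples where each
    example has exactly one predicted label and one true label.
    """
    if not pairs:
        raise ValueError("pairs must be non-empty")

    tp, fp, fn = Counter(), Counter(), Counter()
    for prediction, label in pairs:
        if prediction == label:
            tp[label] += 1
        else:
            # One wrong prediction is a false positive for the predicted
            # label and a false negative for the true label.
            fp[prediction] += 1
            fn[label] += 1

    tp_sum = sum(tp.values())
    fp_sum = sum(fp.values())
    fn_sum = sum(fn.values())

    precision = tp_sum / (tp_sum + fp_sum)
    recall = tp_sum / (tp_sum + fn_sum)
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp_sum / len(pairs)
    return precision, recall, f1, accuracy


# Hypothetical (prediction, label) pairs for a three-class problem.
sample = [(0, 0), (1, 1), (2, 1), (2, 2), (0, 2), (1, 1)]
precision, recall, f1, accuracy = micro_averaged_metrics(sample)
print(math.isclose(precision, accuracy),
      math.isclose(recall, accuracy),
      math.isclose(f1, accuracy))
```

Every misclassified example adds exactly one false positive and one false negative, so the micro-averaged denominators both equal the number of examples. The per-label variants (`precision(label)` and so on) are a different quantity and are not covered by this sketch.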