Github user feynmanliang commented on a diff in the pull request:
https://github.com/apache/spark/pull/8498#discussion_r38233348
--- Diff: docs/mllib-guide.md ---
@@ -56,71 +63,63 @@ This lists functionality included in `spark.mllib`, the main MLlib API.
* [limited-memory BFGS (L-BFGS)](mllib-optimization.html#limited-memory-bfgs-l-bfgs)
* [PMML model export](mllib-pmml-model-export.html)
-MLlib is under active development.
-The APIs marked `Experimental`/`DeveloperApi` may change in future releases,
-and the migration guide below will explain all changes between releases.
-
# spark.ml: high-level APIs for ML pipelines
-Spark 1.2 introduced a new package called `spark.ml`, which aims to provide a uniform set of
-high-level APIs that help users create and tune practical machine learning pipelines.
-
-*Graduated from Alpha!* The Pipelines API is no longer an alpha component, although many elements of it are still `Experimental` or `DeveloperApi`.
-
-Note that we will keep supporting and adding features to `spark.mllib` along with the
-development of `spark.ml`.
-Users should be comfortable using `spark.mllib` features and expect more features coming.
-Developers should contribute new algorithms to `spark.mllib` and can optionally contribute
-to `spark.ml`.
-
-Guides for `spark.ml` include:
+**[spark.ml programming guide](ml-guide.html)** provides an overview of the Pipelines API and major
+concepts. It also contains sections on using algorithms within the Pipelines API, for example:
-* **[spark.ml programming guide](ml-guide.html)**: overview of the Pipelines API and major concepts
-* Guides on using algorithms within the Pipelines API:
- * [Feature transformers](ml-features.html), including a few not in the lower-level `spark.mllib` API
- * [Decision trees](ml-decision-tree.html)
- * [Ensembles](ml-ensembles.html)
- * [Linear methods](ml-linear-methods.html)
+* [Feature extractors and transformers](ml-features.html)
+* [Linear methods](ml-linear-methods.html)
+* [Decision trees](ml-decision-tree.html)
+* [Ensembles](ml-ensembles.html)
+* [Artificial neural network](ml-ann.html)
# Dependencies
-MLlib uses the linear algebra package
-[Breeze](http://www.scalanlp.org/), which depends on
-[netlib-java](https://github.com/fommil/netlib-java) for optimised
-numerical processing. If natives are not available at runtime, you
-will see a warning message and a pure JVM implementation will be used
-instead.
+MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/), which depends on
+[netlib-java](https://github.com/fommil/netlib-java) for optimised numerical processing.
+If natives libraries[^1] are not available at runtime, you will see a warning message and a pure JVM
+implementation will be used instead.
-To learn more about the benefits and background of system optimised
-natives, you may wish to watch Sam Halliday's ScalaX talk on
-[High Performance Linear Algebra in Scala](http://fommil.github.io/scalax14/#/)).
+Due to licensing issues with runtime proprietary binaries, we do not include `netlib-java`'s native
+proxies by default.
+To configure `netlib-java` / Breeze to use system optimised binaries, include
+`com.github.fommil.netlib:all:1.1.2` (or build Spark with `-Pnetlib-lgpl`) as a dependency of your
+project and read the [netlib-java](https://github.com/fommil/netlib-java) documentation for your
+platform's additional installation instructions.
-Due to licensing issues with runtime proprietary binaries, we do not
-include `netlib-java`'s native proxies by default. To configure
-`netlib-java` / Breeze to use system optimised binaries, include
-`com.github.fommil.netlib:all:1.1.2` (or build Spark with
-`-Pnetlib-lgpl`) as a dependency of your project and read the
-[netlib-java](https://github.com/fommil/netlib-java) documentation for
-your platform's additional installation instructions.
+To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4 or newer.
-To use MLlib in Python, you will need [NumPy](http://www.numpy.org)
-version 1.4 or newer.
+[^1]: To learn more about the benefits and background of system optimised natives, you may wish to
+ watch Sam Halliday's ScalaX talk on [High Performance Linear Algebra in Scala](http://fommil.github.io/scalax14/#/).
----
+# Migration guide
-# Migration Guide
+MLlib is under active development.
+The APIs marked `Experimental`/`DeveloperApi` may change in future releases,
+and the migration guide below will explain all changes between releases.
+
+## From 1.4 to 1.5
-For the `spark.ml` package, please see the [spark.ml Migration Guide](ml-guide.html#migration-guide).
+In the `spark.mllib` package, there are no break API changes but several behavior changes:
-## From 1.3 to 1.4
+* [SPARK-9005](https://issues.apache.org/jira/browse/SPARK-9005):
+ `RegressionMetrics.explainedVariance` returns the average regression sum of squares.
+* [SPARK-8600](https://issues.apache.org/jira/browse/SPARK-8600): `NaiveBayesModel.labels` become
+ sorted.
+* [SPARK-3382](https://issues.apache.org/jira/browse/SPARK-3382): `GradientDescent` has a default
+ convergence tolerance `1e-3`, and hence iterations might end earlier than 1.4.
-In the `spark.mllib` package, there were several breaking changes, but all in `DeveloperApi` or `Experimental` APIs:
+In the `spark.ml` package, there exists one break API change and one behavior change:
-* Gradient-Boosted Trees
- * *(Breaking change)* The signature of the [`Loss.gradient`](api/scala/index.html#org.apache.spark.mllib.tree.loss.Loss) method was changed. This is only an issues for users who wrote their own losses for GBTs.
- * *(Breaking change)* The `apply` and `copy` methods for the case class [`BoostingStrategy`](api/scala/index.html#org.apache.spark.mllib.tree.configuration.BoostingStrategy) have been changed because of a modification to the case class fields. This could be an issue for users who use `BoostingStrategy` to set GBT parameters.
-* *(Breaking change)* The return value of [`LDA.run`](api/scala/index.html#org.apache.spark.mllib.clustering.LDA) has changed. It now returns an abstract class `LDAModel` instead of the concrete class `DistributedLDAModel`. The object of type `LDAModel` can still be cast to the appropriate concrete type, which depends on the optimization algorithm.
+* [SPARK-9268](https://issues.apache.org/jira/browse/SPARK-9268): Java's varargs support is removed
+ from `Params.setDefault` due to a
+ [Scala compiler bug](https://issues.scala-lang.org/browse/SI-9013).
+* [SPARK-10097](https://issues.apache.org/jira/browse/SPARK-10097): `Evaluator.isLargerBetter` is
+ added to indicate metric ordering. Metrics like RMSE no longer flip signs as in 1.4.
-## Previous Spark Versions
+## Previous Spark versions
Earlier migration guides are archived [on this page](mllib-migration-guides.html).
+
+---
--- End diff --
Ditto on divider
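
For anyone reading the SPARK-10097 entry in the diff above, here is a rough pure-Python sketch of what a metric-ordering flag buys you. It is not Spark code: `pick_best` and the sample metric values are hypothetical names and numbers made up for illustration. The point is only that the selector asks whether larger is better instead of relying on the metric being negated.

```python
def pick_best(candidates, metric, is_larger_better):
    """Return the (candidate, score) pair with the best score.

    `is_larger_better` plays the role of an ordering flag: True for
    metrics such as accuracy, False for error metrics such as RMSE.
    """
    scored = [(c, metric(c)) for c in candidates]
    chooser = max if is_larger_better else min
    return chooser(scored, key=lambda pair: pair[1])


# Hypothetical RMSE per regularization parameter (smaller is better).
rmse_by_reg_param = {0.01: 1.8, 0.1: 1.2, 1.0: 2.5}
best_param, best_rmse = pick_best(
    rmse_by_reg_param, rmse_by_reg_param.get, is_larger_better=False
)

# Hypothetical accuracy per tree depth (larger is better).
accuracy_by_depth = {3: 0.81, 5: 0.88, 8: 0.84}
best_depth, best_accuracy = pick_best(
    accuracy_by_depth, accuracy_by_depth.get, is_larger_better=True
)
```

With an explicit flag the RMSE values stay positive, so whatever gets reported is the real metric rather than a sign-flipped one.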