[GitHub] [spark] huaxingao commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release

GitBox Wed, 04 Mar 2020 22:06:27 -0800

huaxingao commented on a change in pull request #27785: [SPARK-30934][ML][DOCS] Update ml-guide and ml-migration-guide for 3.0 release
URL: https://github.com/apache/spark/pull/27785#discussion_r388093510


 ##########
 File path: docs/ml-guide.md
 ##########
 @@ -87,31 +85,41 @@ To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 1.4
 [^1]: To learn more about the benefits and background of system optimised natives, you may wish to
     watch Sam Halliday's ScalaX talk on [High Performance Linear Algebra in Scala](http://fommil.github.io/scalax14/#/).
 
-# Highlights in 2.3
+# Highlights in 3.0
 
-The list below highlights some of the new features and enhancements added to MLlib in the `2.3`
+The list below highlights some of the new features and enhancements added to MLlib in the `3.0`
 release of Spark:
 
-* Built-in support for reading images into a `DataFrame` was added
-([SPARK-21866](https://issues.apache.org/jira/browse/SPARK-21866)).
-* [`OneHotEncoderEstimator`](ml-features.html#onehotencoderestimator) was added, and should be
-used instead of the existing `OneHotEncoder` transformer. The new estimator supports
-transforming multiple columns.
-* Multiple column support was also added to `QuantileDiscretizer` and `Bucketizer`
-([SPARK-22397](https://issues.apache.org/jira/browse/SPARK-22397) and
-[SPARK-20542](https://issues.apache.org/jira/browse/SPARK-20542))
-* A new [`FeatureHasher`](ml-features.html#featurehasher) transformer was added
- ([SPARK-13969](https://issues.apache.org/jira/browse/SPARK-13969)).
-* Added support for evaluating multiple models in parallel when performing cross-validation using
-[`TrainValidationSplit` or `CrossValidator`](ml-tuning.html)
-([SPARK-19357](https://issues.apache.org/jira/browse/SPARK-19357)).
-* Improved support for custom pipeline components in Python (see
-[SPARK-21633](https://issues.apache.org/jira/browse/SPARK-21633) and 
-[SPARK-21542](https://issues.apache.org/jira/browse/SPARK-21542)).
-* `DataFrame` functions for descriptive summary statistics over vector columns
-([SPARK-19634](https://issues.apache.org/jira/browse/SPARK-19634)).
-* Robust linear regression with Huber loss
-([SPARK-3181](https://issues.apache.org/jira/browse/SPARK-3181)).
+* Multiple columns support was added to `Binarizer`, `StringIndexer`, `StopWordsRemover` and PySpark `QuantileDiscretizer`
+([SPARK-23578](https://issues.apache.org/jira/browse/SPARK-23578)),
+([SPARK-11215](https://issues.apache.org/jira/browse/SPARK-11215)),
+([SPARK-29808](https://issues.apache.org/jira/browse/SPARK-29808)),
+([SPARK-22796](https://issues.apache.org/jira/browse/SPARK-22796)).
+* Support Tree-Based Feature Transformation was added
+([SPARK-13677](https://issues.apache.org/jira/browse/SPARK-13677)).
+* Two new evaluators `MultilabelClassificationEvaluator` and `RankingEvaluator` were added
+([SPARK-16692](https://issues.apache.org/jira/browse/SPARK-16692)),
+([SPARK-28045](https://issues.apache.org/jira/browse/SPARK-28045)).
+* Sample weights support was added in `DecisionTreeClassifier/Regressor`, `RandomForestClassifier/Regressor`, `BisectingKMeans`, `KMeans` and `GaussianMixture`
+([SPARK-19591](https://issues.apache.org/jira/browse/SPARK-19591)),
+([SPARK-9478](https://issues.apache.org/jira/browse/SPARK-9478)),
+([SPARK-30351](https://issues.apache.org/jira/browse/SPARK-30351)),
+([SPARK-29967](https://issues.apache.org/jira/browse/SPARK-29967)),
+([SPARK-30102](https://issues.apache.org/jira/browse/SPARK-30102)).
+* R API for `PowerIterationClustering` was added
+([SPARK-19827](https://issues.apache.org/jira/browse/SPARK-19827)).
+* Added Spark ML listener for tracking ML pipeline status
+([SPARK-23674](https://issues.apache.org/jira/browse/SPARK-23674)).
+* Fit with validation set was added to Gradient Boosted Trees in Python
+([SPARK-24333](https://issues.apache.org/jira/browse/SPARK-24333)).
+* [`RobustScaler`](ml-features.html#robustscaler) transformer was added
+([SPARK-28399](https://issues.apache.org/jira/browse/SPARK-28399)).
+* [`Factorization Machines`](ml-classification-regression.html#factorization-machines) classifier and regressor were added
+([SPARK-29224](https://issues.apache.org/jira/browse/SPARK-29224)).
+* Complement Naive Bayes Classifier was added
+([SPARK-29942](https://issues.apache.org/jira/browse/SPARK-29942)).
+* ML function parity between Scala and Python
+([SPARK-28958](https://issues.apache.org/jira/browse/SPARK-28958)).
 
 
 Review comment:
   I will add all the items you mentioned.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

