This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new da013d0  [MINOR][DOCS][ML] Doc 'mode' as a supported Imputer strategy in Pyspark
da013d0 is described below

commit da013d03276f903ac3b38d2c31f83b430ad96772
Author: Sean Owen <sro...@gmail.com>
AuthorDate: Sat Mar 20 01:16:49 2021 -0500

    [MINOR][DOCS][ML] Doc 'mode' as a supported Imputer strategy in Pyspark

    ### What changes were proposed in this pull request?

    Document `mode` as a supported Imputer strategy in Pyspark docs.

    ### Why are the changes needed?

    Support was added in 3.1, and documented in Scala, but some Python docs were missed.

    ### Does this PR introduce _any_ user-facing change?

    No

    ### How was this patch tested?

    Existing tests.

    Closes #31883 from srowen/ImputerModeDocs.

    Authored-by: Sean Owen <sro...@gmail.com>
    Signed-off-by: Sean Owen <sro...@gmail.com>
    (cherry picked from commit ed641fbad69197dc0da0073245adcc9387d03e8e)
    Signed-off-by: Sean Owen <sro...@gmail.com>
---
 docs/ml-features.md                                            | 4 ++--
 mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala | 2 +-
 python/pyspark/ml/feature.py                                   | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/ml-features.md b/docs/ml-features.md
index b36b076..e01acfd0 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -1497,8 +1497,8 @@ for more details on the API.
 
 ## Imputer
 
-The `Imputer` estimator completes missing values in a dataset, either using the mean or the
-median of the columns in which the missing values are located. The input columns should be of
+The `Imputer` estimator completes missing values in a dataset, using the mean, median or mode
+of the columns in which the missing values are located. The input columns should be of
 numeric type. Currently `Imputer` does not support categorical features and possibly
 creates incorrect values for columns containing categorical features. Imputer can impute custom values
 other than 'NaN' by `.setMissingValue(custom_value)`. For example, `.setMissingValue(0)` will impute
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala
index d0b6ab1..71403ac 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala
@@ -97,7 +97,7 @@ private[feature] trait ImputerParams extends Params with HasInputCol with HasInp
 }
 
 /**
- * Imputation estimator for completing missing values, either using the mean or the median
+ * Imputation estimator for completing missing values, using the mean, median or mode
  * of the columns in which the missing values are located. The input columns should be of
  * numeric type. Currently Imputer does not support categorical features
  * (SPARK-15041) and possibly creates incorrect values for a categorical feature.
diff --git a/python/pyspark/ml/feature.py b/python/pyspark/ml/feature.py
index 4e8b8b4..cd0e287 100755
--- a/python/pyspark/ml/feature.py
+++ b/python/pyspark/ml/feature.py
@@ -1536,7 +1536,7 @@ class _ImputerParams(HasInputCol, HasInputCols, HasOutputCol, HasOutputCols, Has
 @inherit_doc
 class Imputer(JavaEstimator, _ImputerParams, JavaMLReadable, JavaMLWritable):
     """
-    Imputation estimator for completing missing values, either using the mean or the median
+    Imputation estimator for completing missing values, using the mean, median or mode
     of the columns in which the missing values are located. The input columns should be of
     numeric type. Currently Imputer does not support categorical features and possibly creates
     incorrect values for a categorical feature.
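
For readers skimming the doc change, the following is a rough plain-Python sketch of what the three strategies named above (mean, median, mode) compute for a single numeric column, including the custom missing-value placeholder mentioned in the docs. It is not Spark's implementation. The helper name `impute_column`, its parameters, the exact (non-approximate) median, and the rule of breaking mode ties by taking the smallest value are all assumptions made for illustration.

```python
import math
from statistics import mean, median, multimode


def impute_column(values, strategy="mean", missing_value=float("nan")):
    """Hypothetical helper: fill missing entries of one numeric column."""
    missing_is_nan = isinstance(missing_value, float) and math.isnan(missing_value)

    def is_missing(v):
        # None always counts as missing in this sketch.
        if v is None:
            return True
        # NaN never compares equal to itself, so it needs an explicit check.
        if missing_is_nan:
            return isinstance(v, float) and math.isnan(v)
        return v == missing_value

    # The fill value is computed only from the observed (non-missing) entries.
    observed = [v for v in values if not is_missing(v)]
    if not observed:
        raise ValueError("column has no observed values to impute from")

    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)  # exact median; an assumption of this sketch
    elif strategy == "mode":
        # Assumption: ties between equally frequent values go to the smallest.
        fill = min(multimode(observed))
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    return [fill if is_missing(v) else v for v in values]


column = [1.0, 2.0, 2.0, float("nan"), 7.0]
by_mean = impute_column(column, "mean")
by_median = impute_column(column, "median")
by_mode = impute_column(column, "mode")

# Treat 0.0 as the placeholder for "missing" instead of NaN.
zero_as_missing = impute_column([0.0, 4.0, 4.0, 6.0], "mode", missing_value=0.0)
```

In PySpark itself the strategy is selected through the `strategy` parameter of `pyspark.ml.feature.Imputer`, for example `Imputer(strategy="mode", inputCols=[...], outputCols=[...])`, with the placeholder set via `missingValue`.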