[spark] branch branch-3.1 updated: [MINOR][DOCS][ML] Doc 'mode' as a supported Imputer strategy in Pyspark

srowen Fri, 19 Mar 2021 23:17:59 -0700

This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new da013d0  [MINOR][DOCS][ML] Doc 'mode' as a supported Imputer strategy in Pyspark
da013d0 is described below

commit da013d03276f903ac3b38d2c31f83b430ad96772
Author: Sean Owen <sro...@gmail.com>
AuthorDate: Sat Mar 20 01:16:49 2021 -0500

    [MINOR][DOCS][ML] Doc 'mode' as a supported Imputer strategy in Pyspark
    
    ### What changes were proposed in this pull request?
    
    Document `mode` as a supported Imputer strategy in Pyspark docs.
    
    ### Why are the changes needed?
    
    Support was added in 3.1, and documented in Scala, but some Python docs were missed.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    Existing tests.
    
    Closes #31883 from srowen/ImputerModeDocs.
    
    Authored-by: Sean Owen <sro...@gmail.com>
    Signed-off-by: Sean Owen <sro...@gmail.com>
    (cherry picked from commit ed641fbad69197dc0da0073245adcc9387d03e8e)
    Signed-off-by: Sean Owen <sro...@gmail.com>
---
 docs/ml-features.md                                            | 4 ++--
 mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala | 2 +-
 python/pyspark/ml/feature.py                                   | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/ml-features.md b/docs/ml-features.md
index b36b076..e01acfd0 100644
--- a/docs/ml-features.md
+++ b/docs/ml-features.md
@@ -1497,8 +1497,8 @@ for more details on the API.
 
 ## Imputer
 
-The `Imputer` estimator completes missing values in a dataset, either using the mean or the 
-median of the columns in which the missing values are located. The input columns should be of
+The `Imputer` estimator completes missing values in a dataset, using the mean, median or mode
+of the columns in which the missing values are located. The input columns should be of
 numeric type. Currently `Imputer` does not support categorical features and possibly
 creates incorrect values for columns containing categorical features. Imputer can impute custom values 
 other than 'NaN' by `.setMissingValue(custom_value)`. For example, `.setMissingValue(0)` will impute 
diff --git a/mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala b/mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala
index d0b6ab1..71403ac 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/feature/Imputer.scala
@@ -97,7 +97,7 @@ private[feature] trait ImputerParams extends Params with HasInputCol with HasInp
 }
 
 /**
- * Imputation estimator for completing missing values, either using the mean or the median
+ * Imputation estimator for completing missing values, using the mean, median or mode
  * of the columns in which the missing values are located. The input columns should be of
  * numeric type. Currently Imputer does not support categorical features
  * (SPARK-15041) and possibly creates incorrect values for a categorical feature.
diff --git a/python/pyspark/ml/feature.py b/python/pyspark/ml/feature.py
index 4e8b8b4..cd0e287 100755
--- a/python/pyspark/ml/feature.py
+++ b/python/pyspark/ml/feature.py
@@ -1536,7 +1536,7 @@ class _ImputerParams(HasInputCol, HasInputCols, HasOutputCol, HasOutputCols, Has
 @inherit_doc
 class Imputer(JavaEstimator, _ImputerParams, JavaMLReadable, JavaMLWritable):
     """
-    Imputation estimator for completing missing values, either using the mean or the median
+    Imputation estimator for completing missing values, using the mean, median or mode
     of the columns in which the missing values are located. The input columns should be of
     numeric type. Currently Imputer does not support categorical features and
     possibly creates incorrect values for a categorical feature.

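The patch only touches documentation, so to make the three strategies concrete, here is a minimal pure-Python sketch of what each one computes for a single numeric column. It is not Spark's implementation: `impute_column` and `_is_missing` are hypothetical helpers, the median is exact (Spark computes an approximate one), and breaking `mode` ties toward the smallest value is an assumption of this sketch. Following the docs above, nulls (`None`) are always treated as missing, as are entries equal to `missing_value` (NaN by default, playing the role of `.setMissingValue`).

```python
import math
from collections import Counter
from statistics import median


def _is_missing(value, missing_value):
    """Hypothetical helper: nulls are always missing, plus anything equal to missing_value."""
    if value is None:
        return True
    # NaN != NaN, so a NaN placeholder needs an explicit isnan check.
    if isinstance(missing_value, float) and math.isnan(missing_value):
        return isinstance(value, float) and math.isnan(value)
    return value == missing_value


def impute_column(values, strategy="mean", missing_value=float("nan")):
    """Hypothetical helper: return a copy of `values` with missing entries replaced."""
    observed = [v for v in values if not _is_missing(v, missing_value)]
    if not observed:
        raise ValueError("column has no non-missing values to impute from")
    if strategy == "mean":
        surrogate = sum(observed) / len(observed)
    elif strategy == "median":
        # Exact median here; Spark uses an approximate quantile instead.
        surrogate = median(observed)
    elif strategy == "mode":
        counts = Counter(observed)
        best = max(counts.values())
        # Assumption of this sketch: ties resolve to the smallest value.
        surrogate = min(v for v, c in counts.items() if c == best)
    else:
        raise ValueError(f"unsupported strategy: {strategy!r}")
    return [surrogate if _is_missing(v, missing_value) else v for v in values]


if __name__ == "__main__":
    column = [1.0, 2.0, 2.0, float("nan"), 5.0]
    for s in ("mean", "median", "mode"):
        print(s, impute_column(column, s))
    # A custom placeholder, analogous to `.setMissingValue(0)`.
    print(impute_column([0, 4, 4, 7], "mode", missing_value=0))
```

In PySpark itself the strategy is selected through the `strategy` param, for example `Imputer(inputCols=["a"], outputCols=["a_imputed"], strategy="mode")` (the column names here are hypothetical), and the placeholder through `setMissingValue`.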
