[ 
https://issues.apache.org/jira/browse/SPARK-23535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-23535.
-------------------------------
    Resolution: Won't Fix

> MinMaxScaler return 0.5 for an all zero column
> ----------------------------------------------
>
>                 Key: SPARK-23535
>                 URL: https://issues.apache.org/jira/browse/SPARK-23535
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.0.0
>            Reporter: Yigal Weinberger
>            Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When MinMaxScaler is applied to a column that contains only zeros, the output
> is 0.5 for every row of that column.
> This is inconsistent with the scikit-learn implementation, which returns 0.
>  
> Steps to reproduce:
>  
>  
> {code:python}
> from pyspark.ml.feature import MinMaxScaler
> from pyspark.ml.linalg import Vectors
> from pyspark.sql import SparkSession
>
> # Reuse the active session (for example in the pyspark shell) or create one.
> spark = SparkSession.builder.getOrCreate()
>
> # The third feature is 0.0 in every row.
> dataFrame = spark.createDataFrame([
>     (0, Vectors.dense([1.0, 0.1, 0.0]),),
>     (1, Vectors.dense([2.0, 1.1, 0.0]),),
>     (2, Vectors.dense([3.0, 10.1, 0.0]),)
> ], ["id", "features"])
> scaler = MinMaxScaler(inputCol="features", outputCol="scaledFeatures")
> # Compute summary statistics and generate a MinMaxScalerModel.
> scalerModel = scaler.fit(dataFrame)
> # Rescale each feature to the range [min, max].
> scaledData = scalerModel.transform(dataFrame)
> print("Features scaled to range: [%f, %f]" % (scaler.getMin(), scaler.getMax()))
> scaledData.select("features", "scaledFeatures").show()
> {code}
> Features scaled to range: [0.000000, 1.000000]
> +--------------+--------------+
> |      features|scaledFeatures|
> +--------------+--------------+
> | [1.0,0.1,0.0]| [0.0,0.0,*0.5*]|
> | [2.0,1.1,0.0]| [0.5,0.1,*0.5*]|
> |[3.0,10.1,0.0]| [1.0,1.0,*0.5*]|
> +--------------+--------------+
>  
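> Compare with scikit-learn:
> {code:python}
> import numpy as np
> from sklearn.preprocessing import MinMaxScaler
>
> mms = MinMaxScaler(copy=False)
> # Same data as above: the third column is all zeros.
> test = np.array([[1.0, 0.1, 0], [2.0, 1.1, 0], [3.0, 10.1, 0]])
> print(mms.fit_transform(test))
> {code}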
>  
> Output:
> [[ 0. 0. *0.* ]
> [ 0.5 0.1 *0.* ]
> [ 1. 1. *0.* ]]
>  
>  
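
For reference, the difference reported above can be sketched in plain Python without either library. This is only an illustration, not either project's code: the function names are hypothetical, the first function follows the rescaling formula given in the Spark MinMaxScaler documentation (a constant column maps to the midpoint of the target range), and the second assumes scikit-learn's handling of a zero data range, which treats it as 1 so that a constant column maps to the lower bound.

```python
def spark_style_rescale(column, lo=0.0, hi=1.0):
    """Rescale one column using the formula from Spark's MinMaxScaler docs."""
    e_min, e_max = min(column), max(column)
    if e_max == e_min:
        # Constant column: the documented result is the midpoint of [lo, hi].
        return [0.5 * (hi + lo) for _ in column]
    return [(x - e_min) / (e_max - e_min) * (hi - lo) + lo for x in column]


def sklearn_style_rescale(column, lo=0.0, hi=1.0):
    """Rescale one column, treating a zero data range as 1 (assumed sklearn behaviour)."""
    e_min, e_max = min(column), max(column)
    data_range = (e_max - e_min) or 1.0  # avoid dividing by zero
    scale = (hi - lo) / data_range
    return [(x - e_min) * scale + lo for x in column]


zeros = [0.0, 0.0, 0.0]
print(spark_style_rescale(zeros))    # [0.5, 0.5, 0.5]
print(sklearn_style_rescale(zeros))  # [0.0, 0.0, 0.0]
```

Both functions agree on any column with a non-zero range; they differ only in the convention chosen for a constant column.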



