This could be true if you knew you were just going to scale the input to
StandardScaler and nothing else. It's probably more typical that you'd scale
some other data as well. The current behavior is therefore the sensible default,
because the input is a sample of some unknown larger population.
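To illustrate that point with a quick sketch (plain numpy, not from the original
thread): averaged over many small samples, the sample (ddof=1) variance is an
unbiased estimate of the population variance, while the population formula
(ddof=0) underestimates it, which is why dividing by the sample std is the usual
default when the input is treated as a sample.

    import numpy as np

    # Draw many samples of size 5 from a population with variance 4 (std = 2).
    rng = np.random.default_rng(0)
    samples = rng.normal(loc=0.0, scale=2.0, size=(100_000, 5))

    # Averaged over many samples, ddof=1 recovers the true variance (~4.0),
    # while ddof=0 is biased low by a factor of (n-1)/n (~3.2 here).
    print(samples.var(axis=1, ddof=1).mean())
    print(samples.var(axis=1, ddof=0).mean())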
I think it does
Actually, I think it is possible that a user/developer needs the
standardized features with the population mean and std in some cases. It would
be better if StandardScaler could offer an option to do that.
Holden Karau wrote:
> Hi Gilad,
>
> Spark uses the sample standard variance inside of the StandardScaler ...
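As far as I know there is no such option in MLlib's StandardScaler, but as a
rough workaround sketch (the helper below is hypothetical, not part of the
API): since std_pop = std_sample * sqrt((n-1)/n), the sample-std output can be
rescaled by sqrt(n/(n-1)) to get the population-std standardization.

    import numpy as np

    def to_population_scaling(sample_scaled, n):
        """Hypothetical helper: convert values divided by the sample std
        (what StandardScaler produces) into values divided by the population
        std, using x/std_pop = (x/std_sample) * sqrt(n/(n-1))."""
        return np.asarray(sample_scaled) * np.sqrt(n / (n - 1.0))

    # Example with made-up numbers: the sample-std scaling of [1, 2, 3, 4]
    # (mean 2.5, sample std ~1.291) rescaled to population-std scaling.
    spark_style = np.array([-1.1619, -0.3873, 0.3873, 1.1619])
    print(to_population_scaling(spark_style, n=4))  # ~[-1.342, -0.447, 0.447, 1.342]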
Hi Gilad,
Spark uses the sample standard variance inside of the StandardScaler (see
https://spark.apache.org/docs/2.0.2/api/scala/index.html#org.apache.spark.mllib.feature.StandardScaler
) which I think would explain the results you are seeing. I
believe the scalers are intended to
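A small PySpark sketch (assuming a local SparkContext; the data are made up)
that reproduces this: the RDD-based StandardScaler output matches division by
the sample standard deviation (numpy ddof=1), not the population one (ddof=0).

    import numpy as np
    from pyspark import SparkContext
    from pyspark.mllib.feature import StandardScaler
    from pyspark.mllib.linalg import Vectors

    sc = SparkContext.getOrCreate()

    data = np.array([[1.0], [2.0], [3.0], [4.0]])
    rdd = sc.parallelize([Vectors.dense(row) for row in data])

    model = StandardScaler(withMean=True, withStd=True).fit(rdd)
    spark_scaled = np.array([v.toArray() for v in model.transform(rdd).collect()])

    sample_scaled = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    population_scaled = (data - data.mean(axis=0)) / data.std(axis=0, ddof=0)

    print(np.allclose(spark_scaled, sample_scaled))      # True  -> divides by the sample std
    print(np.allclose(spark_scaled, population_scaled))  # False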
Hi
It seems that the output of MLlib's *StandardScaler*(*withMean*=True,
*withStd*=True) is not as expected.
The above configuration is expected to do the following transformation:

X -> Y = (X - Mean) / Std    (Eq. 1)
This transformation (a.k.a. Standardization) should result in a
"standardized" vector with zero mean and unit standard deviation.