On Tue, Nov 6, 2012 at 4:17 PM, Doug Coleman <doug.cole...@gmail.com> wrote:
> Actually, from the numpy docs, ddof=1 for np.std doesn't make it
> unbiased. There's a whole Wikipedia article on calculating the unbiased
> standard deviation; it differs for the normal distribution versus other
> distributions and involves the gamma function. The advice from the wiki
> is not to worry about it.
>
> http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation
>
> However, it seems that some people define standardization as having zero
> mean and unit _variance_, which numpy actually supports and which is
> unbiased for iid samples. So maybe dividing by the variance and giving
> the flags with_var='population', 'sample', or None is the better
> solution.
>
> Wikipedia's article on feature scaling defines it as zero mean and unit
> variance, but then gives the advice to divide by the standard deviation.
> Dividing by std seems like the wrong advice.
>
> http://en.wikipedia.org/wiki/Feature_scaling
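[Editorial note: the quoted claim about ddof is easy to check numerically. The following is an illustrative sketch, not from the original post, assuming numpy is available: with ddof=1, np.var is unbiased for the true variance, but np.std(ddof=1) still underestimates the true standard deviation on average.]

```python
import numpy as np

# Draw many small iid samples from a standard normal,
# so the true variance and true std are both exactly 1.
rng = np.random.default_rng(42)
samples = rng.normal(loc=0.0, scale=1.0, size=(200_000, 5))

# Bessel-corrected variance (ddof=1): its average is ~1.0 -> unbiased.
mean_var = samples.var(axis=1, ddof=1).mean()

# Square root of that estimate (ddof=1 std): its average is ~0.94 for
# n=5, i.e. biased low, because sqrt is concave (Jensen's inequality).
mean_std = samples.std(axis=1, ddof=1).mean()
```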
No, that's right. You must divide the data by the square root of your
estimate of the variance, not the variance itself, in order to get unit
variance. Remember that variance has units of [data]**2, not [data].

Whether you treat that square root as a separate parameter with an
estimator that has properties worth caring about (like bias) is up to
you, and it is mostly beside the point with respect to feature scaling.

--
Robert Kern

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
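[Editorial note: Kern's unit-analysis point can be seen directly in a short numpy sketch (illustrative, not from the original post). Dividing centered data by the standard deviation yields unit variance; dividing by the variance yields 1/var(x) instead.]

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=10_000)

# Standardize: subtract the mean, then divide by the *standard
# deviation* (the square root of the variance estimate).
z = (x - x.mean()) / x.std()   # z.var() is 1.0 (up to float error)

# Dividing by the variance instead does NOT give unit variance,
# because variance carries units of [data]**2: w.var() == 1/x.var().
w = (x - x.mean()) / x.var()
```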