On Tue, Nov 6, 2012 at 4:17 PM, Doug Coleman <doug.cole...@gmail.com> wrote:
> Actually, from the numpy docs, ddof=1 for np.std doesn't make it
> unbiased. There's a whole Wikipedia article on calculating the unbiased
> standard deviation; it differs for the normal distribution versus other
> distributions and involves the gamma function. The advice from the wiki
> is not to worry about it.
>
> http://en.wikipedia.org/wiki/Unbiased_estimation_of_standard_deviation
>
> However, it seems that some people define standardization as having zero
> mean and unit _variance_, which numpy actually supports and which is
> unbiased for iid samples. So maybe dividing by the variance and giving
> the flags with_var='population', 'sample', or None is the better
> solution.
>
> Wikipedia's article on feature scaling defines it as zero mean and unit
> variance, but then gives the advice to divide by the standard deviation.
> Dividing by std seems like the wrong advice.
>
> http://en.wikipedia.org/wiki/Feature_scaling
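[Editorial note: the quoted claim about ddof is easy to check numerically. The following is an illustrative sketch, not from the original post, assuming numpy is available: with ddof=1, np.var is unbiased for the true variance, but np.std(ddof=1) still underestimates the true standard deviation on average.]

```python
import numpy as np

# Draw many small iid samples from a standard normal,
# so the true variance and true std are both exactly 1.
rng = np.random.default_rng(42)
samples = rng.normal(loc=0.0, scale=1.0, size=(200_000, 5))

# Bessel-corrected variance (ddof=1): its average is ~1.0 -> unbiased.
mean_var = samples.var(axis=1, ddof=1).mean()

# Square root of that estimate (ddof=1 std): its average is ~0.94 for
# n=5, i.e. biased low, because sqrt is concave (Jensen's inequality).
mean_std = samples.std(axis=1, ddof=1).mean()
```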
No, that's right. You must divide the data by the square root of your
estimate of the variance, not the variance itself, in order to get unit
variance. Remember that variance has units of [data]**2, not [data].

Whether you treat that square root as a separate parameter with an
estimator that has properties worth caring about (like bias) is up to
you, and it is mostly beside the point with respect to feature scaling.

--
Robert Kern

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
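[Editorial note: Kern's unit-analysis point can be seen directly in a short numpy sketch (illustrative, not from the original post). Dividing centered data by the standard deviation yields unit variance; dividing by the variance yields 1/var(x) instead.]

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=10_000)

# Standardize: subtract the mean, then divide by the *standard
# deviation* (the square root of the variance estimate).
z = (x - x.mean()) / x.std()   # z.var() is 1.0 (up to float error)

# Dividing by the variance instead does NOT give unit variance,
# because variance carries units of [data]**2: w.var() == 1/x.var().
w = (x - x.mean()) / x.var()
```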