preprocessing.Scaler uses numpy's default standard deviation, which is
the population standard deviation (ddof=0). That estimator is usually
reserved for the case where you have the entire population of data.
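
For concreteness, here is numpy's behavior (a toy example; the data
values are made up):

    import numpy as np

    x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

    # numpy default: population std, ddof=0
    #   sqrt(sum((x - x.mean())**2) / n)
    print(np.std(x))          # 2.0

    # sample std, ddof=1
    #   sqrt(sum((x - x.mean())**2) / (n - 1))
    print(np.std(x, ddof=1))  # ~2.138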

Since in machine learning we almost always work with a sample rather
than the full population, it would arguably be better to scale by the
sample standard deviation (ddof=1), which numpy already supports, or
at least to expose the choice as a flag. A sketch of the workaround
follows.
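
Something along these lines (the helper name is my own, not part of
scikit-learn; it assumes features are in columns):

    import numpy as np

    def scale_with_sample_std(X):
        # Standardize each column using the sample (ddof=1)
        # standard deviation rather than the population one.
        X = np.asarray(X, dtype=float)
        return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)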

At the very least, the docs are ambiguous about which standard
deviation is in use, and most mainstream numerical computing
environments (MATLAB, Mathematica, R) default to the opposite
convention, the sample std.

Doug