Have you read my earlier email explaining just that?
> Tree-based methods are the only ones that are invariant to feature
> scaling, i.e. DecisionTree*, RandomForest*, ExtraTrees*, Bagging*
> (with trees), GradientBoosting* (with trees).
For all other algorithms, the outcome will be different depending on
whether you scale your data or not.
For algorithms like nearest neighbors, I would not say that they require
scaling, but scaling will change the result.
It is then a question of whether you think the range of your features is
meaningful or arbitrary.
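For illustration, here is a quick sketch (my own toy example, untested,
using standard scikit-learn estimators): the tree's predictions typically
do not change under standardization, while the nearest neighbors
predictions generally do:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.preprocessing import StandardScaler
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=300, n_features=5, random_state=0)
    X[:, 0] *= 1000.0  # give one feature a much larger range
    Xs = StandardScaler().fit_transform(X)

    for clf in (DecisionTreeClassifier(random_state=0), KNeighborsClassifier()):
        pred_raw = clf.fit(X, y).predict(X)        # fit/predict on raw data
        pred_scaled = clf.fit(Xs, y).predict(Xs)   # fit/predict on scaled data
        print(type(clf).__name__, np.array_equal(pred_raw, pred_scaled))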
I don't think there is currently a chart on the complexity of the
algorithms, though it would be cool to add one.
On 06/05/2015 02:19 PM, Yury Zhauniarovich wrote:
Thank you, Sebastian. This is what I want to understand. Considering
the final score, e.g., accuracy, does this mean that with scaling and
without I will get different results for NB and KNN? Or will the results
be the same, as in the case of decision trees?
With gradient descent algorithms it is clear why I need to scale the
features (because, as you wrote, of convergence). The question is
whether there are similar reasons to scale features for other
algorithms (like I said, KNN, NB or SVM). Might I get different results
(e.g., accuracy) depending on whether I scale features or not?
Best Regards,
Yury Zhauniarovich
On 5 June 2015 at 19:58, Sebastian Raschka <se.rasc...@gmail.com> wrote:
"Need" to be scaled sounds a little bit strong ;) -- feature
scaling is really context-dependend. If you are using stochastic
gradient descent of gradient descent you surely want to
standardize your data or at least center it for technical reasons
and convergence. However, in naive Bayes, you just estimate the
parameters e.g., via MLE so that there is no technical advantage
of feature scaling, however, the results will be different with
and without scaling.
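For the gradient descent case, a minimal sketch (toy data, my own
example) of standardizing inside a pipeline so the same transformation
is applied at fit and predict time:

    from sklearn.datasets import make_classification
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import SGDClassifier

    X, y = make_classification(n_samples=500, random_state=0)
    clf = make_pipeline(StandardScaler(), SGDClassifier(random_state=0))
    clf.fit(X, y)  # the scaler is fit on the training data only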
On Jun 5, 2015, at 1:03 PM, Andreas Mueller <t3k...@gmail.com> wrote:
The results on scaled and non-scaled data will be different because
the regularization will have a different effect.
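A quick toy sketch of that (my own example, not from the original mail):
without a penalty, rescaling a feature just rescales its coefficient and
the predictions stay the same, but an L2-penalized model such as
LogisticRegression penalizes all coefficients equally, so the fit on raw
and standardized data generally differs:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=300, n_features=5, random_state=0)
    X[:, 0] *= 1000.0  # one feature on a much larger scale
    Xs = StandardScaler().fit_transform(X)

    p_raw = LogisticRegression(C=1.0).fit(X, y).predict(X)
    p_scaled = LogisticRegression(C=1.0).fit(Xs, y).predict(Xs)
    print(np.array_equal(p_raw, p_scaled))  # typically False, due to the penalty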
On 06/05/2015 03:10 AM, Yury Zhauniarovich wrote:
Thank you all! However, what Sturla wrote is beyond my
understanding.
One more question. It also seems to me that Naive Bayes
classifiers do not need the data to be scaled. Am I correct?
Best Regards,
Yury Zhauniarovich
On 4 June 2015 at 20:55, Sturla Molden <sturla.mol...@gmail.com> wrote:
On 04/06/15 20:38, Sturla Molden wrote:
> Component-wise EM (aka CEM2) is a better way of avoiding the
> singularity disease, though.
The traditional EM for a GMM proceeds like this:

    while True:
        global_estep(clusters)
        for c in clusters:
            mstep(c)
This is inherently unstable. Several clusters can become
near-singular in the M-step before there is an E-step
to redistribute the weights. You can get a "cascade of
singularities" where the whole GMM basically dies. Even
if you bias the diagonal of the covariance you still
have the basic algorithmic problem.
CEM2 proceeds like this:

    while True:
        for c in clusters:
            estep(c)
            mstep(c)
This improves stability enormously. When a cluster becomes
singular, the memberships are immediately redistributed.
Therefore you will not get a "cascade of singularities"
where the whole GMM basically dies.
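To make the loop concrete, here is a rough 1-D GMM toy sketch of the
CEM2 schedule (my own untested illustration, not scikit-learn code):

    import numpy as np

    rng = np.random.RandomState(0)
    x = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(3, 1.0, 200)])

    K = 2
    w = np.full(K, 1.0 / K)        # mixing weights
    mu = rng.choice(x, K)          # initial means: two random data points
    var = np.full(K, x.var())      # initial variances

    def resp():
        # responsibilities under the *current* parameters, shape (n, K)
        dens = (np.exp(-(x[:, None] - mu) ** 2 / (2 * var))
                / np.sqrt(2 * np.pi * var))
        r = w * dens
        return r / r.sum(axis=1, keepdims=True)

    for _ in range(50):
        for k in range(K):
            r = resp()                 # E-step before touching component k
            nk = r[:, k].sum()
            w[k] = nk / len(x)         # M-step for component k only
            w /= w.sum()
            mu[k] = (r[:, k] * x).sum() / nk
            var[k] = (r[:, k] * (x - mu[k]) ** 2).sum() / nk

    print(w, mu, var)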
Sturla