Have you read my earlier email explaining just that?

> Tree-based methods are the only ones that are invariant to feature scaling, so DecisionTree*, RandomForest*, ExtraTrees*, Bagging* (with trees), GradientBoosting* (with trees).

For all other algorithms, the outcome will differ depending on whether you scale your data or not. For algorithms like nearest neighbors, I would not say they require scaling, but scaling will change the result. It is then a question of whether you think the range of your features is meaningful or arbitrary.
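
For example, here is a quick sketch of that difference (my own toy example on the bundled iris data; the exact KNN counts will vary, but the tree predictions should not change):

    from sklearn.datasets import load_iris
    from sklearn.preprocessing import StandardScaler
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    iris = load_iris()
    X, y = iris.data, iris.target
    X_std = StandardScaler().fit_transform(X)

    # KNN: distances change under rescaling, so the fitted model can change
    knn_raw = KNeighborsClassifier().fit(X, y).predict(X)
    knn_std = KNeighborsClassifier().fit(X_std, y).predict(X_std)
    print((knn_raw != knn_std).sum())    # may be non-zero

    # Decision tree: splits depend only on the ordering of each feature,
    # so a per-feature affine rescaling leaves the predictions unchanged
    tree_raw = DecisionTreeClassifier(random_state=0).fit(X, y).predict(X)
    tree_std = DecisionTreeClassifier(random_state=0).fit(X_std, y).predict(X_std)
    print((tree_raw != tree_std).sum())  # expected to be 0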

I don't think there is currently a chart on the complexity of the algorithms, though it would be cool to add one.



On 06/05/2015 02:19 PM, Yury Zhauniarovich wrote:
Thank you, Sebastian. This is what I want to understand. Considering the final score, e.g., accuracy, does this mean that I will get different results for NB and KNN with and without scaling? Or will the results be the same, as in the case of decision trees?

With gradient descent algorithms it is clear why I need to scale the features (as you wrote, for convergence). The question is whether there are similar reasons to scale features for other algorithms (like I said, KNN, NB or SVM). Might I get different results (e.g., accuracy) depending on whether or not I scale the features?


Best Regards,
Yury Zhauniarovich

On 5 June 2015 at 19:58, Sebastian Raschka <se.rasc...@gmail.com <mailto:se.rasc...@gmail.com>> wrote:

    "Need" to be scaled sounds a little bit strong ;) -- feature
    scaling is really context-dependend. If you are using stochastic
    gradient descent of gradient descent you surely want to
    standardize your data or at least center it for technical reasons
    and convergence. However, in naive Bayes, you just estimate the
    parameters e.g., via MLE so that there is no technical advantage
    of feature scaling, however, the results will be different with
    and without scaling.
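
    In scikit-learn terms, something like the following sketch
    (X_train and y_train are placeholders for your own data):

        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.linear_model import SGDClassifier
        from sklearn.naive_bayes import GaussianNB

        # put the scaler in front of SGD so (stochastic) gradient
        # descent converges properly
        sgd = make_pipeline(StandardScaler(), SGDClassifier(random_state=0))

        # GaussianNB just estimates per-class means and variances via MLE,
        # so there is no convergence reason to scale beforehand
        nb = GaussianNB()

        # sgd.fit(X_train, y_train)
        # nb.fit(X_train, y_train)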

    On Jun 5, 2015, at 1:03 PM, Andreas Mueller <t3k...@gmail.com
    <mailto:t3k...@gmail.com>> wrote:

    The results on scaled and non-scaled data will be different
    because the regularization will have a different effect.
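
    As a made-up illustration with an L2-penalized (ridge) regression:
    the penalty is applied to the raw coefficients, so a feature on a
    much larger scale is effectively regularized differently before
    and after scaling.

        import numpy as np
        from sklearn.linear_model import Ridge
        from sklearn.preprocessing import StandardScaler

        rng = np.random.RandomState(0)
        X = rng.randn(200, 2)
        X[:, 1] *= 1000.0            # second feature on a much larger scale
        y = X[:, 0] + 0.001 * X[:, 1] + 0.1 * rng.randn(200)

        # same penalty strength, but it bites differently before and
        # after scaling, so the fits (and predictions) differ
        print(Ridge(alpha=1.0).fit(X, y).coef_)
        print(Ridge(alpha=1.0).fit(StandardScaler().fit_transform(X), y).coef_)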

    On 06/05/2015 03:10 AM, Yury Zhauniarovich wrote:
    Thank you all! However, what Sturla wrote is beyond my
    understanding.

    One more question. It also seems to me that Naive Bayes
    classifiers do not need the data to be scaled. Am I correct?


    Best Regards,
    Yury Zhauniarovich

    On 4 June 2015 at 20:55, Sturla Molden <sturla.mol...@gmail.com
    <mailto:sturla.mol...@gmail.com>> wrote:

        On 04/06/15 20:38, Sturla Molden wrote:

        > Component-wise EM (aka CEM2) is a better way of avoiding the
        > singularity disease, though.

        The traditional EM for a GMM proceeds like this:

        while True:

            global_estep(clusters)

            for c in clusters:
                mstep(c)

        This is inherently unstable. Several clusters can become
        near-singular in the M-step before there is an E-step
        to redistribute the weights. You can get a "cascade of
        singularities" where the whole GMM basically dies. Even
        if you bias the diagonal of the covariance you still
        have the basic algorithmic problem.

        CEM2 proceeds like this:

        while True:
            for c in clusters:
                estep(c)
                mstep(c)

        This improves stability enormously. When a cluster becomes
        singular, the memberships are immediately redistributed.
        Therefore you will not get a "cascade of singularities"
        where the whole GMM basically dies.
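
        A rough NumPy sketch of that component-wise loop for a GMM
        with diagonal covariances (just to illustrate the update
        order; this is a toy example I made up, not scikit-learn
        code):

        import numpy as np

        rng = np.random.RandomState(0)
        X = rng.randn(500, 2)                        # toy data; substitute your own
        K = 3
        n = X.shape[0]

        weights = np.full(K, 1.0 / K)
        means = X[rng.choice(n, K, replace=False)]
        variances = np.tile(X.var(axis=0), (K, 1))   # diagonal covariances

        for _ in range(50):
            for c in range(K):
                # E-step: recompute all memberships with the current parameters
                log_r = np.empty((n, K))
                for k in range(K):
                    log_r[:, k] = (np.log(weights[k])
                                   - 0.5 * np.sum(np.log(2 * np.pi * variances[k]))
                                   - 0.5 * np.sum((X - means[k]) ** 2 / variances[k], axis=1))
                log_r -= np.logaddexp.reduce(log_r, axis=1, keepdims=True)
                r = np.exp(log_r[:, c])

                # M-step for component c only, so a collapsing component has its
                # weight redistributed before the next component is updated
                nk = r.sum() + 1e-10
                weights[c] = nk / n
                weights /= weights.sum()
                means[c] = np.dot(r, X) / nk
                variances[c] = np.dot(r, (X - means[c]) ** 2) / nk + 1e-6   # variance floor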


        Sturla


        