Yes, Andreas. Thank you, I just wanted to clarify. Thank you all for your help, and sorry for some silly questions!
Best Regards,
Yury Zhauniarovich

On 5 June 2015 at 20:24, Andreas Mueller <t3k...@gmail.com> wrote:
> Have you read my earlier email explaining just that?
>
> Tree-based methods are the only ones that are invariant to feature
> scaling, so DecisionTree*, RandomForest*, ExtraTrees*, Bagging* (with
> trees), GradientBoosting* (with trees).
>
> For all other algorithms, the outcome will be different depending on
> whether you scale your data or not.
> For algorithms like nearest neighbors, I would not say they require
> scaling, but scaling will change the result.
> It is then a question of whether you think the range of your features is
> meaningful or arbitrary.
>
> I don't think there is currently a chart on the complexity of the
> algorithms, though it would be cool to add one.
>
> On 06/05/2015 02:19 PM, Yury Zhauniarovich wrote:
> Thank you, Sebastian. This is what I want to understand. Considering the
> final score, e.g., accuracy, does this mean that with scaling and without
> I will get different results for NB and KNN? Or will the results be the
> same, as in the case of decision trees?
>
> With gradient descent algorithms it is clear why I need to scale the
> features (because, as you wrote, of convergence). The question is whether
> there are similar reasons to scale features for other algorithms (like I
> said, KNN, NB or SVM). Might I get different results (e.g., accuracy)
> depending on whether I scale the features or not?
>
> Best Regards,
> Yury Zhauniarovich
>
> On 5 June 2015 at 19:58, Sebastian Raschka <se.rasc...@gmail.com> wrote:
>> "Need" to be scaled sounds a little bit strong ;) -- feature scaling is
>> really context-dependent. If you are using stochastic gradient descent or
>> gradient descent, you surely want to standardize your data, or at least
>> center it, for technical reasons and for convergence. However, in naive
>> Bayes you just estimate the parameters, e.g., via MLE, so there is no
>> technical advantage to feature scaling; the results will still be
>> different with and without scaling, though.
>>
>> On Jun 5, 2015, at 1:03 PM, Andreas Mueller <t3k...@gmail.com> wrote:
>>
>> The result on scaled and non-scaled data will be different because the
>> regularization will have a different effect.
>>
>> On 06/05/2015 03:10 AM, Yury Zhauniarovich wrote:
>>
>> Thank you all! However, what Sturla wrote is beyond my understanding.
>>
>> One more question. It also seems to me that Naive Bayes classifiers do
>> not need the data to be scaled. Am I correct?
>>
>> Best Regards,
>> Yury Zhauniarovich
>>
>> On 4 June 2015 at 20:55, Sturla Molden <sturla.mol...@gmail.com> wrote:
>>
>>> On 04/06/15 20:38, Sturla Molden wrote:
>>>
>>> > Component-wise EM (aka CEM2) is a better way of avoiding the
>>> > singularity disease, though.
>>>
>>> The traditional EM for a GMM proceeds like this:
>>>
>>>     while True:
>>>         global_estep(clusters)
>>>         for c in clusters:
>>>             mstep(c)
>>>
>>> This is inherently unstable. Several clusters can become near-singular
>>> in the M-step before there is an E-step to redistribute the weights.
>>> You can get a "cascade of singularities" where the whole GMM basically
>>> dies. Even if you bias the diagonal of the covariance, you still have
>>> the basic algorithmic problem.
>>>
>>> CEM2 proceeds like this:
>>>
>>>     while True:
>>>         for c in clusters:
>>>             estep(c)
>>>             mstep(c)
>>>
>>> This improves stability enormously. When a cluster becomes singular,
>>> the memberships are immediately redistributed. Therefore you will not
>>> get a "cascade of singularities" where the whole GMM basically dies.
>>>
>>> Sturla
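To see Andreas's point in code, here is a minimal sketch comparing a decision tree with k-nearest neighbors on the same data, with and without standardization. The dataset and parameter choices are illustrative assumptions only, not anything discussed in the thread:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.preprocessing import StandardScaler
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, n_features=5, random_state=0)
    X[:, 0] *= 1000.0                     # put one feature on a very different scale
    X_std = StandardScaler().fit_transform(X)

    # Trees split one feature at a time, so rescaling the columns should
    # leave the fitted tree, and hence its predictions, unchanged.
    tree_raw = DecisionTreeClassifier(random_state=0).fit(X, y).predict(X)
    tree_std = DecisionTreeClassifier(random_state=0).fit(X_std, y).predict(X_std)
    print("tree predictions identical:", np.array_equal(tree_raw, tree_std))

    # k-NN relies on Euclidean distances, so the dominating unscaled feature
    # changes which neighbors are found, and typically the predictions too.
    knn_raw = KNeighborsClassifier(n_neighbors=5).fit(X, y).predict(X)
    knn_std = KNeighborsClassifier(n_neighbors=5).fit(X_std, y).predict(X_std)
    print("knn predictions identical:", np.array_equal(knn_raw, knn_std))

With the deliberately exaggerated first column, the tree comparison should print True and the k-NN comparison will usually print False.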
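Sebastian's and Andreas's remarks about gradient descent and regularization are commonly handled by putting a scaler in front of the estimator. Below is a hedged sketch of that pattern; the import paths follow current scikit-learn (older releases kept cross_val_score in sklearn.cross_validation), and the estimator and hyperparameters are assumptions rather than a recommendation:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X[:, 0] *= 1000.0   # one feature on a wildly different scale

    # Without scaling: the gradient steps and the L2 penalty are dominated
    # by the large-range feature.
    raw = SGDClassifier(random_state=0)

    # With scaling inside a pipeline, the scaler is re-fit on each training
    # fold and every feature contributes on a comparable scale.
    scaled = make_pipeline(StandardScaler(), SGDClassifier(random_state=0))

    print("raw   :", cross_val_score(raw, X, y, cv=5).mean())
    print("scaled:", cross_val_score(scaled, X, y, cv=5).mean())

Putting the scaler inside the pipeline (rather than scaling the whole dataset up front) keeps the cross-validation honest, since the scaling statistics are learned only from each training fold.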
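Sturla's pseudocode only shows the loop structure. As a self-contained illustration of the component-wise idea, here is a rough one-dimensional Gaussian-mixture toy version; the data, initialisation, fixed number of passes, and the small variance floor are all arbitrary assumptions, and this is not scikit-learn code:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 200)])

    K = 2
    weights = np.full(K, 1.0 / K)
    means = np.array([x.min(), x.max()])      # crude initialisation
    stds = np.full(K, x.std())

    def responsibilities():
        # E-step: posterior membership probability of each component per sample.
        dens = np.array([w * norm.pdf(x, m, s)
                         for w, m, s in zip(weights, means, stds)])
        return dens / (dens.sum(axis=0) + 1e-300)

    for _ in range(100):                      # fixed number of passes, no convergence test
        for k in range(K):
            # Component-wise update: refresh the memberships *before* updating
            # each single component, so a collapsing component loses its members
            # right away instead of dragging the whole mixture down with it.
            r = responsibilities()[k]
            nk = r.sum()
            weights[k] = nk / len(x)
            weights /= weights.sum()          # keep the mixture weights normalised
            means[k] = (r * x).sum() / nk
            stds[k] = np.sqrt((r * (x - means[k]) ** 2).sum() / nk) + 1e-6

    print("weights:", weights, "means:", means, "stds:", stds)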