> Considering the final score, e.g., accuracy, does this mean that with scaling 
> and without I will get different results for NB and KNN? 

Yes. I think it would really help you to read a little bit about how those 
algorithms work -- to develop an intuition for how feature scaling affects the 
outcome, and why it doesn't matter for decision trees.
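To make that concrete, here is a minimal sketch (assuming scikit-learn and its 
bundled iris data; a dataset whose features differ more in scale would show the 
effect more strongly) that compares a KNN classifier and a decision tree with 
and without standardization:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data, iris.target

# simple shuffled train/test split (avoiding version-specific helpers)
rng = np.random.RandomState(0)
idx = rng.permutation(len(y))
train, test = idx[:100], idx[100:]

# fit the scaler on the training part only, then transform everything
scaler = StandardScaler().fit(X[train])
X_std = scaler.transform(X)

for name, clf in [('KNN', KNeighborsClassifier()),
                  ('Tree', DecisionTreeClassifier(random_state=0))]:
    acc_raw = clf.fit(X[train], y[train]).score(X[test], y[test])
    acc_std = clf.fit(X_std[train], y[train]).score(X_std[test], y[test])
    # the tree's accuracy should be identical with and without scaling,
    # whereas KNN's Euclidean distances (and hence its accuracy) can change
    print(name, acc_raw, acc_std)

The tree is invariant because each split only compares a single feature against 
a threshold, and standardization rescales each feature monotonically; KNN mixes 
all features into one distance, so whichever feature has the largest scale 
dominates.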

> With gradient descent algorithms it is clear why I need to scale the features 
> (because, as you wrote, of convergence). The question is whether there are 
> similar reasons to scale features for other algorithms (like I said, KNN, NB 
> or SVM)?


About SVM & feature scaling: it is roughly speaking the the same as e.g., 
Logistic regression (in the linear case) but minimizing a different cost 
function (hinge loss). I have an example here for Adaline (adaptive linear 
neurons) to illustrate the effect of standardization a little bit if it helps: 
http://sebastianraschka.com/Articles/2015_singlelayer_neurons.html#The-Gradient-Descent-Rule-in-Action
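If you want to poke at the SVM case directly, here is a rough sketch (using 
scikit-learn's SGDClassifier, which fits a linear model by stochastic gradient 
descent; with the hinge loss that is essentially a linear SVM) of the usual 
standardize-then-fit pattern:

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDClassifier

iris = load_iris()
X, y = iris.data, iris.target

# standardize first, just as you would for logistic regression trained via SGD
X_std = StandardScaler().fit_transform(X)

# loss='hinge' -> linear SVM; the logistic loss with the same optimizer would
# give logistic regression, which is why the scaling considerations are the same
svm = SGDClassifier(loss='hinge', random_state=0)
print(svm.fit(X_std, y).score(X_std, y))

Without the standardization step the weight updates for differently scaled 
features live on very different scales, so a single learning rate tends to 
overshoot along some dimensions while barely moving along others -- the same 
effect the Adaline plots above illustrate.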

Lastly, there is no general rule that feature scaling *always* "improves" 
predictive performance. You really need to think about it in the context of the 
problem that you want to solve and the model that you are going to use.


> On Jun 5, 2015, at 2:19 PM, Yury Zhauniarovich <y.zhalnerov...@gmail.com> 
> wrote:
> 
> Thank you, Sebastian. This is what I want to understand. Considering the 
> final score, e.g., accuracy, does this mean that with scaling and without I 
> will get different results for NB and KNN? Or will the results be the same, 
> as in the case of decision trees? 
> 
> With gradient descent algorithms it is clear why I need to scale the features 
> (because, as you wrote, of convergence). The question is whether there are 
> similar reasons to scale features for other algorithms (like I said, KNN, NB 
> or SVM)? Might I get different results (e.g., accuracy) depending on whether 
> I scale the features or not?   
> 
> 
> Best Regards,
> Yury Zhauniarovich
> 
> On 5 June 2015 at 19:58, Sebastian Raschka <se.rasc...@gmail.com> wrote:
> "Need" to be scaled sounds a little bit strong ;) -- feature scaling is 
> really context-dependend. If you are using stochastic gradient descent of 
> gradient descent you surely want to standardize your data or at least center 
> it for technical reasons and convergence. However, in naive Bayes, you just 
> estimate the parameters e.g., via MLE so that there is no technical advantage 
> of feature scaling, however, the results will be different with and without 
> scaling. 
> 
>> On Jun 5, 2015, at 1:03 PM, Andreas Mueller <t3k...@gmail.com> wrote:
>> 
>> The result on scaled and non-scaled data will be different because the 
>> regularization will have a different effect.
>> 
>> On 06/05/2015 03:10 AM, Yury Zhauniarovich wrote:
>>> Thank you all! However, what Sturla wrote is now beyond my understanding.
>>> 
>>> One more question. It also seems to me that Naive Bayes classifiers do not 
>>> need the data to be scaled. Am I correct?
>>> 
>>> 
>>> Best Regards,
>>> Yury Zhauniarovich
>>> 
>>> On 4 June 2015 at 20:55, Sturla Molden <sturla.mol...@gmail.com> wrote:
>>> On 04/06/15 20:38, Sturla Molden wrote:
>>> 
>>> > Component-wise EM (aka CEM2) is a better way of avoiding the singularity
>>> > disease, though.
>>> 
>>> The traditional EM for a GMM proceeds like this:
>>> 
>>> while True:
>>> 
>>>     global_estep(clusters)
>>> 
>>>     for c in clusters:
>>>         mstep(c)
>>> 
>>> This is inherently unstable. Several clusters can become
>>> near-singular in the M-step before there is an E-step
>>> to redistribute the weights. You can get a "cascade of
>>> singularities" where the whole GMM basically dies. Even
>>> if you bias the diagonal of the covariance you still
>>> have the basic algorithmic problem.
>>> 
>>> CEM2 proceeds like this:
>>> 
>>> while True:
>>>     for c in clusters:
>>>         estep(c)
>>>         mstep(c)
>>> 
>>> This improves stability enormously. When a cluster becomes
>>> singular, the memberships are immediately redistributed.
>>> Therefore you will not get a "cascade of singularities"
>>> where the whole GMM basically dies.
>>> 
>>> 
>>> Sturla
>>> 
>>> 