"Need" to be scaled sounds a little bit strong ;) -- feature scaling is really 
context-dependend. If you are using stochastic gradient descent of gradient 
descent you surely want to standardize your data or at least center it for 
technical reasons and convergence. However, in naive Bayes, you just estimate 
the parameters e.g., via MLE so that there is no technical advantage of feature 
scaling, however, the results will be different with and without scaling. 
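
For the gradient descent case, a minimal sketch of the effect (toy data; 
scikit-learn's SGDClassifier and StandardScaler):

from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# synthetic data with one feature on a wildly different scale
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X[:, 0] *= 1000.0

raw = SGDClassifier(random_state=0).fit(X, y)
scaled = make_pipeline(StandardScaler(), SGDClassifier(random_state=0)).fit(X, y)

# the standardized pipeline typically converges to a much better fit
print("raw:   ", raw.score(X, y))
print("scaled:", scaled.score(X, y))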

> On Jun 5, 2015, at 1:03 PM, Andreas Mueller <t3k...@gmail.com> wrote:
> 
> The results on scaled and non-scaled data will be different because the 
> regularization will have a different effect.
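> 
> A minimal sketch of that effect (toy data; scikit-learn's L2-penalized 
> LogisticRegression):
> 
> from sklearn.datasets import make_classification
> from sklearn.linear_model import LogisticRegression
> from sklearn.preprocessing import StandardScaler
> 
> X, y = make_classification(n_samples=200, n_features=5, random_state=0)
> X[:, 0] *= 100.0  # inflate one feature's scale
> 
> clf_raw = LogisticRegression(C=0.1).fit(X, y)
> clf_std = LogisticRegression(C=0.1).fit(StandardScaler().fit_transform(X), y)
> 
> # the penalty shrinks the inflated feature's coefficient differently,
> # so the two models are not equivalent up to a rescaling
> print(clf_raw.coef_)
> print(clf_std.coef_)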
> 
> On 06/05/2015 03:10 AM, Yury Zhauniarovich wrote:
>> Thank you all! However, what Sturla wrote is beyond my understanding.
>> 
>> One more question. It also seems to me that Naive Bayes classifiers do 
>> not need the data to be scaled. Am I correct?
>> 
>> 
>> Best Regards,
>> Yury Zhauniarovich
>> 
>> On 4 June 2015 at 20:55, Sturla Molden <sturla.mol...@gmail.com> wrote:
>> On 04/06/15 20:38, Sturla Molden wrote:
>> 
>> > Component-wise EM (aka CEM2) is a better way of avoiding the singularity
>> > disease, though.
>> 
>> The traditional EM for a GMM proceeds like this:
>> 
>> while True:
>> 
>>     # global E-step: update the memberships of *all* clusters at once
>>     global_estep(clusters)
>> 
>>     # then an M-step for every cluster before the next E-step
>>     for c in clusters:
>>         mstep(c)
>> 
>> This is inherently unstable. Several clusters can become
>> near-singular in the M-step before there is an E-step
>> to redistribute the weights. You can get a "cascade of
>> singularities" where the whole GMM basically dies. Even
>> if you bias the diagonal of the covariance you still
>> have the basic algorithmic problem.
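>> 
>> Spelled out for a 1-D mixture (a toy NumPy sketch of that loop, not 
>> scikit-learn's GMM):
>> 
>> import numpy as np
>> 
>> rng = np.random.RandomState(0)
>> X = np.concatenate([rng.normal(-3, 1, 200), rng.normal(3, 1, 200)])
>> 
>> K = 2
>> w = np.ones(K) / K        # mixing weights
>> mu = rng.choice(X, K)     # means
>> var = np.ones(K)          # variances
>> 
>> def gauss(x, m, v):
>>     return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)
>> 
>> for it in range(50):
>>     # global E-step: responsibilities for all clusters at once
>>     R = np.array([w[k] * gauss(X, mu[k], var[k]) for k in range(K)]).T
>>     R /= R.sum(axis=1, keepdims=True)
>>     # M-step for every cluster; a cluster collapsing onto a single
>>     # point (var -> 0) is only corrected at the *next* E-step
>>     for k in range(K):
>>         nk = R[:, k].sum()
>>         w[k] = nk / len(X)
>>         mu[k] = (R[:, k] * X).sum() / nk
>>         var[k] = (R[:, k] * (X - mu[k]) ** 2).sum() / nk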
>> 
>> CEM2 proceeds like this:
>> 
>> while True:
>>     for c in clusters:
>>         # E-step and M-step for *one* cluster at a time
>>         estep(c)
>>         mstep(c)
>> 
>> This improves stability enormously. When a cluster becomes
>> singular, the memberships are immediately redistributed.
>> Therefore you will not get a "cascade of singularities"
>> where the whole GMM basically dies.
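>> 
>> In the same 1-D sketch, only the loop structure changes (details of the 
>> weight update vary across CEM2 formulations; here I simply renormalize 
>> after each component update):
>> 
>> # re-initialize as above
>> w = np.ones(K) / K
>> mu = rng.choice(X, K)
>> var = np.ones(K)
>> 
>> for it in range(50):
>>     for k in range(K):
>>         # E-step with the current parameters: a cluster that just went
>>         # near-singular sheds its memberships immediately
>>         R = np.array([w[j] * gauss(X, mu[j], var[j]) for j in range(K)]).T
>>         R /= R.sum(axis=1, keepdims=True)
>>         # M-step for this one cluster only
>>         nk = R[:, k].sum()
>>         w[k] = nk / len(X)
>>         w /= w.sum()      # keep the mixing weights normalized
>>         mu[k] = (R[:, k] * X).sum() / nk
>>         var[k] = (R[:, k] * (X - mu[k]) ** 2).sum() / nk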
>> 
>> 
>> Sturla
>> 
>> 