Yes, Andreas. Thank you, I just wanted to clarify. Thank you all for your help, and sorry for some silly questions!
Best Regards,
Yury Zhauniarovich

On 5 June 2015 at 20:24, Andreas Mueller <t3k...@gmail.com> wrote:
> Have you read my earlier email explaining just that?
>
> Tree-based methods are the only ones that are invariant to feature
> scaling, so DecisionTree*, RandomForest*, ExtraTrees*, Bagging* (with
> trees), GradientBoosting* (with trees).
>
> For all other algorithms, the outcome will be different depending on
> whether you scale your data or not.
> For algorithms like nearest neighbors, I would not say they require
> scaling, but scaling will change the result.
> It is then a question of whether you think the range of your features is
> meaningful or arbitrary.
>
> I don't think there is currently a chart on the complexity of the
> algorithms, though it would be cool to add one.
>
> On 06/05/2015 02:19 PM, Yury Zhauniarovich wrote:
> Thank you, Sebastian. This is what I want to understand. Considering the
> final score, e.g., accuracy, does this mean that with scaling and without
> I will get different results for NB and KNN? Or will the results be the
> same, as in the case of decision trees?
>
> With gradient descent algorithms it is clear why I need to scale the
> features (because, as you wrote, of convergence). The question is whether
> there are similar reasons to scale features for other algorithms (like I
> said, KNN, NB or SVM). Might I get different results (e.g., accuracy)
> depending on whether I scale the features or not?
>
> Best Regards,
> Yury Zhauniarovich
>
> On 5 June 2015 at 19:58, Sebastian Raschka <se.rasc...@gmail.com> wrote:
>> "Need" to be scaled sounds a little bit strong ;) -- feature scaling is
>> really context-dependent. If you are using stochastic gradient descent or
>> gradient descent, you surely want to standardize your data, or at least
>> center it, for technical reasons and for convergence. However, in naive
>> Bayes you just estimate the parameters, e.g., via MLE, so there is no
>> technical advantage to feature scaling; the results will still be
>> different with and without scaling, though.
>>
>> On Jun 5, 2015, at 1:03 PM, Andreas Mueller <t3k...@gmail.com> wrote:
>>
>> The result on scaled and non-scaled data will be different because the
>> regularization will have a different effect.
>>
>> On 06/05/2015 03:10 AM, Yury Zhauniarovich wrote:
>>
>> Thank you all! However, what Sturla wrote is beyond my understanding.
>>
>> One more question. It also seems to me that Naive Bayes classifiers do
>> not need the data to be scaled. Am I correct?
>>
>> Best Regards,
>> Yury Zhauniarovich
>>
>> On 4 June 2015 at 20:55, Sturla Molden <sturla.mol...@gmail.com> wrote:
>>
>>> On 04/06/15 20:38, Sturla Molden wrote:
>>>
>>> > Component-wise EM (aka CEM2) is a better way of avoiding the
>>> > singularity disease, though.
>>>
>>> The traditional EM for a GMM proceeds like this:
>>>
>>>     while True:
>>>         global_estep(clusters)
>>>         for c in clusters:
>>>             mstep(c)
>>>
>>> This is inherently unstable. Several clusters can become near-singular
>>> in the M-step before there is an E-step to redistribute the weights.
>>> You can get a "cascade of singularities" where the whole GMM basically
>>> dies. Even if you bias the diagonal of the covariance, you still have
>>> the basic algorithmic problem.
>>>
>>> CEM2 proceeds like this:
>>>
>>>     while True:
>>>         for c in clusters:
>>>             estep(c)
>>>             mstep(c)
>>>
>>> This improves stability enormously. When a cluster becomes singular,
>>> the memberships are immediately redistributed. Therefore you will not
>>> get a "cascade of singularities" where the whole GMM basically dies.
>>>
>>> Sturla
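To see Andreas's point in code, here is a minimal sketch comparing a decision tree with k-nearest neighbors on the same data, with and without standardization. The dataset and parameter choices are illustrative assumptions only, not anything discussed in the thread:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.preprocessing import StandardScaler
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, n_features=5, random_state=0)
    X[:, 0] *= 1000.0                     # put one feature on a very different scale
    X_std = StandardScaler().fit_transform(X)

    # Trees split one feature at a time, so rescaling the columns should
    # leave the fitted tree, and hence its predictions, unchanged.
    tree_raw = DecisionTreeClassifier(random_state=0).fit(X, y).predict(X)
    tree_std = DecisionTreeClassifier(random_state=0).fit(X_std, y).predict(X_std)
    print("tree predictions identical:", np.array_equal(tree_raw, tree_std))

    # k-NN relies on Euclidean distances, so the dominating unscaled feature
    # changes which neighbors are found, and typically the predictions too.
    knn_raw = KNeighborsClassifier(n_neighbors=5).fit(X, y).predict(X)
    knn_std = KNeighborsClassifier(n_neighbors=5).fit(X_std, y).predict(X_std)
    print("knn predictions identical:", np.array_equal(knn_raw, knn_std))

With the deliberately exaggerated first column, the tree comparison should print True and the k-NN comparison will usually print False.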
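Sebastian's and Andreas's remarks about gradient descent and regularization are commonly handled by putting a scaler in front of the estimator. Below is a hedged sketch of that pattern; the import paths follow current scikit-learn (older releases kept cross_val_score in sklearn.cross_validation), and the estimator and hyperparameters are assumptions rather than a recommendation:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X[:, 0] *= 1000.0   # one feature on a wildly different scale

    # Without scaling: the gradient steps and the L2 penalty are dominated
    # by the large-range feature.
    raw = SGDClassifier(random_state=0)

    # With scaling inside a pipeline, the scaler is re-fit on each training
    # fold and every feature contributes on a comparable scale.
    scaled = make_pipeline(StandardScaler(), SGDClassifier(random_state=0))

    print("raw   :", cross_val_score(raw, X, y, cv=5).mean())
    print("scaled:", cross_val_score(scaled, X, y, cv=5).mean())

Putting the scaler inside the pipeline (rather than scaling the whole dataset up front) keeps the cross-validation honest, since the scaling statistics are learned only from each training fold.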
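Sturla's pseudocode only shows the loop structure. As a self-contained illustration of the component-wise idea, here is a rough one-dimensional Gaussian-mixture toy version; the data, initialisation, fixed number of passes, and the small variance floor are all arbitrary assumptions, and this is not scikit-learn code:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 200)])

    K = 2
    weights = np.full(K, 1.0 / K)
    means = np.array([x.min(), x.max()])      # crude initialisation
    stds = np.full(K, x.std())

    def responsibilities():
        # E-step: posterior membership probability of each component per sample.
        dens = np.array([w * norm.pdf(x, m, s)
                         for w, m, s in zip(weights, means, stds)])
        return dens / (dens.sum(axis=0) + 1e-300)

    for _ in range(100):                      # fixed number of passes, no convergence test
        for k in range(K):
            # Component-wise update: refresh the memberships *before* updating
            # each single component, so a collapsing component loses its members
            # right away instead of dragging the whole mixture down with it.
            r = responsibilities()[k]
            nk = r.sum()
            weights[k] = nk / len(x)
            weights /= weights.sum()          # keep the mixture weights normalised
            means[k] = (r * x).sum() / nk
            stds[k] = np.sqrt((r * (x - means[k]) ** 2).sum() / nk) + 1e-6

    print("weights:", weights, "means:", means, "stds:", stds)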