Re: [Scikit-learn-general] normalising/scaling input for SVM or Random Forests

2014-03-19 Thread Satrajit Ghosh
thanks lars. this would mean that any tree-based model could generate differences based on preprocessing differences, right? cheers, satra On Sun, Mar 16, 2014 at 3:37 PM, Olivier Grisel olivier.gri...@ensta.org wrote: 2014-03-16 0:23 GMT+01:00 Lars Buitinck larsm...@gmail.com: 2014-03-15

2014-03-19 Thread Lars Buitinck
2014-03-19 21:40 GMT+01:00 Satrajit Ghosh sa...@mit.edu: this would mean that any tree-based model could generate differences based on preprocessing differences right? Yes. I'm not sure why the threshold is there, but it's probably to prevent generating too many splits in the face of noisy

2014-03-16 Thread Olivier Grisel
2014-03-16 0:23 GMT+01:00 Lars Buitinck larsm...@gmail.com: 2014-03-15 21:53 GMT+01:00 Satrajit Ghosh sa...@mit.edu: in many cases with fat data (small samples50 x many features10) i have found that standardizing helps quite a bit in case of extra trees. i still don't have a good

2014-03-15 Thread Kevin Keraudren
Thanks a lot for this detailed answer! Kind regards, Kevin Le 14/03/2014 16:37, Olivier Grisel a écrit : 2014-03-14 15:34 GMT+01:00 Kevin Keraudren kevin.keraudre...@imperial.ac.uk: Hi, I have a question related to the range of my input data for SVM or Random Forests for classification: I

2014-03-15 Thread Satrajit Ghosh
hi olivier, just a question on this statement: Random Forest (and decision tree-based models in general) are scale independent. in many cases with fat data (small samples50 x many features10) i have found that standardizing helps quite a bit in case of extra trees. i still don't have a

2014-03-15 Thread Gilles Louppe
Hi Satra, In case of Extra-Trees, changing the scale of features might change the result when the transform you apply distorts the original feature space. Drawing a threshold uniformly at random in the original [min;max] interval won't be equivalent to drawing a threshold in [f(min);f(max)] if f
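
[Editor's illustration, not part of Gilles's message, which is truncated in this archive.] The point about thresholds drawn uniformly in [min;max] can be sketched in a few lines of plain Python. This is only a toy under stated assumptions: the helper name `split_probabilities` and the four sample values are hypothetical, and it models a single uniform threshold draw on one feature rather than the actual scikit-learn Extra-Trees code.

```python
import math


def split_probabilities(values):
    """For a threshold drawn uniformly in [min, max], return the probability
    that it lands in each gap between consecutive sorted values.

    Hypothetical helper for illustration only (not a scikit-learn function).
    """
    xs = sorted(values)
    span = xs[-1] - xs[0]
    return [(b - a) / span for a, b in zip(xs, xs[1:])]


# Hypothetical feature values spread over several orders of magnitude.
x = [1.0, 10.0, 100.0, 1000.0]

raw = split_probabilities(x)
# Affine rescaling (a * x + b with a > 0) keeps the relative gap sizes.
scaled = split_probabilities([2.0 * v + 5.0 for v in x])
# A non-linear transform such as log10 changes the relative gap sizes.
logged = split_probabilities([math.log10(v) for v in x])

print("raw   :", raw)
print("scaled:", scaled)
print("logged:", logged)
```

In this toy, the affine rescaling leaves every gap probability unchanged, whereas the log transform makes the three gaps equally likely, so a randomly drawn threshold partitions the same samples with different probabilities.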

2014-03-15 Thread Satrajit Ghosh
thanks gilles, that makes sense. i haven't checked random forest classification on these data. i'll check that as well. cheers, satra On Sat, Mar 15, 2014 at 5:51 PM, Gilles Louppe g.lou...@gmail.com wrote: Hi Satra, In case of Extra-Trees, changing the scale of features might change the

2014-03-14 Thread Olivier Grisel
2014-03-14 15:34 GMT+01:00 Kevin Keraudren kevin.keraudre...@imperial.ac.uk: Hi, I have a question related to the range of my input data for SVM or Random Forests for classification: I normalise my input vectors so that their euclidean norm is one, for instance to limit the influence of the
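
[Editor's illustration, not part of the original message.] The per-sample normalisation Kevin describes, scaling each input vector so that its euclidean norm is one, might look like the sketch below. The function name `l2_normalise` and the sample vector are hypothetical; scikit-learn offers `sklearn.preprocessing.normalize` for the same purpose, but whether the poster used it is not stated in this excerpt.

```python
import math


def l2_normalise(vector):
    """Scale a vector so that its euclidean (L2) norm is one.

    Hypothetical helper for illustration; an all-zero vector is returned
    unchanged to avoid dividing by zero.
    """
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return list(vector)
    return [v / norm for v in vector]


sample = [3.0, 4.0]          # hypothetical input vector, norm 5
unit = l2_normalise(sample)  # same direction, norm 1
print(unit, math.sqrt(sum(v * v for v in unit)))
```

Note that this rescales each sample (row) on its own, which is a different operation from standardizing each feature (column) across samples.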