Hi,

I have a question about the range of my input data when using SVMs or
Random Forests for classification.
I normalise my input vectors so that their Euclidean norm is one, for
instance to limit the influence of the image size or intensity contrast.
I have got into the habit of then scaling them by a factor of 1000,
so that my values lie between 0 and 1000 instead of between 0 and 1, and
there are fewer values "close to zero". I guess it does not hurt to do so,
but do you know whether it is actually useful? Do SVMs and Random Forests
already apply some normalisation before they start learning from the data?
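A minimal sketch of the preprocessing described above, plus one way to check empirically whether the extra factor of 1000 changes what an SVM or a random forest learns. Assumptions: synthetic non-negative data from `make_classification` stands in for the image features, the RBF kernel with `gamma="scale"` (the default in recent scikit-learn releases) and the forest settings are illustrative choices rather than the original setup, and `agreement` is a hypothetical helper, not a library function.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import normalize
from sklearn.svm import SVC

# Synthetic stand-in for the image features (assumption). Taking the absolute
# value makes them non-negative, so unit-norm values lie in [0, 1].
X_raw, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_raw = np.abs(X_raw)

# Rescale every sample (row) to unit Euclidean norm ...
X_unit = normalize(X_raw, norm="l2", axis=1)
# ... then apply the extra factor of 1000.
X_big = 1000.0 * X_unit


def agreement(make_model, X_a, X_b, y):
    """Hypothetical helper: fraction of training samples on which two identically
    configured models, fitted on X_a and on X_b, predict the same label."""
    pred_a = make_model().fit(X_a, y).predict(X_a)
    pred_b = make_model().fit(X_b, y).predict(X_b)
    return float(np.mean(pred_a == pred_b))


# gamma="scale" uses gamma = 1 / (n_features * X.var()), so a uniform factor
# on X cancels inside the RBF kernel exp(-gamma * ||x - x'||^2).
svm_agree = agreement(lambda: SVC(kernel="rbf", gamma="scale"), X_unit, X_big, y)

# Trees only compare each feature against thresholds, and multiplying by a
# positive constant preserves the ordering, so with the same random_state the
# splits should match up to floating-point ties.
rf_agree = agreement(
    lambda: RandomForestClassifier(n_estimators=50, random_state=0),
    X_unit, X_big, y,
)

print(f"SVC (RBF, gamma='scale') agreement: {svm_agree:.3f}")
print(f"Random forest agreement:            {rf_agree:.3f}")
```

With a linear kernel or a fixed numeric `gamma`, the factor no longer cancels: for a linear kernel, multiplying X by 1000 multiplies the kernel by 10^6, which is equivalent to multiplying `C` by 10^6, so the same check may show disagreements there.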

I have a similar question about Random Forests for regression: how is
the minimal MSE required for a split defined? Here again, if I scale my
input by a factor of 1000, should I expect the resulting trees to be
different (leaving aside the random aspect of Random Forests)?
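For regression trees, the squared-error (MSE) impurity is computed on the targets y; the input features only determine which partitions of the samples are candidate splits. A minimal sketch of one way to test the scaling question empirically, assuming synthetic data and illustrative settings (`max_depth=5`, 30 trees): fit two forests with the same `random_state`, one on the unit-norm inputs and one on the same inputs multiplied by 1000, then compare predictions and the first tree's split features and thresholds.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import normalize

rng = np.random.RandomState(0)
# Hypothetical regression data (assumption), standing in for real features/targets.
X_raw = rng.rand(200, 5)
y = 3.0 * X_raw[:, 0] + np.sin(4.0 * X_raw[:, 1]) + 0.1 * rng.randn(200)

X_unit = normalize(X_raw)   # unit Euclidean norm per sample
X_big = 1000.0 * X_unit     # same data, multiplied by 1000

# Identical settings and random_state give both forests the same bootstrap
# samples and the same random feature draws.
params = dict(n_estimators=30, max_depth=5, random_state=0)
forest_unit = RandomForestRegressor(**params).fit(X_unit, y)
forest_big = RandomForestRegressor(**params).fit(X_big, y)

# The squared-error impurity depends on y; a positive rescaling of X keeps the
# order of feature values, so the candidate partitions are the same.
max_diff = np.max(np.abs(forest_unit.predict(X_unit) - forest_big.predict(X_big)))
print(f"largest prediction difference: {max_diff:.3g}")

# Compare the first tree: same split features, thresholds scaled by about 1000.
tree_unit = forest_unit.estimators_[0].tree_
tree_big = forest_big.estimators_[0].tree_
internal = tree_unit.children_left != -1  # leaves have no children
print("same split features:", np.array_equal(tree_unit.feature, tree_big.feature))
```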

Kind regards,

Kevin

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
