Hi All,
I am comparing two classifiers [ 1. Random Forest 2. SVC with a radial basis
function kernel ] on a data set via 5-fold cross-validation.
The feature matrix contains:
A. 80% of the features are binary [ 0 or 1 ].
B. 10% are integer values representing counts / occurrences.
C. 10% are continuous values over different ranges.
My prior understanding was that decision-tree-based algorithms work better on
mixed data types, but in this particular case I am noticing that
SVC is performing much better than Random Forest.
I Z-score normalise the data before sending it to the support vector classifier:
- Binary features ( type A ) are left as they are.
- Integer and continuous features are Z-score normalised [ ( feat -
mean(feat) ) / sd(feat) ].
I was wondering if anyone can tell me whether this normalisation approach is
correct for the SVC run.
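For what it's worth, here is a minimal sketch of that scheme in scikit-learn, using ColumnTransformer to scale only the non-binary columns and pass the binary ones through untouched. The column indices, sample sizes, and random data below are purely hypothetical stand-ins for your feature matrix:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical layout: columns 0-7 binary (type A), column 8 an integer
# count (type B), column 9 continuous (type C) -- adjust to your data.
rng = np.random.RandomState(0)
X = np.hstack([
    rng.randint(0, 2, size=(100, 8)).astype(float),  # binary features
    rng.poisson(3, size=(100, 1)).astype(float),     # count feature
    rng.uniform(-5, 5, size=(100, 1)),               # continuous feature
])
y = rng.randint(0, 2, size=100)

# Z-score only the count/continuous columns; binary columns pass through.
# Note: ColumnTransformer puts the scaled columns first in its output,
# followed by the passthrough columns.
pre = ColumnTransformer(
    [("scale", StandardScaler(), [8, 9])],
    remainder="passthrough",
)
clf = make_pipeline(pre, SVC(kernel="rbf"))
clf.fit(X, y)
```

Wrapping the scaler and the SVC in one pipeline also means that, under cross_val_score, the scaler is re-fitted on each training fold, avoiding leakage from the test fold into the normalisation statistics.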
Thanks in advance for your help.
Regards,
Mamun
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn