Hi All,
I am testing two classifiers (1. random forest, 2. SVC with an RBF
kernel) on a data set via 5-fold cross-validation.
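For context, here is a minimal sketch of the comparison I am running (the synthetic data and default hyperparameters below are placeholders, not my actual set-up):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for my real feature matrix
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

rf = RandomForestClassifier(random_state=0)
svc = SVC(kernel="rbf")  # radial basis function kernel

# 5-fold cross-validated accuracy for each classifier
rf_scores = cross_val_score(rf, X, y, cv=5)
svc_scores = cross_val_score(svc, X, y, cv=5)
print("RF  mean accuracy:", rf_scores.mean())
print("SVC mean accuracy:", svc_scores.mean())
```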

The feature matrix contains:

A. 80% of the features are binary (0 or 1).
B. 10% are integer values representing counts / occurrences.
C. 10% are continuous values with different ranges.

My prior understanding was that decision-tree-based algorithms work
better on mixed data types. In this particular case, however, I am
noticing that the SVC is performing much better than the random forest.

I Z-score normalise the data before sending it to the support vector
classifier:
- Binary features (type A) are left as-is.
- Integer and continuous features are Z-score normalised:
  ( feat - mean(feat) ) / sd(feat).
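A minimal sketch of that preprocessing using a ColumnTransformer, so the scaler is fitted inside each CV fold rather than on the full data (the column layout and counts here are made up for illustration, not my real matrix):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.RandomState(0)
n = 200
# Toy mixed-type matrix: 8 binary, 1 count, 1 continuous column
X = np.hstack([
    rng.randint(0, 2, size=(n, 8)),   # type A: binary, left unscaled
    rng.poisson(3, size=(n, 1)),      # type B: counts
    rng.normal(50, 10, size=(n, 1)),  # type C: continuous
])
y = rng.randint(0, 2, size=n)

numeric_cols = [8, 9]  # integer + continuous columns only
scale = ColumnTransformer(
    [("zscore", StandardScaler(), numeric_cols)],
    remainder="passthrough",  # binary columns pass through untouched
)

# Pipeline ensures the scaler sees only each fold's training split
pipe = Pipeline([("scale", scale), ("svc", SVC(kernel="rbf"))])
scores = cross_val_score(pipe, X, y, cv=5)
```

Putting the scaler inside the pipeline matters: fitting StandardScaler on the full data before cross-validation leaks fold statistics into the training splits.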

I was wondering if anyone can tell me whether this normalisation
approach is correct for the SVC run.

Thanks in advance for your help. 

Regards,
Mamun
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
