Dear Mamun,

> *A.* 80% features are binary [ 0 or 1 ]
> *B.* 10% are integer values representing counts / occurrences.
> *C.* 10% are continuous values between different ranges.
>
> My prior understanding was that decision tree based algorithms work better
> on mixed data types. In this particular case I am noticing
> SVC is performing much better than Random forest.
>
What does "performing better" mean in this case? How are you defining performance? Is it a particular metric such as MCC, PPV, or NPV?

Also, how is the cross-validation being done? Is the data shuffled before the train/test groups are created? Is the exact same split of training and test data per fold used for both SVC and RF?

> I Z-score normalise the data before I sent it to support vector
> classifier.
> - Binary features ( type *A* ) are left as it is.
> - Integer and Continuous features are Z-score normalised [ ( feat -
> mean(feat) ) / sd(feat) ].
>

Normalizing your continuous values seems quite fine, but consider these aspects:

--Does it make sense in the domain of your problem to Z-normalize the integral (integer-valued) descriptors/features?
--For the integral values, would centering about the median value make more sense? This is similar to the previous consideration.
--What happens to SVC if you don't normalize?
--What happens to RF if you do normalize?

While my comments above are all geared toward empirical rather than theoretical aspects, picking some of them to explore is likely to give you practical insight into your situation.

I'm sure you already know this, but while machine learning may have some "practical guidelines for best practices", they are guidelines and not hard rules. So, again, I would recommend doing some more empirical tests and re-evaluating your situation once you have new data in hand.

If you can provide a good amount of concrete data to present along with your "problem", this community is excellent at providing intelligent, helpful responses.

Hope this helps.

J.B.
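P.S. In case it is useful, here is a minimal sketch of the kind of empirical comparison I mean. It assumes scikit-learn and uses synthetic stand-in data; the column indices in `non_binary`, the 80/10/10 feature layout, and the choice of MCC with 5 folds are my assumptions, not details of your setup. The fixed, shuffled `StratifiedKFold` gives every model the exact same train/test split per fold, and each classifier is tried with and without Z-scoring of the non-binary columns.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (an assumption; substitute your own X and y).
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=8, random_state=0)
X[:, :16] = (X[:, :16] > 0).astype(float)        # 16 of 20 columns made binary
X[:, 16:18] = np.round(np.abs(X[:, 16:18]) * 3)  # 2 columns made count-like
# Columns 18 and 19 are left continuous.

# Hypothetical indices of the integer and continuous columns.
non_binary = [16, 17, 18, 19]

# Z-score only the non-binary columns; binary columns pass through unchanged.
scale_some = ColumnTransformer([("z", StandardScaler(), non_binary)],
                               remainder="passthrough")

# Shuffled folds with a fixed seed: identical splits for every model.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
mcc = make_scorer(matthews_corrcoef)

models = {
    "SVC, normalized": make_pipeline(scale_some, SVC()),
    "SVC, raw": SVC(),
    "RF, normalized": make_pipeline(
        scale_some, RandomForestClassifier(n_estimators=100, random_state=0)),
    "RF, raw": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {name: cross_val_score(model, X, y, cv=cv, scoring=mcc)
          for name, model in models.items()}

for name, fold_scores in scores.items():
    print(f"{name}: MCC mean={fold_scores.mean():.3f} "
          f"sd={fold_scores.std():.3f}")
```

Because the scaler sits inside the pipeline, its mean and standard deviation are re-estimated on the training part of each fold, so nothing from the test part leaks into the normalization. Looking at the per-fold spread as well as the mean should tell you whether the SVC/RF gap is larger than the fold-to-fold noise.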