Greetings, I want to design a program that can deal with classification problems of the same type, where the number of positive observations is small but the number of negative much larger. Speaking with numbers, the number of positive observations could range usually between 2 to 20 and the number of negative could be at least x30 times larger. The number of features could be between 2 and 20 too, but that could be reduced using feature selection and elimination algorithms. I 've read in the documentation that some algorithms like the SVM are still effective when the number of dimensions is greater than the number of samples, but I am not sure if they are suitable for my case. Moreover, according to this Figure, the Nearest Neighbors is the best and second is the RBF SVM:
http://scikit-learn.org/stable/_images/sphx_glr _plot_classifier_comparison_001.png However, I assume that Nearest Neighbors would not be effective in my case where the number of positive observations is very low. For these reasons I would like to know your expert opinion about which classification algorithm should I try first. thanks in advance Thomas -- ====================================================================== Thomas Evangelidis Research Specialist CEITEC - Central European Institute of Technology Masaryk University Kamenice 5/A35/1S081, 62500 Brno, Czech Republic email: [email protected] [email protected] website: https://sites.google.com/site/thomasevangelidishomepage/
_______________________________________________ scikit-learn mailing list [email protected] https://mail.python.org/mailman/listinfo/scikit-learn
