Tree-based algorithms (like Random Forest) usually work well for imbalanced datasets. You can also take a look at the SMOTE technique (http://jair.org/media/953/live-953-2037-jair.pdf), which you can use for over-sampling the positive observations.
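To give a rough idea of what SMOTE does, here is a minimal pure-Python sketch of its core interpolation step. It is a simplification, not the exact procedure from the paper: it picks base points at random instead of looping over every positive sample, and the function name `smote`, its parameters, and the toy data are all made up for illustration. For real work you would more likely use an existing implementation, such as the `SMOTE` class in the imbalanced-learn package.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Simplified sketch: base points are drawn at random (assumption),
    whereas the original paper iterates over every minority sample.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (Euclidean distance)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # the new point lies on the segment between base and neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data: 4 positive observations with 2 features (made up for illustration)
positives = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote(positives, n_new=6, k=3)
```

The synthetic points would be appended to the positive class in the training set only (never the test set), and the enlarged training set could then be fed to a tree-based classifier such as Random Forest.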
On Mon, Nov 14, 2016 at 9:14 AM, Thomas Evangelidis <[email protected]> wrote:

> Greetings,
>
> I want to design a program that can deal with classification problems of
> the same type, where the number of positive observations is small but the
> number of negative much larger. Speaking with numbers, the number of
> positive observations could range usually between 2 to 20 and the number of
> negative could be at least x30 times larger. The number of features could
> be between 2 and 20 too, but that could be reduced using feature selection
> and elimination algorithms. I've read in the documentation that some
> algorithms like the SVM are still effective when the number of dimensions
> is greater than the number of samples, but I am not sure if they are
> suitable for my case. Moreover, according to this Figure, the Nearest
> Neighbors is the best and second is the RBF SVM:
>
> http://scikit-learn.org/stable/_images/sphx_glr_plot_classifier_comparison_001.png
>
> However, I assume that Nearest Neighbors would not be effective in my
> case where the number of positive observations is very low. For these
> reasons I would like to know your expert opinion about which classification
> algorithm should I try first.
>
> thanks in advance
> Thomas
>
> --
> ======================================================================
> Thomas Evangelidis
> Research Specialist
> CEITEC - Central European Institute of Technology
> Masaryk University
> Kamenice 5/A35/1S081,
> 62500 Brno, Czech Republic
>
> email: [email protected]
>        [email protected]
>
> website: https://sites.google.com/site/thomasevangelidishomepage/
>
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn

--
Fernando Marcos Wittmann
MS Student - Energy Systems Dept.
School of Electrical and Computer Engineering, FEEC
University of Campinas, UNICAMP, Brazil
+55 (19) 987-211302
