Tree-based algorithms (like Random Forest) usually work well for
imbalanced datasets. You can also take a look at the SMOTE technique
(http://jair.org/media/953/live-953-2037-jair.pdf), which you can use for
oversampling the positive observations.
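To make the SMOTE suggestion concrete, here is a minimal, stdlib-only Python sketch of the core idea from the linked paper. Each synthetic positive is placed at a random point on the line segment between a positive sample and one of its k nearest positive neighbours. The function name `smote` and its parameters are hypothetical illustrations, not a scikit-learn API. This variant also picks base samples at random instead of looping over every positive as the paper does. For real work, a maintained implementation such as the `SMOTE` class in the separate imbalanced-learn package is likely a better choice.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style oversampling sketch (hypothetical helper, not a library API).

    minority: list of feature vectors (lists of floats) from the positive class.
    Returns n_synthetic new vectors, each interpolated between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    # With very few positives (e.g. 2-20), k cannot exceed n - 1.
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nn)])
    return synthetic


# Usage: three positive samples in 2-D, six synthetic ones generated from them.
positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(positives, n_synthetic=6, k=2)
print(len(new_points))  # -> 6
```

Whatever implementation is used, oversampling should be applied only to the training folds during cross-validation; synthetic points derived from test samples would otherwise leak into the evaluation and inflate the scores.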

On Mon, Nov 14, 2016 at 9:14 AM, Thomas Evangelidis <[email protected]>
wrote:

> Greetings,
>
> I want to design a program that can deal with classification problems of
> the same type, where the number of positive observations is small but the
> number of negative observations is much larger. In numbers, the number of
> positive observations would usually range from 2 to 20, and the number of
> negative observations could be at least 30 times larger. The number of
> features could be between 2 and 20 too, but that could be reduced using
> feature selection and elimination algorithms. I've read in the documentation
> that some algorithms like the SVM are still effective when the number of
> dimensions is greater than the number of samples, but I am not sure if they
> are suitable for my case. Moreover, according to this figure, Nearest
> Neighbors performs best, followed by the RBF SVM:
>
> http://scikit-learn.org/stable/_images/sphx_glr_plot_classifier_comparison_001.png
>
> However, I assume that Nearest Neighbors would not be effective in my
> case, where the number of positive observations is very low. For these
> reasons, I would like your expert opinion on which classification
> algorithm I should try first.
>
> Thanks in advance,
> Thomas
>
>
> --
>
> ======================================================================
>
> Thomas Evangelidis
>
> Research Specialist
> CEITEC - Central European Institute of Technology
> Masaryk University
> Kamenice 5/A35/1S081,
> 62500 Brno, Czech Republic
>
> email: [email protected]
>
>           [email protected]
>
>
> website: https://sites.google.com/site/thomasevangelidishomepage/
>
>
> _______________________________________________
> scikit-learn mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/scikit-learn
>
>


-- 

Fernando Marcos Wittmann
MS Student - Energy Systems Dept.
School of Electrical and Computer Engineering, FEEC
University of Campinas, UNICAMP, Brazil
+55 (19) 987-211302
