Also, you should think about what your performance measure should be, and whether it should be accuracy (usually it should not be).
AUC is often a good choice, but in the end you still need to choose an operating point (a decision threshold).
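A minimal sketch of why accuracy misleads on imbalanced data and how AUC and an operating point fit together. It assumes a synthetic two-class problem with roughly a 10:1 ratio (generated with make_classification as a stand-in for the data in the question), a LogisticRegression model, and Youden's J (TPR - FPR) as the rule for picking the threshold. None of these choices come from the thread; the right threshold rule depends on what your errors cost.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Assumed synthetic data: about 90% class 0 and 10% class 1.
X, y = make_classification(n_samples=2200, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]  # score for the minority class 1

# Always predicting the majority class already gives high accuracy,
# which is why accuracy alone says little here.
baseline_acc = accuracy_score(y_test, np.zeros_like(y_test))

# AUC summarizes ranking quality over all possible thresholds...
auc = roc_auc_score(y_test, scores)

# ...but a deployed classifier needs a single threshold (operating point).
# Here: maximize Youden's J = TPR - FPR, one heuristic among many.
fpr, tpr, thresholds = roc_curve(y_test, scores)
best = np.argmax(tpr - fpr)
threshold = thresholds[best]
y_pred = (scores >= threshold).astype(int)

print(f"majority-class accuracy: {baseline_acc:.3f}")
print(f"ROC AUC: {auc:.3f}")
print(f"threshold: {threshold:.3f} (TPR={tpr[best]:.3f}, FPR={fpr[best]:.3f})")
```

If false negatives on the rare class cost more than false positives, you would pick a point further along the ROC curve (higher TPR, higher FPR) instead of maximizing J.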

On 06/23/2015 10:58 AM, Trevor Stephens wrote:
Many of the scikit-learn classifiers have a `class_weight` parameter that can help in situations like this. Setting it to the preset "balanced" (development branch) or "auto" (current public release) re-weights samples by their inverse class frequencies.
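For reference, the scikit-learn documentation gives the "balanced" weights as n_samples / (n_classes * count of each class). A small sketch of that formula in plain NumPy, using a hypothetical helper name (balanced_class_weights) and the 1000-vs-100 split from the question:

```python
import numpy as np

def balanced_class_weights(y):
    """Hypothetical helper: {label: n_samples / (n_classes * count(label))}."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# 1000 examples of class "A" and 100 of class "B", as in the question.
y = np.array(["A"] * 1000 + ["B"] * 100)
weights = balanced_class_weights(y)
print(weights)  # {'A': 0.55, 'B': 5.5}
```

With these weights each class contributes the same total weight to the loss (1000 * 0.55 = 100 * 5.5 = 550), which is the point of the preset. A dict like this can also be passed directly as `class_weight`.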

You could also run a grid search to try to find a "better" set of class weights, perhaps something like this:

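    from sklearn.model_selection import GridSearchCV  # sklearn.grid_search before 0.18
    from sklearn.linear_model import LogisticRegression
    A, B = 0, 1  # your two class labels (placeholder values)
    # Try weightings from {A: 1, B: 10} through {A: 10, B: 1}
    parameters = {'class_weight': [{A: i + 1., B: 10. - i} for i in range(10)]}
    clf = LogisticRegression()  # stand-in for any classifier with class_weight
    grid = GridSearchCV(clf, parameters)
    grid.fit(X, y)  # X, y: your training data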
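A self-contained variant of that grid search, assuming synthetic data with about a 10:1 ratio, LogisticRegression as the classifier, and F1 on the minority class as the selection metric (all assumptions, not part of the original suggestion). GridSearchCV scores classifiers by accuracy unless told otherwise, so passing `scoring` matters here; a threshold-dependent metric such as F1 reflects class-weight changes more directly than AUC, which ignores the threshold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Assumed synthetic stand-in for the question's data: ~10:1 class ratio.
X, y = make_classification(n_samples=1100, weights=[0.9, 0.1], random_state=0)
A, B = 0, 1  # majority and minority labels in this synthetic y

# Weightings from {A: 1, B: 10} through {A: 10, B: 1}, plus the preset.
parameters = {
    "class_weight": [{A: i + 1.0, B: 10.0 - i} for i in range(10)] + ["balanced"]
}

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    parameters,
    scoring="f1",  # F1 of the positive (minority) class 1
    # Stratified folds keep the class ratio the same in every split.
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```

Swap "f1" for whichever measure matches what you actually care about.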

- Trev

On Tue, Jun 23, 2015 at 7:25 AM, Neal Becker <ndbeck...@gmail.com> wrote:

    Any suggestions?


    Neal Becker wrote:

    > I am interested in supervised learning for classification where I have
    > multiple classes, but training data is highly unequal. There may be 1000s
    > of training examples for class A, but maybe 100s for class B. What are
    > suggested algorithms/approaches?
------------------------------------------------------------------------------
    Monitor 25 network devices or servers for free with OpManager!
    OpManager is web-based network management software that monitors
    network devices and physical & virtual servers, alerts via email & sms
    for fault. Monitor 25 devices for free with no restriction.
    Download now
    http://ad.doubleclick.net/ddm/clk/292181274;119417398;o
    _______________________________________________
    Scikit-learn-general mailing list
    Scikit-learn-general@lists.sourceforge.net
    <mailto:Scikit-learn-general@lists.sourceforge.net>
    https://lists.sourceforge.net/lists/listinfo/scikit-learn-general