I had a similar situation. I built a larger training set with roughly
equal class membership by randomly sampling, with replacement, from the
original training set. Results were much better both during CV (run on the
inflated training set) and on the held-out test set (taken from the
original training set).
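
For anyone who wants to try this, here is a rough sketch of one way to do
the resampling with NumPy. It is one reading of the approach described
above, not the exact code that was used: the helper name
oversample_to_balance, the synthetic 1000/100 data, and the choice to draw
every class up to the size of the largest class are all assumptions made
for illustration. The test rows are set aside before resampling so that
duplicated rows cannot end up in both the training and the test data.

```python
import numpy as np


def oversample_to_balance(X, y, seed=0):
    """Draw rows with replacement so every class has as many rows as the largest class.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    # For each class, sample `target` row indices with replacement.
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == label), size=target, replace=True)
        for label in classes
    ])
    rng.shuffle(idx)  # avoid leaving the rows grouped by class
    return X[idx], y[idx]


# Synthetic stand-in data (assumption): 1000 rows of class A, 100 of class B.
rng = np.random.default_rng(0)
X = rng.normal(size=(1100, 5))
y = np.array(["A"] * 1000 + ["B"] * 100)

# Hold out test rows first, then inflate only the remaining training rows.
perm = rng.permutation(len(y))
test_idx, train_idx = perm[:220], perm[220:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

X_bal, y_bal = oversample_to_balance(X_train, y_train)
print(dict(zip(*np.unique(y_bal, return_counts=True))))
```

X_bal and y_bal can then be passed to any classifier's fit method, with
X_test and y_test used only for the final check. Note that CV scores
computed on the inflated set can look optimistic, since copies of the same
row may land in both the fitting and validation folds.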

-sujit

On Tue, Jun 23, 2015 at 7:25 AM, Neal Becker <ndbeck...@gmail.com> wrote:

> Any suggestions?
>
>
> Neal Becker wrote:
>
> > I am interested in supervised learning for classification where I have
> > multiple classes, but training data is highly unequal.  There may be
> > 1000s of training examples for class A, but maybe 100s for class B.
> > What are suggested algorithms/approaches?
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
