I've been wrestling with this same issue in the regression case.

I realize that balancing continuous targets is not as straightforward as
balancing discrete output classes.

But I wonder if this list has any thoughts about how it might be approached.

The target I'm predicting is normally distributed and, particularly when
sample sizes are small, the tails tend to be neglected and poorly predicted.

Thoughts?


On Fri, Feb 8, 2013 at 2:44 AM, Manish Amde <manish...@gmail.com> wrote:

> Thanks Gilles. This definitely helps. I am glad I asked. :-)
>
> -Manish
>
> On Feb 7, 2013, at 11:33 PM, Gilles Louppe <g.lou...@gmail.com> wrote:
>
> > Hello,
> >
> > You might achieve what you want by using sample weights when fitting
> > your forest (see the 'sample_weight' parameter). There is also a
> > 'balance_weights' function in the preprocessing module that basically
> > generates sample weights for you, such that classes become balanced.
> >
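
A minimal sketch of the weighting idea described above, in plain Python. The
helper name `balanced_sample_weights` is hypothetical and is not the
scikit-learn 'balance_weights' function; it only assumes that "balanced" means
every class ends up with the same total weight.

```python
from collections import Counter


def balanced_sample_weights(labels):
    """Return one weight per sample so that each class has equal total weight.

    Hypothetical helper for illustration only; not part of scikit-learn.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    # Every class receives a total weight of n_samples / n_classes,
    # split evenly among the samples of that class.
    return [n_samples / (n_classes * counts[label]) for label in labels]


# Toy unbalanced labels: eight samples of class 0, two of class 1.
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
weights = balanced_sample_weights(y)
```

With scikit-learn installed, such weights could then be passed to the forest as
`forest.fit(X, y, sample_weight=weights)`. Whether this matches the Weighted
Random Forest of the paper is not something this thread confirms.
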
> > https://github.com/glouppe/scikit-learn/blob/master/sklearn/preprocessing.py#L1221
> >
> > (This should appear in the reference, I'll fix that)
> >
> > Hope this helps,
> >
> > Gilles
> >
> > On 8 February 2013 00:44, Manish Amde <manish...@gmail.com> wrote:
> >> Fellow sklearners,
> >>
> >> I am working on a classification problem with an unbalanced data set and
> >> have been successful using SVM classifiers with the class_weight option.
> >>
> >> I have also tried Random Forests and am getting a decent ROC performance
> >> but I am hoping to get a performance improvement by using Weighted or
> >> Balanced Random Forests as suggested in this paper.
> >> http://www.stat.berkeley.edu/tech-reports/666.pdf
> >>
> >> I don't see any implementation of these options but I might be mistaken
> >> so I wanted to ask the community. Also, I am willing to write code and
> >> contribute back if this will be useful to other folks.
> >>
> >> I have also thought about balancing the data by up/down sampling the
> >> minority/majority class (with or without replacement) and even SMOTE, but
> >> couldn't find those implementations in the scikit-learn library yet. The
> >> modified Random Forests seem to outperform these methods according to the
> >> paper, hence I am interested in trying those first.
> >>
> >> -Manish
> >>
> >>
> ------------------------------------------------------------------------------
> >> Free Next-Gen Firewall Hardware Offer
> >> Buy your Sophos next-gen firewall before the end March 2013
> >> and get the hardware for free! Learn more.
> >> http://p.sf.net/sfu/sophos-d2d-feb
> >> _______________________________________________
> >> Scikit-learn-general mailing list
> >> Scikit-learn-general@lists.sourceforge.net
> >> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
> >>