Thanks Gilles. This definitely helps. I am glad I asked. :-)

-Manish
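
P.S. for anyone finding this thread later: a minimal sketch of the
sample-weight idea Gilles describes below. The helper name
balanced_sample_weights is hypothetical and the formula (inverse class
frequency) is only one common choice; it is not claimed to reproduce
what 'balance_weights' does internally. The commented fit call assumes
a forest whose fit() accepts 'sample_weight', as mentioned in the reply.

```python
from collections import Counter


def balanced_sample_weights(y):
    """Hypothetical helper: weight each sample by
    n_samples / (n_classes * count_of_its_class), so that every class
    ends up with the same total weight."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return [n_samples / (n_classes * counts[label]) for label in y]


# Toy unbalanced labels: three majority samples, one minority sample.
y = [0, 0, 0, 1]
weights = balanced_sample_weights(y)

# With scikit-learn installed, the weights would be passed at fit time:
#   from sklearn.ensemble import RandomForestClassifier
#   RandomForestClassifier().fit(X, y, sample_weight=weights)
```

The minority sample gets a larger weight than each majority sample, so
both classes contribute equally to the weighted impurity computations.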

On Feb 7, 2013, at 11:33 PM, Gilles Louppe <g.lou...@gmail.com> wrote:

> Hello,
> 
> You might achieve what you want by using sample weights when fitting
> your forest (See the 'sample_weight' parameter). There is also a
> 'balance_weights' function in the preprocessing module that basically
> generates sample weights for you, such that classes become balanced.
> 
> https://github.com/glouppe/scikit-learn/blob/master/sklearn/preprocessing.py#L1221
> 
> (This should appear in the reference, I'll fix that)
> 
> Hope this helps,
> 
> Gilles
> 
> On 8 February 2013 00:44, Manish Amde <manish...@gmail.com> wrote:
>> Fellow sklearners,
>> 
>> I am working on a classification problem with an unbalanced data set and
>> have been successful using SVM classifiers with the class_weight option.
>> 
>> I have also tried Random Forests and am getting a decent ROC performance but
>> I am hoping to get a performance improvement by using Weighted or Balanced
>> Random Forests as suggested in this paper.
>> http://www.stat.berkeley.edu/tech-reports/666.pdf
>> 
>> I don't see any implementation of these options but I might be mistaken so I
>> wanted to ask the community. Also, I am willing to write code and contribute
>> back if this will be useful to other folks.
>> 
>> I have also thought about balancing the data by up/down sampling the
>> minority/majority class (with or without replacement) and even SMOTE but
>> couldn't find those implementations in the scikit-learn library yet.  The
>> modified Random Forests seem to outperform these methods according to the
>> paper, hence I am interested in trying those first.
>> 
>> -Manish
>> 
>> ------------------------------------------------------------------------------
>> Free Next-Gen Firewall Hardware Offer
>> Buy your Sophos next-gen firewall before the end March 2013
>> and get the hardware for free! Learn more.
>> http://p.sf.net/sfu/sophos-d2d-feb
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>> 
> 

