Hi Dale.
Please keep all discussions on the mailing list, as not everybody may have the time to reply. The default should be class_weight=1 for each class, so dropping half of the samples in one class should be equivalent to reducing the weight for that class to 0.5. This only holds when the dropped points are duplicates; otherwise dropping points clearly loses information.
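A minimal sketch of that equivalence, assuming a recent scikit-learn. The toy data and the choice of RidgeClassifier are illustrative assumptions, not from this thread; RidgeClassifier is used because its closed-form ridge solve makes the comparison exact. Fitting on data where every class-0 point appears twice with class_weight={0: 0.5, 1: 1.0} should give the same model as fitting on the de-duplicated data with the default weights.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier

rng = np.random.RandomState(0)
# Hypothetical toy data: 20 unique points per class
X0 = rng.randn(20, 3) + 1.0
X1 = rng.randn(20, 3) - 1.0

# Dataset A: every class-0 point appears twice; class 0 is weighted 0.5
X_dup = np.vstack([X0, X0, X1])
y_dup = np.array([0] * 40 + [1] * 20)
clf_half = RidgeClassifier(alpha=1.0, class_weight={0: 0.5, 1: 1.0})
clf_half.fit(X_dup, y_dup)

# Dataset B: duplicates removed, default weight of 1 for every class
X_unique = np.vstack([X0, X1])
y_unique = np.array([0] * 20 + [1] * 20)
clf_default = RidgeClassifier(alpha=1.0).fit(X_unique, y_unique)

# Each unique class-0 point contributes 2 * 0.5 = 1 to the weighted
# least-squares objective in A, exactly as it does in B.
print(np.allclose(clf_half.coef_, clf_default.coef_))
print(np.allclose(clf_half.intercept_, clf_default.intercept_))
```

Both checks should print True. With an iterative solver the two fits would only agree up to the solver's tolerance.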

Have a look here:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/tests/test_class_weight.py#L41

Cheers,
Andy


On 07/24/2015 12:57 PM, Dale Smith wrote:

Andy,

I’ve thought a bit about your suggestion for testing. I’m not sure I fully understand the mechanics or process. Forgive my lack of experience.

Suppose I have a data set with equal weights for a binary classification. Dropping half the samples for one class would change the weights to 0.25/0.75. Is this what you are thinking?

I suppose I could retrain the model with the 0.25/0.75 weights. I suspect I should get the same predictions (assuming I use the same seed for the random number generator).

Am I on the right track here?


*Dale Smith, Ph.D.*
Data Scientist
<http://nexidia.com/>
*d.* 404.495.7220 x 4008 *f.* 404.795.7221
Nexidia Corporate | 3565 Piedmont Road, Building Two, Suite 400 | Atlanta, GA 30305

*From:* Andy [mailto:t3k...@gmail.com]
*Sent:* Thursday, July 23, 2015 8:27 AM
*To:* scikit-learn-general@lists.sourceforge.net
*Subject:* Re: [Scikit-learn-general] Added sample_weight to RFECV.fit but not sure how to test the change

I think my reply to this got swallowed by the SourceForge outage:

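The main thing you should test is whether the added behavior is correct. To do that, confirm that changing sample weights is equivalent to duplicating or dropping samples: a weight of 2 should give the same result as including a sample twice, and a weight of 0 the same result as leaving it out.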
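A sketch of such a test, under stated assumptions: `weighted_matches_duplicated` is a hypothetical helper, and Ridge stands in for whatever estimator is being wrapped, because its exact solver makes weighted and duplicated fits agree to floating-point precision. Random integer weights in {0, 1, 2} cover dropping, keeping and duplicating samples in a single comparison.

```python
import numpy as np
from sklearn.linear_model import Ridge


def weighted_matches_duplicated(make_estimator, X, y, rng):
    """Hypothetical helper: fit once with integer sample weights and once on
    a dataset where each row is repeated `weight` times (weight 0 drops the
    row), then compare the fitted coefficients and intercepts."""
    # Integer weights: 0 drops a sample, 1 keeps it, 2 duplicates it
    weights = rng.randint(0, 3, size=X.shape[0])

    est_weighted = make_estimator().fit(X, y, sample_weight=weights)

    # np.repeat with per-row counts builds the equivalent unweighted dataset
    X_rep = np.repeat(X, weights, axis=0)
    y_rep = np.repeat(y, weights)
    est_repeated = make_estimator().fit(X_rep, y_rep)

    return bool(np.allclose(est_weighted.coef_, est_repeated.coef_)
                and np.allclose(est_weighted.intercept_,
                                est_repeated.intercept_))


rng = np.random.RandomState(42)
X = rng.randn(30, 4)
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.randn(30)
print(weighted_matches_duplicated(lambda: Ridge(alpha=1.0), X, y, rng))
```

Estimators with iterative solvers (for example LogisticRegression) only agree up to their convergence tolerance, so a looser `np.allclose` tolerance may be needed there. For RFECV specifically, duplicated rows may also land in different cross-validation folds than their weighted counterparts, so such a test would likely need some care in how the `cv` splits are defined.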


On 07/22/2015 01:34 PM, Dale Smith wrote:

    I’ve added sample_weight as an optional parameter to RFECV.fit in
    order to handle highly unbalanced cases. I can build the package
    locally. However, looking at the tests directory does not give me
    much confidence that I can write validation and regression tests.
    I looked particularly at test_metaestimators.py.

    I also reviewed the Contributing section of the documentation, the
    wiki, and searched the mailing list archive, but didn’t find
    anything relevant. Are there any other sources I should review?


    _______________________________________________
    Scikit-learn-general mailing list
    Scikit-learn-general@lists.sourceforge.net
    https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
