What would be a sample command for achieving this? Sorry, I'm a bit new to this area, which is why example commands will help me understand it better.

Thanks again!
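[Editor's note: as a concrete answer to the question above, here is a minimal sketch of the suggestion in Josh's reply quoted below -- compute one weight per sample and pass the vector to the classifier's fit method. The dataset and the 10:1 weighting scheme are invented for illustration.]

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # A toy imbalanced dataset standing in for the real one.
    X, y = make_classification(n_samples=10000, weights=[0.99, 0.01],
                               random_state=0)

    # One weight per sample. Here positives simply get 10x the weight of
    # negatives (equivalent to class_weight={0: 1, 1: 10}), but the vector
    # can encode any per-sample scheme, e.g. different weights for labels
    # coming from column 'x' versus column 'y'.
    sample_weight = np.where(y == 1, 10.0, 1.0)

    clf = RandomForestClassifier(random_state=0)
    clf.fit(X, y, sample_weight=sample_weight)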
On Tue, Jan 24, 2017 at 6:58 AM, Josh Vredevoogd <[email protected]> wrote:

If you do not want the weights to be uniform by class, then you need to generate weights for each sample and pass the sample weight vector to the fit method of the classifier.

On Mon, Jan 23, 2017 at 4:48 PM, Debabrata Ghosh <[email protected]> wrote:

Thanks, Josh, for your quick feedback! It's quite helpful indeed.

Further to it, I have another burning question. In my sample dataset, I have 2 label columns (let's say x and y).

My objective is to give the labels within column 'x' 10 times more weight than the labels within column 'y'.

My question is that the parameter class_weight={0: 1, 1: 10} works for a single column, i.e., within a single column I have assigned 10 times the weight to the positive labels.

But my objective is to give the positive labels within column 'x' 10 times the weight of the positive labels within column 'y'.

May I please get feedback from you on how to achieve this? Thanks for your help in advance!

On Mon, Jan 23, 2017 at 9:56 AM, Josh Vredevoogd <[email protected]> wrote:

If you undersample, taking only 10% of the negative class, the classifier will see different combinations of attributes and produce a different fit to explain those distributions. In the worst case, imagine you are classifying birds and through sampling you eliminate all `red` examples. Your classifier likely will no longer understand that red objects can be birds. That's an overly simple example, but given a classifier capable of exploring and explaining feature combinations, less obvious versions of this are bound to happen.

The extrapolation only works in the other direction: if you manually duplicate samples by the sampling factor, you should get the exact same fit as if you increased the class weight.

Hope that helps,
Josh
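[Editor's note: a small sketch of the duplication/weighting equivalence Josh describes above. A single decision tree is used rather than a forest so that bootstrap randomness does not obscure the comparison; the data is invented.]

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, weights=[0.9, 0.1],
                               random_state=0)

    # Fit 1: give every positive sample weight 10 via sample_weight.
    w = np.where(y == 1, 10.0, 1.0)
    tree_w = DecisionTreeClassifier(random_state=0).fit(X, y, sample_weight=w)

    # Fit 2: physically duplicate each positive sample so it appears
    # 10 times in the training set, with no weighting.
    pos = y == 1
    X_dup = np.vstack([X, np.repeat(X[pos], 9, axis=0)])
    y_dup = np.concatenate([y, np.repeat(y[pos], 9)])
    tree_d = DecisionTreeClassifier(random_state=0).fit(X_dup, y_dup)

    # The split criteria see identical weighted class counts, so the two
    # trees should agree (up to floating-point ties).
    print((tree_w.predict(X) == tree_d.predict(X)).mean())  # expect 1.0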
On Sun, Jan 22, 2017 at 5:00 AM, Debabrata Ghosh <[email protected]> wrote:

Thanks, Josh!

I have used the parameter class_weight={0: 1, 1: 10} and the model code has run successfully. However, just to get further clarity on the concept, I have another question for you. I did the following 2 tests:

1. In my dataset, I have 1 million negative classes and 10,000 positive classes. First I ran my model code without supplying any class_weight parameter, and it gave me certain True Positive and False Positive results.

2. In the second test, I had the same 1 million negative classes but reduced the positive classes to 1,000. This time, I supplied the parameter class_weight={0: 1, 1: 10} and got my True Positive and False Positive results.

My question is, when I multiply the results obtained from my second test by a factor of 10, I don't match the results obtained from my first test. In other words, say I get 8 true positives against a threshold in the second test, while the true positives from the first test against the same threshold are 260. I am getting similar observations for the false positive results: if I multiply the results obtained in the second test by 10, I don't come close to the results obtained from the first test.

Is my expectation correct? Is my way of executing the test (i.e., reducing the positive classes by 10 times and then feeding a class weight 10 times that of the negative class) and comparing the results with a model run without any class_weight parameter correct?

Please let me know at your convenience, as this will help me in a big way to understand the concept further.

Thanks in advance!

On Sun, Jan 22, 2017 at 1:56 AM, Josh Vredevoogd <[email protected]> wrote:

The class_weight parameter doesn't behave the way you're expecting.

The value in class_weight is the weight applied to each sample in that class - in your example, each class zero sample has weight 0.001 and each class one sample has weight 0.999, so each class one sample carries 999 times the weight of a class zero sample.

If you would like each class one sample to have ten times the weight, you would set `class_weight={0: 1, 1: 10}` or, equivalently, `class_weight={0: 0.1, 1: 1}`.

On Sat, Jan 21, 2017 at 10:18 AM, Debabrata Ghosh <[email protected]> wrote:

Hi All,

Greetings!

I have a very basic question regarding the usage of the class_weight parameter in scikit-learn's Random Forest Classifier.

I have a fairly unbalanced sample, and my positive class : negative class ratio is 1:100. In other words, I have a million records corresponding to the negative class and 10,000 records corresponding to the positive class. I have trained the random forest classifier model successfully using the above record set.

Further, for a different problem, I want to test the class_weight parameter. So I am setting class_weight={0: 0.001, 1: 0.999} and have tried running my model on the same dataset as mentioned in the above paragraph, but with the positive class records reduced to 1,000 [because now each positive class is given approximately 10 times more weight than a negative class]. However, the model run results are very, very different between the 2 runs (with and without class_weight), and I expected similar results.

Would you please be able to let me know where I am going wrong? I know it's something silly, but I just want to improve my concept.

Thanks!
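[Editor's note: a sketch of the equivalence Josh mentions above -- class_weight dictionaries that differ only by a constant factor give the same relative per-sample weights, and tree split criteria depend only on relative weights. The data is invented.]

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=5000, weights=[0.99, 0.01],
                               random_state=0)

    # The same 10:1 ratio expressed two ways; in both, each positive
    # sample carries 10x the weight of a negative sample.
    clf_a = RandomForestClassifier(class_weight={0: 1, 1: 10},
                                   random_state=0).fit(X, y)
    clf_b = RandomForestClassifier(class_weight={0: 0.1, 1: 1},
                                   random_state=0).fit(X, y)

    # With the same random_state the two forests should make the same
    # predictions, since only the weight ratio matters.
    print((clf_a.predict(X) == clf_b.predict(X)).mean())  # expect 1.0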
_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
