You need to write your own function to compute a vector assigning a weight to each
sample in X, then pass it as the sample_weight parameter to RandomForestClassifier.fit()
<http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier.fit>.
If you also set class_weight in the model constructor, class_weight and
sample_weight are multiplied together for each sample.
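A minimal sketch of what that could look like for the two-label-column question below. The helper name, the toy data, and the interpretation (positives flagged in column 'x' get 10 times the weight of positives flagged in column 'y'; a sample flagged in both gets the 'x' weight) are my assumptions for illustration, not anything prescribed by scikit-learn:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def make_sample_weight(x, y):
    """Hypothetical helper: weight positives in column x 10x more
    than positives in column y (a sample positive in both gets 10)."""
    x = np.asarray(x)
    y = np.asarray(y)
    w = np.ones(len(x))   # baseline weight 1 for every sample
    w[y == 1] = 1.0       # positives in y keep weight 1
    w[x == 1] = 10.0      # positives in x get 10x that weight
    return w

# Toy data, made up for the sketch.
rng = np.random.RandomState(0)
X = rng.rand(100, 4)
x_label = (rng.rand(100) < 0.3).astype(int)
y_label = (rng.rand(100) < 0.3).astype(int)
target = x_label | y_label   # train on the combined positive label

w = make_sample_weight(x_label, y_label)
clf = RandomForestClassifier(n_estimators=10, random_state=0)
# Any class_weight set in the constructor would be multiplied into w per sample.
clf.fit(X, target, sample_weight=w)
```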
On Mon, Jan 23, 2017 at 11:36 PM, Debabrata Ghosh <[email protected]> wrote:

> What would be a sample command for achieving this? Sorry, I am a bit new to
> this area, and that's why example commands would help me understand it better.
>
> Thanks again!
>
> On Tue, Jan 24, 2017 at 6:58 AM, Josh Vredevoogd <[email protected]> wrote:
>
>> If you do not want the weights to be uniform by class, then you need to
>> generate weights for each sample and pass the sample weight vector to the
>> fit method of the classifier.
>>
>> On Mon, Jan 23, 2017 at 4:48 PM, Debabrata Ghosh <[email protected]> wrote:
>>
>>> Thanks Josh for your quick feedback! It's quite helpful indeed.
>>>
>>> Further to it, I have another burning question. In my sample dataset,
>>> I have 2 label columns (let's say x and y).
>>>
>>> My objective is to give the labels within column 'x' 10 times more
>>> weight than the labels within column 'y'.
>>>
>>> My question is: the parameter class_weight={0: 1, 1: 10} works for a
>>> single column, i.e., within a single column I have assigned 10 times the
>>> weight to the positive labels.
>>>
>>> But my objective is to give 10 times the weight to the positive labels
>>> within column 'x' compared to the positive labels within column 'y'.
>>>
>>> May I please get feedback from you on how to achieve this?
>>> Thanks for your help in advance!
>>>
>>> On Mon, Jan 23, 2017 at 9:56 AM, Josh Vredevoogd <[email protected]> wrote:
>>>
>>>> If you undersample, taking only 10% of the negative class, the
>>>> classifier will see different combinations of attributes and produce a
>>>> different fit to explain those distributions. In the worst case, imagine
>>>> you are classifying birds and through sampling you eliminate all `red`
>>>> examples. Your classifier likely now will not understand that red objects
>>>> can be birds.
>>>> That's an overly simple example, but given a classifier
>>>> capable of exploring and explaining feature combinations, less obvious
>>>> versions of this are bound to happen.
>>>>
>>>> The extrapolation only works in the other direction: if you manually
>>>> duplicate samples by the sampling factor, you should get the exact same
>>>> fit as if you increased the class weight.
>>>>
>>>> Hope that helps,
>>>> Josh
>>>>
>>>> On Sun, Jan 22, 2017 at 5:00 AM, Debabrata Ghosh <[email protected]> wrote:
>>>>
>>>>> Thanks Josh!
>>>>>
>>>>> I have used the parameter class_weight={0: 1, 1: 10} and the model
>>>>> code has run successfully. However, just to get further clarity on the
>>>>> concept, I have another question for you. I did the following 2 tests:
>>>>>
>>>>> 1. In my dataset, I have 1 million negative-class samples and 10,000
>>>>> positive-class samples. First I ran my model code without supplying any
>>>>> class_weight parameter, and it gave me certain True Positive and False
>>>>> Positive results.
>>>>>
>>>>> 2. In the second test, I kept the same 1 million negative samples but
>>>>> reduced the positive samples to 1,000. This time, I supplied the
>>>>> parameter class_weight={0: 1, 1: 10} and got my True Positive and False
>>>>> Positive results.
>>>>>
>>>>> My question is: when I multiply the results from my second test by a
>>>>> factor of 10, they don't match the results from my first test. In other
>>>>> words, say the true positives against a threshold from the second test
>>>>> are 8, while the true positives from the first test against the same
>>>>> threshold are 260. I see similar behaviour for the false positives: if
>>>>> I multiply the second test's results by 10, I don't come close to the
>>>>> results of the first test.
>>>>>
>>>>> Is my expectation correct?
>>>>> Is my way of executing the test (i.e., reducing the positive classes
>>>>> by a factor of 10 and then giving them a class weight 10 times that of
>>>>> the negative class) and comparing the results with a model run without
>>>>> any class_weight parameter correct?
>>>>>
>>>>> Please let me know at your convenience, as this will help me a great
>>>>> deal in understanding the concept further.
>>>>>
>>>>> Thanks in advance!
>>>>>
>>>>> On Sun, Jan 22, 2017 at 1:56 AM, Josh Vredevoogd <[email protected]> wrote:
>>>>>
>>>>>> The class_weight parameter doesn't behave the way you're expecting.
>>>>>>
>>>>>> The value in class_weight is the weight applied to each sample in
>>>>>> that class - in your example, each class-zero sample has weight 0.001
>>>>>> and each class-one sample has weight 0.999, so each class-one sample
>>>>>> carries 999 times the weight of a class-zero sample.
>>>>>>
>>>>>> If you would like each class-one sample to have ten times the weight,
>>>>>> you would set `class_weight={0: 1, 1: 10}` or, equivalently,
>>>>>> `class_weight={0: 0.1, 1: 1}`.
>>>>>>
>>>>>> On Sat, Jan 21, 2017 at 10:18 AM, Debabrata Ghosh <[email protected]> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>> Greetings!
>>>>>>>
>>>>>>> I have a very basic question regarding the usage of the parameter
>>>>>>> class_weight in scikit-learn's Random Forest Classifier's fit method.
>>>>>>>
>>>>>>> I have a fairly unbalanced sample, and my positive class : negative
>>>>>>> class ratio is 1:100. In other words, I have a million records
>>>>>>> corresponding to the negative class and 10,000 records corresponding
>>>>>>> to the positive class. I have trained the random forest classifier
>>>>>>> model successfully using the above record set.
>>>>>>>
>>>>>>> Further, for a different problem, I want to test the parameter
>>>>>>> class_weight.
>>>>>>> So, I am setting class_weight={0: 0.001, 1: 0.999}, and I have
>>>>>>> tried running my model on the same dataset as mentioned in the above
>>>>>>> paragraph, but with the positive-class records reduced to 1,000
>>>>>>> [because now each positive class is given approximately 10 times
>>>>>>> more weight than a negative class]. However, the model results are
>>>>>>> very, very different between the 2 runs (with and without
>>>>>>> class_weight), and I expected similar results.
>>>>>>>
>>>>>>> Would you please be able to let me know where I am going wrong?
>>>>>>> I know it's something silly, but I just want to improve my
>>>>>>> understanding of the concept.
>>>>>>>
>>>>>>> Thanks!
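Putting the advice in this thread together, a minimal sketch (toy data made up for illustration) of the equivalence discussed above: class_weight={0: 1, 1: 10} in the constructor behaves the same as passing an explicit sample_weight vector that is 10 for every positive sample, because fit() multiplies class_weight into the per-sample weights. A single decision tree is used here so the comparison is deterministic:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy imbalanced data, roughly like the thread's setup (~10% positives).
rng = np.random.RandomState(42)
X = rng.rand(200, 5)
y = (rng.rand(200) < 0.1).astype(int)

# Run 1: weight the positive class via the constructor.
a = DecisionTreeClassifier(class_weight={0: 1, 1: 10}, random_state=0).fit(X, y)

# Run 2: the same weighting expressed as an explicit per-sample weight vector.
w = np.where(y == 1, 10.0, 1.0)
b = DecisionTreeClassifier(random_state=0).fit(X, y, sample_weight=w)

# The two fits agree because fit() folds class_weight into sample_weight.
print(np.array_equal(a.predict(X), b.predict(X)))  # expect True
```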
_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
