Hi Javier,

In the particular case of tree-based models, you can use the soft labels to create a multi-output regression problem, which would yield an equivalent classifier (one can show that variance reduction on the per-class targets and the Gini index yield the same trees).
So basically,

    reg = RandomForestRegressor()
    reg.fit(X, encoded_y)

should work.

Gilles

On 12 March 2017 at 20:11, Javier López Peña <[email protected]> wrote:
>
> On 12 Mar 2017, at 18:38, Gael Varoquaux <[email protected]> wrote:
>
> You can use sample weights to go a bit in this direction. But in general,
> the mathematical meaning of your intuitions will depend on the
> classifier, so they will not be general ways of implementing them without
> a lot of tinkering.
>
> I see… to be honest, for my purposes it would be enough to bypass the
> target binarization in the MLP classifier, so maybe I will just fork my
> own copy of that class for this.
>
> The purpose is two-fold. On the one hand, use the probabilities generated
> by a very complex model (e.g. a massive ensemble) to train a simpler one
> that achieves comparable performance at a fraction of the cost. Any
> universal classifier will do (neural networks are the prime example).
>
> The second purpose is to use class probabilities instead of observed
> classes at training time. In some problems this helps with model
> regularization (see section 6 of [1]).
>
> Cheers,
> J
>
> [1] https://arxiv.org/pdf/1503.02531v1.pdf

_______________________________________________
scikit-learn mailing list
[email protected]
https://mail.python.org/mailman/listinfo/scikit-learn
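[For illustration, a minimal sketch of the multi-output regression trick Gilles describes, on synthetic data. The names `teacher` and `soft_y` are assumptions made for the example: the teacher's predict_proba output stands in for the `encoded_y` / soft labels above, and an argmax over the regressor's outputs recovers hard class predictions.]

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

    X, y = make_classification(n_samples=500, n_classes=3, n_informative=5,
                               random_state=0)

    # Stand-in for the soft labels: class probabilities from a larger
    # "teacher" forest (any probabilistic model would do here).
    teacher = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    soft_y = teacher.predict_proba(X)      # shape (n_samples, n_classes)

    # Fit a regressor on the per-class probabilities as a multi-output
    # target; variance reduction on these columns plays the role of the
    # Gini criterion.
    reg = RandomForestRegressor(n_estimators=100, random_state=0)
    reg.fit(X, soft_y)

    # The regressor's outputs act as predicted probabilities; take the
    # argmax to recover hard class labels.
    proba = reg.predict(X)                 # shape (n_samples, n_classes)
    pred = teacher.classes_[np.argmax(proba, axis=1)]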
