Re: [scikit-learn] Label encoding for classifiers and soft targets

2017-03-15 Thread Andreas Mueller
On 03/13/2017 05:54 PM, Javier López Peña wrote: >> You could use a regression model with a logistic sigmoid in the output layer. > By training a regression network with logistic activation, the outputs do not add to 1. I just checked on a minimal example on the iris dataset. Sorry, meant softmax ;)
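Not from the thread, but a minimal numpy sketch of the distinction being corrected here: per-class logistic sigmoids are applied independently, so their outputs need not sum to 1, while a softmax over the same scores always does.

    import numpy as np

    def sigmoid(z):
        # Element-wise logistic sigmoid: each class score is squashed independently.
        return 1.0 / (1.0 + np.exp(-z))

    def softmax(z):
        # Shift scores for numerical stability, then normalize across classes.
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    scores = np.array([[2.0, -1.0, 0.5],
                       [0.1, 0.2, -0.3]])
    print(sigmoid(scores).sum(axis=1))  # rows do NOT sum to 1
    print(softmax(scores).sum(axis=1))  # [1. 1.]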

Re: [scikit-learn] Label encoding for classifiers and soft targets

2017-03-13 Thread Javier López Peña
> On 13 Mar 2017, at 21:18, Andreas Mueller wrote: > No, if all the samples are normalized and your aggregation function is sane (like the mean), the output will also be normalised. You are completely right, I hadn’t checked this for random forests. Still, my purpose is to reduce model complexity […]

Re: [scikit-learn] Label encoding for classifiers and soft targets

2017-03-13 Thread Javier López Peña
> You could use a regression model with a logistic sigmoid in the output layer. By training a regression network with logistic activation, the outputs do not add to 1. I just checked on a minimal example on the iris dataset.

Re: [scikit-learn] Label encoding for classifiers and soft targets

2017-03-13 Thread Andreas Mueller
On 03/13/2017 08:35 AM, Javier López Peña wrote: > Training a regression tree would require sticking some kind of probability normaliser at the end to ensure proper probabilities; this might somehow hurt sharpness or calibration. No, if all the samples are normalized and your aggregation function is sane (like the mean), the output will also be normalized. […]
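A quick check of this claim (a sketch, not from the thread; the soft targets here are random Dirichlet draws): a multi-output RandomForestRegressor averages target rows within each leaf and then averages over trees, so rows that sum to 1 stay summing to 1.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.RandomState(0)
    X = rng.randn(200, 5)
    Y = rng.dirichlet(alpha=[1.0, 1.0, 1.0], size=200)  # each row sums to 1

    reg = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, Y)

    # Leaf values are means of target rows and the forest averages the trees,
    # so predicted rows still sum to 1 (up to floating point).
    print(np.allclose(reg.predict(X).sum(axis=1), 1.0))  # True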

Re: [scikit-learn] Label encoding for classifiers and soft targets

2017-03-13 Thread Andreas Mueller
On 03/12/2017 03:11 PM, Javier López Peña wrote: > The purpose is two-fold: on the one hand, use the probabilities generated by a very complex model (e.g. a massive ensemble) to train a simpler one that achieves comparable performance at a fraction of the cost. Any universal classifier will do (…)
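A sketch of that distillation setup, with assumed stand-ins (a RandomForestClassifier as the "very complex model" and a single shallow tree as the simpler student; neither is specified in the message):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeRegressor

    X, y = load_iris(return_X_y=True)

    # Teacher: stand-in for the expensive ensemble.
    teacher = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
    soft_targets = teacher.predict_proba(X)  # rows sum to 1

    # Student: a much cheaper model fit directly on the teacher's probabilities.
    student = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, soft_targets)

    # How closely the student reproduces the teacher's probabilities:
    print(np.abs(student.predict(X) - soft_targets).mean())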

Re: [scikit-learn] Label encoding for classifiers and soft targets

2017-03-13 Thread Javier López Peña
Hi Gilles, thanks for the suggestion! Training a regression tree would require sticking some kind of probability normaliser at the end to ensure proper probabilities; this might somehow hurt sharpness or calibration. Unfortunately, one of the things I am trying to do with this is moving away from […]

Re: [scikit-learn] Label encoding for classifiers and soft targets

2017-03-13 Thread Gilles Louppe
Hi Javier, In the particular case of tree-based models, you can use the soft labels to create a multi-output regression problem, which would yield an equivalent classifier (one can show that reduction of variance and the Gini index would yield the same trees). So basically, reg = RandomForestRegressor(…) […]
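Filling in that sketch (the completion below is an assumption; only `reg = RandomForestRe…` survives in the preview), with one-hot labels standing in for the soft labels:

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
    from sklearn.preprocessing import label_binarize

    X, y = load_iris(return_X_y=True)
    Y = label_binarize(y, classes=[0, 1, 2])  # one-hot rows; soft rows work too

    reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, Y)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    # The regressor's outputs play the role of predict_proba; taking the
    # argmax over classes recovers hard class predictions.
    agreement = (reg.predict(X).argmax(axis=1) == clf.predict(X)).mean()
    print(agreement)

The Gini/variance argument says the two split criteria rank candidate splits identically on one-hot targets; in practice, tie-breaking and bootstrap draws can still make individual trees differ slightly.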

Re: [scikit-learn] Label encoding for classifiers and soft targets

2017-03-12 Thread Javier López Peña
> On 12 Mar 2017, at 18:38, Gael Varoquaux wrote: > You can use sample weights to go a bit in this direction. But in general, the mathematical meaning of your intuitions will depend on the classifier, so they will not be general ways of implementing them without a lot of tinkering. I […]

Re: [scikit-learn] Label encoding for classifiers and soft targets

2017-03-12 Thread Gael Varoquaux
> Would it be simple to modify sklearn code to do this, or would it require a lot of tinkering, such as modifying every single classifier under the sun? You can use sample weights to go a bit in this direction. But in general, the mathematical meaning of your intuitions will depend on the classifier, so they will not be general ways of implementing them without a lot of tinkering. […]
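One common way to cash out the sample-weight idea (an assumed reading; the message doesn't spell it out): replicate each sample once per class and weight each copy by that class's soft probability, which makes the weighted log-loss equal the cross-entropy against the soft targets.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical soft targets for 4 one-feature samples over 3 classes.
    X = np.array([[0.0], [1.0], [2.0], [3.0]])
    P = np.array([[0.8, 0.1, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.2, 0.6],
                  [0.3, 0.4, 0.3]])
    n_samples, n_classes = P.shape

    # One copy of each sample per class, weighted by the soft probability.
    X_rep = np.repeat(X, n_classes, axis=0)
    y_rep = np.tile(np.arange(n_classes), n_samples)
    w_rep = P.ravel()

    clf = LogisticRegression().fit(X_rep, y_rep, sample_weight=w_rep)
    print(clf.predict_proba(X))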

[scikit-learn] Label encoding for classifiers and soft targets

2017-03-11 Thread Javier López Peña
Hi there! I have been recently experimenting with model regularization through the use of soft targets, and I’d like to be able to play with that from sklearn. The main idea is as follows: imagine I want to fit a (probabilistic) classifier with three possible targets, 0, 1, 2. If I pass my training […]
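For concreteness, a sketch of what soft targets look like for those three classes (label smoothing is just one assumed scheme for producing them; the message doesn't name one):

    import numpy as np
    from sklearn.preprocessing import label_binarize

    y = np.array([0, 1, 2, 1, 0])  # hard labels, three classes
    onehot = label_binarize(y, classes=[0, 1, 2]).astype(float)

    # Label smoothing: mix each one-hot row with the uniform distribution,
    # so every row is still a proper probability vector.
    eps = 0.1
    y_soft = (1 - eps) * onehot + eps / 3.0
    print(y_soft[0])  # [0.9333... 0.0333... 0.0333...]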