On Thu, Nov 10, 2011 at 09:58:16PM -0500, Alexandre Gramfort wrote:
> To me it is wrong not to apply such a scaling by n_samples. To
> motivate this, just look at the gist and you will see that if you don't
> do it then C / alpha needs to be changed if you duplicate every sample.
> This is particularly problematic with cross-validation, as you end
> up finding a C/alpha adapted to the size of the training folds rather
> than to the full data. Think about the refit in GridSearchCV.
> Let me know what you think, but I feel we should fix this.
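For anyone who has not opened the gist, here is a minimal sketch of my
own (using Ridge, not whatever is in the gist) of the duplicated-samples
effect: because the loss term is not divided by n_samples, duplicating
every sample is the same as halving alpha.

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.RandomState(0)
    X = rng.randn(20, 5)
    y = rng.randn(20)

    # Fitting on the duplicated data with alpha=1.0 gives the same
    # coefficients as fitting on the original data with alpha=0.5.
    coef_duplicated = Ridge(alpha=1.0).fit(np.vstack([X, X]),
                                           np.r_[y, y]).coef_
    coef_halved = Ridge(alpha=0.5).fit(X, y).coef_
    print(np.allclose(coef_duplicated, coef_halved))  # True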
Yes, we should fix this, but we are breaking backward compatibility by
doing so. People will scratch their heads and look at their results,
wondering why all hell broke loose when they updated their version of
the scikit.
Thus we need to plan a way forward for these changes:
1. I think that we should add a parameter to the objects we are
changing that controls the scaling behavior (see the sketch after this
list). For one or two releases, the parameter defaults to a value that
keeps the current behavior. After that, we change the default, and a
couple of releases later we remove the parameter entirely.
2. We need a warning that the behavior will change in a later release,
and that the right way to follow these changes is to set the value of
the parameter explicitly.
3. We need a clear statement in whats_new.rst
4. Maybe we need to have a file for internal use where we document the
future changes, so that we don't forget this (and other things).
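Concretely, point 1 could look something like the following rough
sketch (the scale_C name and the estimator are made up, just to show
the mechanics of the transition):

    import warnings

    class SomeEstimator(object):

        def __init__(self, C=1.0, scale_C='warn'):
            # 'warn' is a sentinel meaning "the user did not choose":
            # keep the old behavior for now, but tell them it will
            # change. An explicit True/False silences the warning.
            self.C = C
            self.scale_C = scale_C

        def fit(self, X, y):
            scale_C = self.scale_C
            if scale_C == 'warn':
                warnings.warn("The default behavior will change in a "
                              "future release: C will be scaled by "
                              "n_samples. Set scale_C explicitly to "
                              "silence this warning.", FutureWarning)
                scale_C = False  # keep the current, unscaled behavior
            self.effective_C_ = self.C / X.shape[0] if scale_C else self.C
            # ... the actual solver would use self.effective_C_ ...
            return self

Once the default flips, only the scale_C = False line and the warning
text change, and a couple of releases later we can drop scale_C and the
warning in one go.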
My 2 cents,
Gaël