Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Olivier Grisel
On 17 April 2012 09:20, Gael Varoquaux wrote: > On Tue, Apr 17, 2012 at 04:16:47PM +0200, Gael Varoquaux wrote: >> What do people think about my solution 'scale_params'? I thought that it >> was a way to make everybody happy, but I don't seem to be getting >> traction. > > I have opened a ticke

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Gael Varoquaux
On Tue, Apr 17, 2012 at 07:27:39AM -0700, Olivier Grisel wrote: > > Btw I feel it is somewhat of a problem to undo what was done in the current > > master, as I would guess some people are already working with that. > I assume that people working with the master can expect this kind of > semanti

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Gael Varoquaux
On Tue, Apr 17, 2012 at 04:22:50PM +0200, Alexandre Gramfort wrote: > what would be the semantic of scale_params? scale_params=False by default; scale_params=True would scale the parameters by a data-dependent term such as C_min. > shall we touch every estimator No, because for some estimators w

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Gael Varoquaux
On Tue, Apr 17, 2012 at 04:16:47PM +0200, Gael Varoquaux wrote: > What do people think about my solution 'scale_params'? I thought that it > was a way to make everybody happy, but I don't seem to be getting > traction. I have opened a ticket with this idea. As it's Jaques' birthday today, he was

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Gael Varoquaux
On Wed, Apr 18, 2012 at 01:10:12AM +0900, Mathieu Blondel wrote: >On Tue, Apr 17, 2012 at 11:16 PM, Gael Varoquaux wrote: > What do people think about my solution 'scale_params'? I thought that it > was a way to make everybody happy, but I don'

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Mathieu Blondel
On Tue, Apr 17, 2012 at 11:16 PM, Gael Varoquaux wrote: > > What do people think about my solution 'scale_params'? I thought that it > was a way to make everybody happy, but I don't seem to be getting > traction. > > What would be the default value for "scale_param

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Olivier Grisel
On 17 April 2012 07:23, Andreas Mueller wrote: > > Btw I feel it is somewhat of a problem to undo what was done in the current > master, as I would guess some people are already working with that. I assume that people working with the master can expect this kind of semantic shift to occur fr

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Alexandre Gramfort
what would be the semantic of scale_params? shall we touch every estimator or assume scale_params=True if not present as attribute? Alex On Tue, Apr 17, 2012 at 4:16 PM, Gael Varoquaux wrote: > On Tue, Apr 17, 2012 at 03:46:10PM +0200, Andreas Mueller wrote: >> I agree that they show that scali

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Andreas Mueller
On 17.04.2012 16:16, Gael Varoquaux wrote: > On Tue, Apr 17, 2012 at 03:46:10PM +0200, Andreas Mueller wrote: >> I agree that they show that scaling C seems better. >> BUT: I would not agree with Gael that scale_C=False is broken. >> Even with few samples, it is very hard to actually generate the

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Gael Varoquaux
On Tue, Apr 17, 2012 at 03:46:10PM +0200, Andreas Mueller wrote: > I agree that they show that scaling C seems better. > BUT: I would not agree with Gael that scale_C=False is broken. > Even with few samples, it is very hard to actually generate the problem. > You need to have a learning problem

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Mathieu Blondel
On Tue, Apr 17, 2012 at 10:48 PM, Andreas Mueller wrote: > On 17.04.2012 15:45, Mathieu Blondel wrote: > > On Tue, Apr 17, 2012 at 10:31 PM, Olivier Grisel wrote: > >> 1- use C and scale_C=False by default and document extensively the >> importance of scale_C=True when doing model

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Andreas Mueller
On 17.04.2012 15:45, Mathieu Blondel wrote: On Tue, Apr 17, 2012 at 10:31 PM, Olivier Grisel wrote: 1- use C and scale_C=False by default and document extensively the importance of scale_C=True when doing model selection with small number of s

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Andreas Mueller
On 17.04.2012 15:06, Alexandre Gramfort wrote: > what's killing me is that andy's plot shows that scale_C is the way to > go so it's not just me. Also libsvm/liblinear bindings are the only > models that have a regularization parameter that depends on the > number of samples. Either we stick to

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Mathieu Blondel
On Tue, Apr 17, 2012 at 10:31 PM, Olivier Grisel wrote: > > 1- use C and scale_C=False by default and document extensively the > importance of scale_C=True when doing model selection with small > number of samples. (I am ok for the ugly warning in the grid search > class). > Setting scale_C to No

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Peter Prettenhofer
2012/4/17 Olivier Grisel : > ... > > Has anybody tried to confirm that this is a libsvm / liblinear > specific thing? How do shogun, svmlight and other non-libsvm SVM > implementation deal with this? As far as I can tell svm^light uses the same formulation as libsvm; for svm^rank they changed it t

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Olivier Grisel
On 17 April 2012 06:31, Olivier Grisel wrote: > > 2- use alpha as in the rest of the other scikit-learn models and have > the default value of alpha set to None or "auto" that will be set to > `n_samples` in the fit method since `C=1` (unscaled) gives a good > baseline in practice on normalized

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Olivier Grisel
On 17 April 2012 06:06, Alexandre Gramfort wrote: > what's killing me is that andy's plot shows that scale_C is the way to > go so it's not just me. Also libsvm/liblinear bindings are the only > models that have a regularization parameter that depends on the > number of samples. Has anybody t

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Gael Varoquaux
On Tue, Apr 17, 2012 at 10:07:38PM +0900, Mathieu Blondel wrote: >We can rename scale_C to scale_penalty or scale_params and use this option >wherever there's a dataset size-dependent option in the constructor... Please, will you stop reading my mind. It's a bit disturbing. Especially sinc

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Vlad Niculae
On Apr 17, 2012, at 15:53 , Alexandre Gramfort wrote: >> I think just moving from a train set to a test set would be problematic for >> small n_samples. > > what do you suggest? > I agree with your scale_C=None suggestion because it would (in theory) force the user to become aware of what th

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Gael Varoquaux
On Tue, Apr 17, 2012 at 02:39:33PM +0200, Alexandre Gramfort wrote: > ok I give up… Let's move back to scale_C=None that spits a warning to > strongly suggest users to make their choice. We could do it, but it's broken. Basically this choice would be accepting that in the small sample situation yo

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Mathieu Blondel
On Tue, Apr 17, 2012 at 9:56 PM, Lars Buitinck wrote: > > I'm not very fond of adding estimator-specific heuristics to > general-purpose modules... > We can rename scale_C to scale_penalty or scale_params and use this option wherever there's a dataset size-dependent option in the constructor...

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Alexandre Gramfort
what's killing me is that andy's plot shows that scale_C is the way to go so it's not just me. Also libsvm/liblinear bindings are the only models that have a regularization parameter that depends on the number of samples. Either we stick to libsvm and we have an inconsistent grid search + an incon

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Nelle Varoquaux
On 17/04/2012, Gael Varoquaux wrote: > On Tue, Apr 17, 2012 at 02:56:13PM +0200, Lars Buitinck wrote: >> >> > This way people who don't read the doc (the majority of the users) >> >> > will not fall in the libsvm-gives-different-results trap and will >> >> > have >> >> > the tools to not fall in t

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Gael Varoquaux
On Tue, Apr 17, 2012 at 02:56:13PM +0200, Lars Buitinck wrote: > >> > This way people who don't read the doc (the majority of the users) > >> > will not fall in the libsvm-gives-different-results trap and will have > >> > the tools to not fall in the statistical inconsistency trap if they > >> > ma

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Alexandre Gramfort
> I'm not very fond of adding estimator-specific heuristics to > general-purpose modules... I agree… it looks like a deadlock… Alex

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Lars Buitinck
On 17 April 2012 14:28, Mathieu Blondel wrote: > On Tue, Apr 17, 2012 at 9:17 PM, Andreas Mueller > wrote: >> >> > This way people who don't read the doc (the majority of the users) >> > will not fall in the libsvm-gives-different-results trap and will have >> > the tools t

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Alexandre Gramfort
> I think just moving from a train set to a test set would be problematic for > small n_samples. what do you suggest? Alex

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Vlad Niculae
I think just moving from a train set to a test set would be problematic for small n_samples. Vlad On Apr 17, 2012, at 15:48 , Olivier Grisel wrote: > On 17 April 2012 05:39, Gael Varoquaux wrote: >> On Tue, Apr 17, 2012 at 03:35:26PM +0300, Dimitrios Pritsos wrote: >>>If you would li

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Gael Varoquaux
On Tue, Apr 17, 2012 at 05:48:14AM -0700, Olivier Grisel wrote: > _it does not work_ => grid search / model selection does not work. More generally, your value for C must change depending on the number of samples that you have. G
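[Editor's sketch, not part of the original message.] The dependence of C on the number of samples can be illustrated with a few lines of plain Python. This assumes the libsvm-style objective 0.5*||w||^2 + C * sum_i loss_i and assumes that scale_C=True means dividing C by n_samples before it reaches the solver; the helper names `solver_C` and `equivalent_alpha` are hypothetical and are not scikit-learn functions.

```python
def solver_C(C, n_samples, scale_C):
    """C actually handed to a libsvm-style solver.

    Assumption: scale_C=True divides the user-supplied C by n_samples.
    """
    return C / n_samples if scale_C else C


def equivalent_alpha(C, n_samples, scale_C):
    """Per-sample regularization strength implied by C.

    Dividing 0.5*||w||^2 + C * sum_i loss_i by (C * n_samples) gives
    alpha * 0.5*||w||^2 + mean_i loss_i  with  alpha = 1 / (C * n_samples).
    """
    return 1.0 / (solver_C(C, n_samples, scale_C) * n_samples)


# Full training set (100 samples) versus one 5-fold CV training split (80).
for scale_C in (False, True):
    for n_samples in (100, 80):
        alpha = equivalent_alpha(1.0, n_samples, scale_C)
        print("scale_C=%s n_samples=%d alpha=%.4f" % (scale_C, n_samples, alpha))
```

Under these assumptions, the unscaled C=1 corresponds to a different effective regularization on the 80-sample fold than on the 100-sample set, while the scaled variant keeps it fixed. That is one way to read the claim that a C selected by cross-validation does not transfer unchanged to a training set of a different size.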

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Olivier Grisel
On 17 April 2012 05:39, Gael Varoquaux wrote: > On Tue, Apr 17, 2012 at 03:35:26PM +0300, Dimitrios Pritsos wrote: >> If you would like the opinion of a user (i.e. me) I think this is the best >> solution for intuitive use of the Lib. And having scale_C=False as >> default. > > For small

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Alexandre Gramfort
>> > This way people who don't read the doc (the majority of the users) >> > will not fall in the libsvm-gives-different-results trap and will have >> > the tools to not fall in the statistical inconsistency trap if they >> > make the effort to read the doc. >> >> + .5 > > +1 > > And we could add a

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Gael Varoquaux
On Tue, Apr 17, 2012 at 03:35:26PM +0300, Dimitrios Pritsos wrote: >If you would like the opinion of a user (i.e. me) I think this is the best >solution for intuitive use of the Lib. And having scale_C=False as >default. For a small number of samples, _it does not work_. Period, there is n

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Dimitrios Pritsos
On 04/17/2012 03:28 PM, Mathieu Blondel wrote: On Tue, Apr 17, 2012 at 9:17 PM, Andreas Mueller wrote: > This way people who don't read the doc (the majority of the users) > will not fall in the libsvm-gives-different-results trap and will h

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Mathieu Blondel
On Tue, Apr 17, 2012 at 9:17 PM, Andreas Mueller wrote: > > This way people who don't read the doc (the majority of the users) > > will not fall in the libsvm-gives-different-results trap and will have > > the tools to not fall in the statistical inconsistency trap if they > > make the effort to r

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Andreas Mueller
On 17.04.2012 14:14, Olivier Grisel wrote: > On 17 April 2012 02:45, Gael Varoquaux wrote: >> @scikit-learn developers: >> >> Hum... >> http://www.flickr.com/photos/scriptingnews/3503448168/sizes/o/in/photostream/ > hahaha > >> The situation is that the authors of libSVM have chosen a solu

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Olivier Grisel
On 17 April 2012 02:45, Gael Varoquaux wrote: > @scikit-learn developers: > > Hum... > http://www.flickr.com/photos/scriptingnews/3503448168/sizes/o/in/photostream/ hahaha > The situation is that the authors of libSVM have chosen a solution that > leads to an inconsistent estimator with bad stat

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Andreas Mueller
On 17.04.2012 11:45, Gael Varoquaux wrote: > @scikit-learn developers: > > Hum... > http://www.flickr.com/photos/scriptingnews/3503448168/sizes/o/in/photostream/ My office mate just asked me whether that was the scikits users in front and the developers in the back :-/

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Paolo Losi
On Tue, Apr 17, 2012 at 11:45 AM, Gael Varoquaux wrote: > > On the one hand, we really cannot have C the way the libSVM guys have > defined it, because parameter setting by cross-validation will not work. > On the other hand, it is clear that people keep tripping ove

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Gael Varoquaux
@scikit-learn developers: Hum... http://www.flickr.com/photos/scriptingnews/3503448168/sizes/o/in/photostream/ The situation is that the authors of libSVM have chosen a solution that leads to an inconsistent estimator with bad statistical properties, but works well on many datasets. I think it is wr

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Dimitrios Pritsos
Hello G, Yes, you are right: scale_C should be False for it to work as expected. Great, because I prefer to work with the latest version. Thank you G Dimitrios On 04/17/2012 12:13 PM, Dimitrios Pritsos wrote: > > Ok I will do that now and I will let you know in 45 min > > On 04/17/2012 12:10

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Dimitrios Pritsos
Ok I will do that now and I will let you know in 45 min On 04/17/2012 12:10 PM, Gael Varoquaux wrote: > On Tue, Apr 17, 2012 at 12:08:46PM +0300, Dimitrios Pritsos wrote: >> I was running a test using SVC(c=1, kernel='linear') and I found that >> for the latest version of sklearn the results are

Re: [Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Gael Varoquaux
On Tue, Apr 17, 2012 at 12:08:46PM +0300, Dimitrios Pritsos wrote: > I was running a test using SVC(c=1, kernel='linear') and I found that > for the latest version of sklearn the results are WRONG! What does 'wrong' mean? Something that changed in the scikit, is that the 'c' is scaled by the num

[Scikit-learn-general] SERIOUS BUG

2012-04-17 Thread Dimitrios Pritsos
Hello List, I was running a test using SVC(c=1, kernel='linear') and I found that for the latest version of sklearn the results are WRONG! So I rolled back with git to this HEAD commit 5c2a8696e3184fdb5e2ca5c55e61fe29ebd37fbb Author: Andreas Mueller Date: Mon Jan 23 21:13:55 2012 +0100