Re: [Scikit-learn-general] SERIOUS BUG

Alexandre Gramfort Tue, 17 Apr 2012 07:23:28 -0700

What would be the semantics of scale_params?

Shall we touch every estimator, or assume scale_params=True if it is not
present as an attribute?
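One way to read the second option is a getattr fallback at the point where C is handed to the solver. The sketch below is hypothetical. The class names and the effective_C helper are invented for illustration, and it assumes that "scaling" means dividing C by n_samples, as with the scale_C flag discussed in this thread.

```python
class LegacyEstimator:
    """Hypothetical estimator written before scale_params existed."""

    def __init__(self, C=1.0):
        self.C = C


class ScaledEstimator:
    """Hypothetical estimator exposing the proposed scale_params flag."""

    def __init__(self, C=1.0, scale_params=True):
        self.C = C
        self.scale_params = scale_params


def effective_C(estimator, n_samples):
    """Return the C actually handed to the solver (hypothetical helper).

    Estimators that do not define scale_params are treated as if
    scale_params=True, so existing classes need no changes.
    """
    scale = getattr(estimator, "scale_params", True)
    # Assumed semantics: scaling divides C by n_samples, so that C weights
    # the mean per-sample loss rather than the summed loss.
    return estimator.C / n_samples if scale else estimator.C
```

With this rule, effective_C(LegacyEstimator(C=10.0), 200) returns 0.05, while an estimator that explicitly sets scale_params=False keeps C=10.0. The trade-off is that estimators nobody touched silently get the scaled behavior.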


Alex

On Tue, Apr 17, 2012 at 4:16 PM, Gael Varoquaux
<[email protected]> wrote:
> On Tue, Apr 17, 2012 at 03:46:10PM +0200, Andreas Mueller wrote:
>> I agree that they show that scaling C seems better.
>
>> BUT: I would not agree with Gael that scale_C=False is broken.
>
>> Even with few samples, it is very hard to actually generate the problem.
>> You need to have a learning problem that is VERY sensitive to the value
>> of C and you need to have a difference in size of the validation set that
>> is larger than the tolerance you have for C.
>
> For the logistic regression with l1 penalty, this happens very easily.
> For the logistic regression with l2 penalty, it is not as bad. In
> practice, in our problems, it seems that for SVMs, as long as C is 'big
> enough', there is no catastrophic failure, though things might be
> slightly suboptimal. I realize that the last sentence contradicts
> the common wisdom in statistics, which says that you are better
> off over-penalizing than under-penalizing.
>
> We found the problem using a logistic-l1 with n_samples ~ 200 and
> n_features ~ 50000.
>
> What do people think about my solution 'scale_params'? I thought that it
> was a way to make everybody happy, but I don't seem to be getting
> traction.
>
> I'd like us to give this problem careful thought, as I think that it can
> be a recurring pain. I'd actually be happy delaying the release a couple
> of weeks and reaching a solution that we believe actually solves the
> problem for all classes of users (beginners and experts, large and small
> n_samples).
>
> Gaël
>
> ------------------------------------------------------------------------------
> Better than sec? Nothing is better than sec when it comes to
> monitoring Big Data applications. Try Boundary one-second
> resolution app monitoring today. Free.
> http://p.sf.net/sfu/Boundary-dev2dev
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
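Andreas's point, quoted above, that the mismatch only bites when the training-set size changes by more than the tolerance on C can be made concrete with a little arithmetic. The sketch assumes a liblinear-style objective, penalty(w) + C_eff * sum_i loss_i, with C_eff = C / n_samples when C is scaled; the function name is hypothetical and no scikit-learn code is involved.

```python
def penalty_to_mean_loss(C, n_samples, scale_C):
    """Weight of the penalty term relative to the mean per-sample loss.

    Assumed objective: penalty(w) + C_eff * sum_i loss_i, where
    C_eff = C / n_samples if scale_C else C. Dividing the whole objective
    by C_eff * n_samples leaves the minimizer unchanged and gives
    penalty(w) / (C_eff * n_samples) + mean_i loss_i.
    """
    C_eff = C / n_samples if scale_C else C
    return 1.0 / (C_eff * n_samples)


C = 1.0
n_total = 200
n_train_fold = n_total * 4 // 5  # training size of each fold in 5-fold CV: 160

for scale_C in (True, False):
    cv = penalty_to_mean_loss(C, n_train_fold, scale_C)
    refit = penalty_to_mean_loss(C, n_total, scale_C)
    # A ratio of 1.0 means model selection and the final refit
    # regularize identically.
    print(f"scale_C={scale_C}: CV/refit regularization ratio = {cv / refit:.2f}")
```

With 5-fold cross-validation on 200 samples, each fold trains on 160 samples and the final refit uses 200. The loop prints a ratio of 1.00 with scaling and 1.25 without, so unscaled C shifts the effective regularization by 25% between model selection and refit. Whether that matters depends, as Andreas says, on how sensitive the problem is to C relative to the spacing of the grid.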
