Re: [Scikit-learn-general] SERIOUS BUG

Andreas Mueller Tue, 17 Apr 2012 07:23:28 -0700

On 17.04.2012 16:16, Gael Varoquaux wrote:
> On Tue, Apr 17, 2012 at 03:46:10PM +0200, Andreas Mueller wrote:
>> I agree that they show that scaling C seems better.
>> BUT: I would not agree with Gael that scale_C=False is broken.
>> Even with few samples, it is very hard to actually generate the problem.
>> You need to have a learning problem that is VERY sensitive to the value
>> of C and you need to have a difference in size of the validation set that
>> is larger than the tolerance you have for C.
> For the logistic regression with l1 penalty, this happens very easily.
> For the logistic regression with l2 penalty, it is not as bad. In
> practice, in our problems, it seems that for SVMs, as long as C is 'big
> enough', there is no catastrophic failure, though things might be
> slightly suboptimal. I realize that the last sentence is in contradiction
> with the common wisdom in statistics that tells you that you are better
> off with over-penalizing than under-penalizing.
>
> We found the problem using a logistic-l1 with n_samples ~ 200 and
> n_features ~ 50000.
>
Thanks for your explanation.
I would be curious how many samples you used in cross-validation
and how many in training.
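
To make the question concrete, here is a minimal sketch of the arithmetic,
assuming the unscaled objective has the form C * sum_i loss_i + penalty(w),
so that the data term grows roughly linearly with the number of samples.
The helper name equivalent_C and the fold sizes are only illustrative
assumptions; n_samples ~ 200 is the figure from the quoted message.

```python
def equivalent_C(C_cv, n_cv_train, n_full):
    """Rescale a C picked on a CV training fold for a refit on all samples.

    Assumes an objective of the form C * sum_i loss_i + penalty(w): the data
    term scales like C * n_samples, so keeping C * n_samples constant keeps
    the balance between loss and penalty roughly the same.
    """
    return C_cv * n_cv_train / n_full


n_full = 200  # n_samples from the quoted message

# Hypothetical CV schemes: training-fold sizes for 2-fold and 5-fold CV.
for n_cv_train in (100, 160):
    factor = equivalent_C(1.0, n_cv_train, n_full)
    print(n_cv_train, "training samples -> multiply the selected C by", factor)
```

Under that assumption the mismatch is a factor of n_cv_train / n_full, so it
only matters if the model is more sensitive to C than that ratio, which is
why the split sizes are relevant.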
> What do people think about my solution 'scale_params'? I thought that it
> was a way to make everybody happy, but I don't seem to be getting
> traction.
Sorry, can't really concentrate right now. Traction -> later.
> I'd like us to give a good thought to this problem, as I think that it can
> be a recurrent pain. I'd actually be happy delaying the release a couple
> weeks and reaching a solution that we believe actually solves the problem
> for all classes of users (beginners and experts, large and small
> n_samples).
+1


Btw I feel it is somewhat of a problem to undo what was done in the current
master, as I would guess some people are already working with that.

