On Sun, Oct 2, 2011 at 11:35 AM, Olivier Grisel <[email protected]> wrote:

>
> > 100 pairs: avg=0.425, std=0.349106001094
> > 1000 pairs: avg=0.4725, std=0.354250970359
> > 10000 pairs: avg=0.48235, std=0.352155473477
> >
> > So, it is pretty clear to me that what I have here is either not the right
> > features builtin or just really noisy target data or both. As is, it seems
> > foolish and useless to pick a classifier and its parameters based on what I
> > have.
>
>
> An AUC of 0.50 is a random classifier. Either your data are pure noise
> or your classifier has an issue.
>

Yes, this is nuSVC and it behaves particularly badly, but I intended the
numbers above merely as an example of high variance. The logistic
regression classifier is slightly better (0.6xx), but my main question was
whether such high variance in the performance of a classifier is to be
expected.
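
One way to check whether a given spread is expected is to measure it
directly by resampling. The following is only a minimal sketch using the
standard library, not the code that produced the numbers above: the helper
names `auc_by_pairs` and `bootstrap_auc` are hypothetical, the data are
synthetic noise, and the AUC is computed as the fraction of
(positive, negative) pairs that the scores rank correctly.

```python
import random
import statistics


def auc_by_pairs(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Label 0 is the negative class, any nonzero label is positive.
    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y != 0]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, n_rounds=200, seed=0):
    """Mean and standard deviation of the AUC over bootstrap resamples."""
    # Fails early if the full data set contains a single class.
    auc_by_pairs(labels, scores)
    rng = random.Random(seed)
    indices = range(len(labels))
    aucs = []
    while len(aucs) < n_rounds:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        ss = [scores[i] for i in sample]
        # Skip the rare resample that contains only one class.
        if any(y != 0 for y in ys) and any(y == 0 for y in ys):
            aucs.append(auc_by_pairs(ys, ss))
    return statistics.mean(aucs), statistics.stdev(aucs)


# Synthetic example: scores are pure noise, unrelated to the labels.
rng = random.Random(42)
labels = [rng.randint(0, 1) for _ in range(60)]
scores = [rng.random() for _ in labels]
mean_auc, std_auc = bootstrap_auc(labels, scores)
print("bootstrap AUC: avg=%.3f, std=%.3f" % (mean_auc, std_auc))
```

With scores that carry no signal, the average should sit near 0.5. The
standard deviation across resamples indicates how much of an observed
difference between two classifiers could be due to sampling alone.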


> This will only work for binary classifiers with outcome 0 for the
> negative class and a non zero label for the positive class if I am not
>

Yes


> mistaken. Sounds a bit restrictive to me.
>

I do not know how prevalent these are. It just seems like knowing basic
statistics such as the number of true positives, etc. would be of general
interest.
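
To make the convention discussed above concrete (0 for the negative class,
any nonzero label for the positive class), here is a rough sketch of the
counting. The function name `confusion_counts` is made up for illustration
and is not an existing scikit-learn function; current scikit-learn
releases offer `sklearn.metrics.confusion_matrix` for the general case.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for a binary problem.

    Assumes label 0 is the negative class and any nonzero label is positive.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth != 0 and pred != 0:
            counts["tp"] += 1
        elif truth == 0 and pred != 0:
            counts["fp"] += 1
        elif truth != 0 and pred == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    # Always return all four keys, even when a count is zero.
    return {key: counts[key] for key in ("tp", "fp", "tn", "fn")}


# Made-up labels, purely to show the call.
example = confusion_counts([1, 0, 1, 0, 2], [1, 1, 0, 0, 5])
print(example)
```

Precision, recall and similar statistics follow directly from these four
counts.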

Mathieu
-- 
Mathieu Lacage <[email protected]>
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general