2011/11/23 Andreas Müller <[email protected]>:
> Hi everybody.
> Me again. I was getting some unexpected behaviour from the error metrics.
> Consider the following:
>
> import numpy as np
> from sklearn.datasets import load_digits
> from sklearn.metrics import zero_one_score
>
> digits = load_digits()
> zero_one_score(digits.target, np.vstack(digits.target))
>
>  >>> 0.10
>
> The shape of digits.target is (1797,), the shape
> of the stacked version is (1797, 1).
> That seems to cause broadcasting in "==".

Good catch.
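
To make the broadcasting concrete, here is a minimal sketch on a small
made-up label array (not the digits data). Comparing a (n,) array with a
(n, 1) array yields an (n, n) boolean matrix, so the mean counts matching
pairs (i, j) instead of matching positions:

```python
import numpy as np

# Made-up labels standing in for digits.target (illustrative only).
y = np.array([0, 1, 2, 0])        # shape (4,)
y_col = y.reshape(-1, 1)          # shape (4, 1), same shape np.vstack(y) gives

# "==" broadcasts (4,) against (4, 1) into a (4, 4) pairwise comparison.
eq = (y == y_col)
print(eq.shape)                   # (4, 4)

# Mean over all 16 pairs: 6 of them have equal labels.
pairwise_score = eq.mean()
print(pairwise_score)             # 0.375

# Flattening the column first restores the element-wise comparison.
elementwise_score = (y == y_col.ravel()).mean()
print(elementwise_score)          # 1.0
```

With roughly balanced classes, the pairwise mean comes out near
1 / n_classes, which would be consistent with the 0.10 reported for the
ten digit classes.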

> I thought utils.check_arrays was meant to
> avoid such problems, but it does not change the shape
> of these two arrays.
>
> What did I do wrong or what did I misunderstand here?
>
> Obviously I could reshape either array so that no broadcasting
> happens. I feel the problem is somewhat subtle, though,
> and it took me 3 hours to find.
>
> If you feel that is a problem, should it be addressed in "check_arrays"?

IMHO, we should have a specific check for 1D integer arrays used as
targets in classification tasks, and another specific check for
regression tasks, each with an explicit docstring stating what is
checked and an explicit ValueError message stating what we were
expecting and what we got instead.
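
A rough sketch of what the classification check could look like. The
name check_1d_int_targets and its signature are hypothetical, made up
for illustration; this is not an existing scikit-learn function:

```python
import numpy as np


def check_1d_int_targets(y, name="y"):
    """Validate classification targets (hypothetical helper).

    Checks that `y` is a 1D array with an integer dtype and returns it
    as a NumPy array. Raises ValueError otherwise, stating what was
    expected and what was received.
    """
    y = np.asarray(y)
    if y.ndim != 1:
        raise ValueError(
            "%s: expected a 1D array of shape (n_samples,), got shape %r"
            % (name, y.shape)
        )
    if not np.issubdtype(y.dtype, np.integer):
        raise ValueError(
            "%s: expected an integer dtype, got %s" % (name, y.dtype)
        )
    return y


# A (5, 1) column of targets is rejected instead of silently broadcasting.
try:
    check_1d_int_targets(np.zeros((5, 1), dtype=int), name="y_pred")
except ValueError as exc:
    print(exc)
```

Raising here, rather than silently flattening the array, is one possible
design choice; it keeps the shape mismatch visible to the caller.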

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general