2011/11/23 Andreas Müller <[email protected]>:
> Hi everybody.
> Me again. I was getting some unexpected behaviour from the error metrics.
> Consider the following:
>
> import numpy as np
> from sklearn.datasets import load_digits
> from sklearn.metrics import zero_one_score
>
> digits = load_digits()
> zero_one_score(digits.target, np.vstack(digits.target))
>
> >>> 0.10
>
> The shape of digits.target is (1797,), the shape
> of the stacked version is (1797, 1).
> That seems to cause broadcasting in "==".

Good catch.

> I thought utils.check_arrays was meant to
> avoid such problems, but it does not change the shape
> of these two arrays.
>
> What did I do wrong or what did I misunderstand here?
>
> Obviously I could reshape either array so that no broadcasting
> happens. I feel the problem is somewhat subtle, though,
> and it took me 3 hours to find.
>
> If you feel that is a problem, should it be addressed in "check_arrays"?

IMHO, we should have a specific check for 1D integer arrays used as
targets in classification tasks, and another specific check for
regression tasks. Each should have an explicit docstring stating what
it checks and an explicit ValueError message stating what we were
expecting and what we got instead.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
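
To make the discussion concrete, here is a rough sketch of the problem and of
the kind of check proposed above. It uses only NumPy and a small stand-in label
array rather than the digits data. The helper name check_1d_int_targets is
hypothetical and is not part of scikit-learn; the exact rules it enforces (1D,
integer dtype, equal length) are my assumptions about what such a check might
look like, not an existing API.

```python
import numpy as np

# Stand-in labels; the report above uses digits.target with shape (1797,).
y_true = np.array([0, 1, 2, 1, 0])
# Column version with shape (5, 1), the same layout np.vstack gave in the report.
y_pred = y_true.reshape(-1, 1)

# "==" broadcasts (5,) against (5, 1) into a (5, 5) pairwise comparison,
# so the mean is the fraction of matching pairs, not a per-sample score.
pairwise = (y_true == y_pred)
broadcast_score = pairwise.mean()


def check_1d_int_targets(y_true, y_pred):
    """Hypothetical check: both targets must be 1D integer arrays of equal length.

    Raises ValueError stating what was expected and what was received.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    for name, y in (("y_true", y_true), ("y_pred", y_pred)):
        if y.ndim != 1:
            raise ValueError(
                "%s: expected a 1D array of class labels, got shape %r"
                % (name, y.shape))
        if not np.issubdtype(y.dtype, np.integer):
            raise ValueError(
                "%s: expected an integer dtype, got %s" % (name, y.dtype))
    if y_true.shape[0] != y_pred.shape[0]:
        raise ValueError(
            "expected targets of equal length, got %d and %d"
            % (y_true.shape[0], y_pred.shape[0]))
    return y_true, y_pred
```

With a check like this at the top of a metric, the (5, 1) array above would
raise a ValueError naming the offending shape instead of silently returning a
score. A pairwise comparison over ten roughly balanced classes also gives a
match rate near 0.1, which would be consistent with the 0.10 in the report.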
