On 11/23/2011 03:08 PM, Olivier Grisel wrote:
> 2011/11/23 Andreas Müller<[email protected]>:
>> Hi everybody.
>> Me again. I was getting some unexpected behaviour from the error metrics.
>> Consider the following:
>>
>> import numpy as np
>> from sklearn.datasets import load_digits
>> from sklearn.metrics import zero_one_score
>>
>> digits = load_digits()
>> zero_one_score(digits.target, np.vstack(digits.target))
>>
>> >>> 0.10
>>
>> The shape of digits.target is (1797,), the shape
>> of the stacked version is (1797, 1).
>> That seems to cause broadcasting in "==".
> Good catch.
>
>> I thought utils.check_arrays was meant to
>> avoid such problems, but it does not change the shape
>> of these two arrays.
>>
>> What did I do wrong or what did I misunderstand here?
>>
>> Obviously I could reshape either array so that no broadcasting
>> happens. I feel the problem is somewhat subtle, though,
>> and it took me 3 hours to find.
>>
>> If you feel that is a problem, should it be addressed in "check_arrays"?
> IMHO, we should have a specific check for 1D, integer arrays used for
> targets in classification tasks and another specific check for
> regression tasks, with an explicit docstring telling what we check and
> an explicit ValueError message stating what we were expecting and
> what we got instead.
>
That might be a good idea.
Should the check for classification tasks then be performed on each
call to "fit" and in each classification metric?

I am not sure whether you mean checking that the dtype is int,
or rather checking that the array contains integer values.
Are there other requirements?
I am not familiar enough with the implementation of the classification
algorithms to say what kind of assumptions they make.
Do labels have to be 0..n or [-1, 1]?
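
To make the broadcasting issue concrete, here is a rough sketch using a
small stand-in for digits.target. The helper check_classification_targets
is made up for illustration (it is not an existing scikit-learn function),
and it only checks the shape; whether it should also check for an integer
dtype or integer values is exactly the open question above.

```python
import numpy as np


def check_classification_targets(y_true, y_pred):
    """Hypothetical helper, not part of scikit-learn.

    Checks that both label arrays are 1D and have the same shape,
    and raises a ValueError saying what was expected and what was given.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    for name, y in (("y_true", y_true), ("y_pred", y_pred)):
        if y.ndim != 1:
            raise ValueError(
                "%s: expected a 1D array of labels, got shape %r"
                % (name, y.shape))
    if y_true.shape != y_pred.shape:
        raise ValueError(
            "y_true and y_pred must have the same shape, got %r and %r"
            % (y_true.shape, y_pred.shape))
    return y_true, y_pred


# Small stand-in for digits.target (assumed data, for illustration only).
y = np.array([0, 1, 2, 1, 0])   # shape (5,)
y_col = y.reshape(-1, 1)        # shape (5, 1), like the stacked version

# Comparing (5,) with (5, 1) broadcasts to a (5, 5) matrix, so the mean
# is taken over all pairs of labels instead of over matching positions.
comparison = (y == y_col)
broadcast_score = np.mean(comparison)

# With the shape check in place, the mismatch is reported right away.
try:
    check_classification_targets(y, y_col)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Once both arrays are 1D, the comparison is element-wise as intended.
y_a, y_b = check_classification_targets(y, y_col.ravel())
elementwise_score = np.mean(y_a == y_b)
```

With a check like this, the (1797,) versus (1797, 1) case would fail
immediately with a message naming the offending shape, rather than
silently returning a score computed over all pairs of labels.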

Cheers,
Andy
