2013/6/21 Joel Nothman <jnoth...@student.usyd.edu.au>:
> As long as the representation of unknown values is known (be it a particular
> value, or use of a masked array), writing a Transformer should be pretty
> straightforward, but I don't understand why you need extra arguments to
> transform (which you imply by linking to #1963), or how inverse_transform
> could possibly work. Could you give us an example?

I'd say use either masked arrays or NaN. Masked arrays seem designed
for this purpose. Both have the benefit that all other estimators can,
in the shared input validation code, raise an exception that points the
user to the imputation code when they encounter a masked array or a NaN.

For inspiration, here's a simplistic imputer:

https://gist.github.com/larsmans/5828792

(This doesn't handle fit_transform correctly: when the training set has
missing values, the result of mean() contains NaN.)

> In the current scipy.sparse implementation, the value of non-encoded data in
> sparse matrices is necessarily zero, and setting cells to zero makes them
> disappear in sparse matrix transformations. So you can't use unfilled cells
> as missing data, except where 0 isn't an option for actual values. In
> general, you can allow the missing value indicator to be set as a
> transformer parameter.

Besides, scipy.sparse is hard to update in-place, is a very wasteful
representation for dense data and is harder to work with than np.array
(for us, but more importantly for users).

> Currently, -1 is used for missing target values for semi-supervised
> learning, not that there's a lot of it in scikit-learn. See #547, #430.

-1 is a perfectly valid feature value, though. It's only treated as a
special label value in a few restricted cases (semi-supervised
learning, outlier detection).

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
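
For illustration, here is a minimal sketch of a NaN-based mean imputer
along the lines discussed in the message above. It is not the code from
the linked gist: the class name NaNMeanImputer and the sample data are
hypothetical, and it assumes missing values are encoded as NaN in a
dense float array. It uses np.nanmean so that missing values in the
training set do not turn the learned column means into NaN.

```python
import numpy as np


class NaNMeanImputer:
    """Hypothetical sketch: replace NaN entries with per-column means."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # nanmean skips NaN entries, so missing values in the training
        # set do not contaminate the learned column means.
        self.means_ = np.nanmean(X, axis=0)
        return self

    def transform(self, X):
        # Work on a copy so the caller's array is left untouched.
        X = np.array(X, dtype=float)
        rows, cols = np.nonzero(np.isnan(X))
        # Fill each missing cell with the mean of its column.
        X[rows, cols] = self.means_[cols]
        return X

    def fit_transform(self, X, y=None):
        return self.fit(X).transform(X)


# Made-up data with one missing value in each column.
X_train = np.array([[1.0, np.nan],
                    [3.0, 4.0],
                    [np.nan, 8.0]])

imputer = NaNMeanImputer()
X_filled = imputer.fit_transform(X_train)
```

A column that is entirely NaN would still yield a NaN mean (and a NumPy
warning), so a real implementation would need a policy for that case.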