2013/6/21 Joel Nothman <jnoth...@student.usyd.edu.au>:
> As long as the representation of unknown values is known (be it a particular
> value, or use of a masked array), writing a Transformer should be pretty
> straightforward, but I don't understand why you need extra arguments to
> transform (which you imply by linking to #1963), or how inverse_transform
> could possibly work. Could you give us an example?

I'd say use either masked arrays or NaN. Masked arrays seem designed
for this purpose. Both have the benefit that all other estimators can,
in the shared input validation code, raise an exception when they
encounter a masked array or a NaN, with a message pointing the user to
the imputation code.
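A minimal sketch of such a shared check, using only NumPy. The function name check_no_missing and the error wording are hypothetical, not an existing scikit-learn API; it assumes missing values arrive either as masked entries of a np.ma.MaskedArray or as NaN in a float array.

```python
import numpy as np


def check_no_missing(X):
    """Hypothetical shared input check: reject masked values and NaN.

    Returns X as a plain float ndarray if nothing is missing.
    """
    # Masked arrays: refuse if any entry is actually masked.
    if isinstance(X, np.ma.MaskedArray) and np.ma.getmaskarray(X).any():
        raise ValueError("Input contains masked values; "
                         "impute them first (see the imputation code).")
    # Convert to a plain ndarray (drops an all-False mask) and look for NaN.
    X = np.asarray(X, dtype=float)
    if np.isnan(X).any():
        raise ValueError("Input contains NaN; "
                         "impute them first (see the imputation code).")
    return X
```

Estimators would call this at the start of fit/predict, so a user who forgets to impute gets one consistent error instead of silent NaN propagation.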

For inspiration, here's a simplistic imputer:
https://gist.github.com/larsmans/5828792
(This doesn't handle fit_transform correctly: when the training set
itself has missing values, the result of mean() contains NaN.)
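One way around the fit_transform problem is to compute the column means over the observed values only. The sketch below is not the code in the gist; it is a hypothetical NumPy-only MeanImputer, assuming missing entries are encoded as NaN, that masks them with np.ma.masked_invalid so that mean() skips them.

```python
import numpy as np


class MeanImputer:
    """Hypothetical mean imputer: fills NaN with per-column means from fit()."""

    def fit(self, X, y=None):
        # Mask NaN (and inf) so they are ignored when averaging; a plain
        # X.mean(axis=0) returns NaN for any column containing a gap.
        X_masked = np.ma.masked_invalid(np.asarray(X, dtype=float))
        # A column with no observed values at all stays NaN.
        self.means_ = np.ma.filled(X_masked.mean(axis=0), np.nan)
        return self

    def transform(self, X):
        X = np.array(X, dtype=float)  # copy, so the caller's array is untouched
        rows, cols = np.nonzero(np.isnan(X))
        X[rows, cols] = self.means_[cols]
        return X

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


X = [[1.0, np.nan],
     [3.0, 4.0],
     [np.nan, 8.0]]
print(np.mean(np.asarray(X), axis=0))    # naive means: both NaN
print(MeanImputer().fit_transform(X))    # gaps filled with 2.0 and 6.0
```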

> In the current scipy.sparse implementation, the value of non-encoded data in
> sparse matrices is necessarily zero, and setting cells to zero makes them
> disappear in sparse matrix transformations. So you can't use unfilled cells
> as missing data, except where 0 isn't an option for actual values. In
> general, you can allow the missing value indicator to be set as a
> transformer parameter.

Besides, scipy.sparse matrices are hard to update in place, are a very
wasteful representation for dense data, and are harder to work with
than np.array (for us, but more importantly for users).
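To put a rough number on "wasteful for dense data": CSR stores every nonzero value plus a column index for each one, plus a row-pointer array. A small sketch, assuming an arbitrary 1000 x 100 float64 matrix with no zero entries:

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
# Fully dense data: values in [1, 2), so every entry is nonzero.
X = rng.random((1000, 100)) + 1.0
X_csr = sp.csr_matrix(X)

dense_bytes = X.nbytes  # 1000 * 100 * 8 bytes
# CSR keeps three arrays: the values, their column indices, and row pointers.
csr_bytes = X_csr.data.nbytes + X_csr.indices.nbytes + X_csr.indptr.nbytes
print(dense_bytes, csr_bytes)
```

The values alone take as much space as the dense array, so the index arrays are pure overhead here.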

> Currently, -1 is used for missing target values for semi-supervised
> learning, not that there's a lot of it in scikit-learn. See #547, #430.

-1 is a perfectly valid feature value, though. It's only treated as a
special label value in a few restricted cases (semi-supervised
learning, outlier detection).
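A small illustration of why a -1 sentinel is ambiguous in a feature matrix while NaN is not. The data below is made up; it assumes one feature legitimately takes the value -1 and one entry is genuinely missing.

```python
import numpy as np

# Second column legitimately contains -1; the last row's first value is missing.
X = np.array([[0.5, -1.0],
              [2.0, 3.0],
              [np.nan, -1.0]])

sentinel_mask = X == -1   # flags two genuine feature values as "missing"
nan_mask = np.isnan(X)    # flags only the truly missing entry

print(sentinel_mask.sum(), nan_mask.sum())  # -> 2 1
```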

-- 
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam

_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
