On Fri, Jun 21, 2013 at 6:56 AM, Nicolas Trésegnie <
nicolas.treseg...@gmail.com> wrote:
>
> - To impute only some of the missing values (rows, columns or a
> combination)
>
I think this can be added later if you have time. For now, I would rather
not clutter the API.
For rows, one can just use a mask: impute(X[mask]). Columns seem more
problematic.
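
A minimal sketch of the row-mask idea, in case it helps. It assumes a
hypothetical impute_mean() helper (not an existing scikit-learn function)
that fills NaNs with column means and returns a new array. Since boolean
indexing returns a copy, the result has to be assigned back:

```python
import numpy as np


def impute_mean(X):
    """Hypothetical helper: fill NaNs with column means, return a new array."""
    X = np.array(X, dtype=float)           # np.array copies, input is untouched
    col_means = np.nanmean(X, axis=0)      # per-column mean, ignoring NaNs
    rows, cols = np.where(np.isnan(X))     # positions of the missing entries
    X[rows, cols] = col_means[cols]
    return X


X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 8.0],
              [5.0, np.nan]])

# Impute only the first two rows; the other rows keep their missing values.
row_mask = np.array([True, True, False, False])
X_partial = X.copy()
X_partial[row_mask] = impute_mean(X[row_mask])
```

Note that with this approach the column means are computed from the
selected rows only, which may or may not be what the user wants.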
>
> - To impute in-place or in a new array
>
> For me, data imputation is simply a particular transformation of the data.
> Particular in the sense that it doesn't change the shape of the data
> and could be done in-place. So, I suggest using the existing transform(),
> inverse_transform() and fit_transform() methods:
>
> - The transform() method would impute the data
>
I think I like calling it transform(), but one concern is how this will
play out in the matrix-factorization-based matrix completion object.
Similarly to PCA or other matrix factorization algorithms from
sklearn.decomposition, transform() could also be used to transform the data
to the latent space.
In fact, for matrix-factorization based matrix completion,
inverse_transform would precisely perform imputation, although I admit the
name is a bit counter-intuitive in this case. Below I show how this would
work:
In [1]: X = np.random.random((1000, 100))
In [2]: from sklearn.completion import MatrixFactorization
In [3]: mf = MatrixFactorization(n_components=10).fit(X)
In [4]: mf.components_.shape
Out[4]: (10, 100)
# Transform data to latent space
In [5]: Xt = mf.transform(X)
In [6]: Xt.shape
Out[6]: (1000, 10)
# Impute missing values
In [7]: mf.inverse_transform(Xt).shape
Out[7]: (1000, 100)
If we want to keep "transform" for imputation, we thus need two classes,
one for factorization and one for completion (the fit method would be the
same, though).
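
A rough sketch of the two-class idea, under stated assumptions:
ToyFactorization and ToyCompletion are hypothetical names, not scikit-learn
classes, and fit() here is just a truncated SVD on mean-filled data standing
in for a real matrix completion solver. The point is only that both classes
share fit() while transform() means something different in each:

```python
import numpy as np


class ToyFactorization:
    """Hypothetical sketch: transform() maps the data to the latent space."""

    def __init__(self, n_components=2):
        self.n_components = n_components

    def _fill(self, X):
        # Replace NaNs with the column means learned in fit().
        X = np.array(X, dtype=float)
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = self.mean_[cols]
        return X

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = np.nanmean(X, axis=0)
        # Stand-in for a real solver: truncated SVD of the mean-filled data.
        _, _, Vt = np.linalg.svd(self._fill(X) - self.mean_,
                                 full_matrices=False)
        self.components_ = Vt[:self.n_components]
        return self

    def transform(self, X):
        return (self._fill(X) - self.mean_) @ self.components_.T

    def inverse_transform(self, Xt):
        return Xt @ self.components_ + self.mean_


class ToyCompletion(ToyFactorization):
    """Hypothetical sketch: same fit(), but transform() imputes."""

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        latent = ToyFactorization.transform(self, X)
        X_hat = self.inverse_transform(latent)
        # Keep observed entries, fill only the missing ones.
        return np.where(np.isnan(X), X_hat, X)


rng = np.random.RandomState(0)
X = rng.random_sample((1000, 100))
X[rng.random_sample(X.shape) < 0.1] = np.nan    # hide about 10% of entries

mf = ToyFactorization(n_components=10).fit(X)
Xt = mf.transform(X)                            # latent representation
X_rec = mf.inverse_transform(Xt)                # back to the input space

X_imputed = ToyCompletion(n_components=10).fit(X).transform(X)
```

With this split, the completion class can be used as a drop-in transformer
in a pipeline, while the factorization class keeps the usual
sklearn.decomposition semantics.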
> - The inverse_transform() would remove the data from the selected
> rows/columns
>
Can you elaborate?
Mathieu
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general