On 06/21/2013 08:52 AM, Mathieu Blondel wrote:
> I think I like calling it transform() but one concern is how this will
> play in the matrix-factorization based matrix completion object.
> Similarly to PCA or other matrix factorization algorithms from
> sklearn.decomposition, transform() could also be used to transform the
> data to the latent space.
>
> In fact, for matrix-factorization based matrix completion,
> inverse_transform would precisely perform imputation, although I admit
> the name is a bit counter-intuitive in this case. Below I show how
> this would work:
>
> If we want to keep "transform" for imputation, we thus need two
> classes, one for factorization and one for completion (the fit method
> would be the same, though).
>> * The inverse_transform() would remove the data from the
>> selected rows/columns
>
> Can you elaborate?
Please disregard what I said about inverse_transform(); it does not make
sense given what I describe below.
I think it would be a good idea to use transform() for the imputation.
As you said, when transform() is already used to map the data to the
latent space, a second class is needed and the two fit methods would be
identical. I don't think that is a problem. In fact, the new classes
would each have a well-defined responsibility, so it would be easier to
tell what they do.
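
To make the two-class idea concrete, here is a rough sketch, not a proposed implementation. The class names (_BaseFactorization, FactorizationSketch, CompletionSketch) are hypothetical, and the fit step is a deliberate simplification (mean-fill followed by a truncated SVD) chosen only so that both classes can share it. The only point is that fit() is shared while transform() means "map to the latent space" in one class and "impute" in the other.

```python
import numpy as np


class _BaseFactorization:
    """Hypothetical shared base: both classes below reuse this fit()."""

    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Simplification (an assumption, not a real completion algorithm):
        # fill NaN with column means, then take a truncated SVD.
        self.col_means_ = np.nanmean(X, axis=0)
        filled = np.where(np.isnan(X), self.col_means_, X)
        _, _, Vt = np.linalg.svd(filled - self.col_means_, full_matrices=False)
        self.components_ = Vt[: self.n_components]
        return self

    def _to_latent(self, X):
        X = np.asarray(X, dtype=float)
        filled = np.where(np.isnan(X), self.col_means_, X)
        return (filled - self.col_means_) @ self.components_.T


class FactorizationSketch(_BaseFactorization):
    def transform(self, X):
        # Map the data to the latent space, as PCA-like estimators do.
        return self._to_latent(X)


class CompletionSketch(_BaseFactorization):
    def transform(self, X):
        # Impute: keep observed entries, fill NaN from the reconstruction.
        X = np.asarray(X, dtype=float)
        recon = self._to_latent(X) @ self.components_ + self.col_means_
        return np.where(np.isnan(X), recon, X)
```

With this split, CompletionSketch().fit(X).transform(X) returns an array of the same shape as X with no NaN left, while FactorizationSketch returns an array of shape (n_samples, n_components).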
For the representation, I tried a few things and came up with this small
piece of code. <https://gist.github.com/NicolasTr/8d619dec4681164ac0bd>
In this approach, each imputer can use a different representation of the
missing values when fitting and when transforming the data. This allows
the user to use the imputers in a pipeline. It would also be useful for
recommendation systems: 0 could represent missing values and nan the
missing values to impute. If _get_missing_values_mask() were modified to
return a sparse mask (or an inverted mask, if that is more economical)
when the data is in a sparse format, each imputer would be responsible
for choosing which kinds of matrices it supports.
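
For readers who do not want to open the gist, the idea can be sketched roughly as follows. This is not the gist's code: the helper get_missing_values_mask(), the class MeanImputerSketch and its two parameter names are assumptions made for illustration, and only dense NumPy arrays are handled. It shows one marker (0) being treated as missing during fit and another (nan) being imputed during transform.

```python
import numpy as np


def get_missing_values_mask(X, missing_values):
    """Return a boolean mask that is True where X holds the missing marker."""
    X = np.asarray(X, dtype=float)
    if isinstance(missing_values, float) and np.isnan(missing_values):
        return np.isnan(X)  # NaN never compares equal to itself
    return X == missing_values


class MeanImputerSketch:
    """Hypothetical imputer with separate markers for fit and transform."""

    def __init__(self, fit_missing_values=0, transform_missing_values=np.nan):
        self.fit_missing_values = fit_missing_values
        self.transform_missing_values = transform_missing_values

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        observed = ~get_missing_values_mask(X, self.fit_missing_values)
        counts = observed.sum(axis=0)
        sums = np.where(observed, X, 0.0).sum(axis=0)
        # Column means over observed entries only; avoid division by zero.
        self.statistics_ = sums / np.maximum(counts, 1)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        mask = get_missing_values_mask(X, self.transform_missing_values)
        # Only entries carrying the transform marker are replaced.
        return np.where(mask, self.statistics_, X)
```

Because the class only exposes fit() and transform(), it follows the usual transformer convention, which is what makes it usable as a pipeline step. Entries equal to 0 at transform time are left untouched; only nan entries are filled.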
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general