On 06/21/2013 08:52 AM, Mathieu Blondel wrote:
I think I like calling it transform(), but one concern is how this will play out in the matrix-factorization-based matrix completion object. As with PCA or other matrix factorization algorithms from sklearn.decomposition, transform() could also be used to transform the data to the latent space. In fact, for matrix-factorization-based matrix completion, inverse_transform would perform exactly the imputation, although I admit the name is a bit counter-intuitive in this case. Below I show how this would work:

If we want to keep "transform" for imputation, we thus need two classes, one for factorization and one for completion (the fit method would be the same, though).
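As an illustration of the two-class design discussed above, here is a minimal sketch, not the code from the original message. The class names (_BaseFactorization, FactorizationTransformer, FactorizationImputer) are hypothetical, and the shared fit() uses a truncated SVD of the column-mean-filled data as a simple stand-in for a real matrix-completion solver. It only aims to show that both classes can share fit() while transform() means something different in each.

```python
import numpy as np


class _BaseFactorization:
    """Shared fit(): rank-k SVD of the mean-filled data (stand-in solver)."""

    def __init__(self, n_components=1):
        self.n_components = n_components

    def _fill(self, X):
        # Replace NaN entries with the column means learned in fit().
        X = np.asarray(X, dtype=float)
        return np.where(np.isnan(X), self.means_, X)

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.means_ = np.nanmean(X, axis=0)
        centered = self._fill(X) - self.means_
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def _latent(self, X):
        # Project the mean-filled, centered data onto the components.
        return (self._fill(X) - self.means_) @ self.components_.T


class FactorizationTransformer(_BaseFactorization):
    """transform() maps the data to the latent space."""

    def transform(self, X):
        return self._latent(X)


class FactorizationImputer(_BaseFactorization):
    """transform() performs the imputation; observed entries are kept."""

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        approx = self._latent(X) @ self.components_ + self.means_
        return np.where(np.isnan(X), approx, X)


X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, np.nan],
              [3.0, 6.0, 9.0],
              [4.0, 8.0, 12.0]])

latent = FactorizationTransformer(n_components=1).fit(X).transform(X)
completed = FactorizationImputer(n_components=1).fit(X).transform(X)
```

Here latent has one column per component, while completed has the same shape as X with only the NaN entry replaced.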

      * The inverse_transform() would remove the data from the
        selected rows/columns

Can you elaborate?
Just forget what I said about inverse_transform(); it does not make sense with what I describe below.

I think it would be a good idea to use transform() for the imputation. As you said, where transform() is already used to map the data to the latent space, a second class would be needed, with an identical fit method. I don't think that would be a problem. In fact, each of the two classes would have a well-defined responsibility, so it would be easier to see what each one does.

For the representation, I tried a few things and came up with this small piece of code: <https://gist.github.com/NicolasTr/8d619dec4681164ac0bd>. In this approach, each imputer can use a different representation of the missing values when fitting and when transforming the data. This allows the user to use the imputers in a pipeline. It will also be useful for recommendation systems: 0 could represent missing values, and NaN the missing values to impute. If _get_missing_values_mask() were modified to return a sparse mask (or an inverted mask, if that is more economical) when the data is in a sparse format, each imputer would be responsible for choosing which kinds of matrices it supports.
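To make the idea of separate missing-value representations concrete, here is a rough sketch for dense arrays only. It is not the code from the gist: the helper get_missing_values_mask(), the MeanImputer class and its fit_missing / transform_missing parameters are hypothetical names chosen for illustration, and column-mean imputation is just the simplest strategy to show. In the example, 0 and NaN are both ignored when computing the means, but only NaN entries are imputed.

```python
import numpy as np


def get_missing_values_mask(X, missing_values):
    """Boolean mask of the entries equal to any marker in missing_values."""
    X = np.asarray(X, dtype=float)
    mask = np.zeros(X.shape, dtype=bool)
    for marker in missing_values:
        if isinstance(marker, float) and np.isnan(marker):
            mask |= np.isnan(X)  # NaN never compares equal to itself
        else:
            mask |= X == marker
    return mask


class MeanImputer:
    """Column-mean imputer with separate markers for fit and transform."""

    def __init__(self, fit_missing=(np.nan,), transform_missing=(np.nan,)):
        self.fit_missing = fit_missing
        self.transform_missing = transform_missing

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        mask = get_missing_values_mask(X, self.fit_missing)
        # Entries flagged as missing do not contribute to the means.
        self.means_ = np.nanmean(np.where(mask, np.nan, X), axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        mask = get_missing_values_mask(X, self.transform_missing)
        # Only the entries flagged for imputation are replaced.
        return np.where(mask, self.means_, X)


# 0 = missing (left alone), NaN = missing and to be imputed.
ratings = np.array([[5.0, 0.0, np.nan],
                    [3.0, 4.0, 0.0],
                    [np.nan, 2.0, 1.0]])

imputer = MeanImputer(fit_missing=(0, np.nan), transform_missing=(np.nan,))
result = imputer.fit(ratings).transform(ratings)
```

Because the imputer exposes only fit() and transform(), it follows the usual transformer convention and could in principle be placed in a pipeline; supporting sparse input would require a sparse-aware version of the mask helper.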
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
