Re: [Scikit-learn-general] Interface for data imputation

Mathieu Blondel Thu, 20 Jun 2013 22:08:45 -0700

On Fri, Jun 21, 2013 at 1:28 PM, Lars Buitinck <l.j.buiti...@uva.nl> wrote:


> Besides, scipy.sparse is hard to update in-place, is a very wasteful
> representation for dense data and is harder to work with than np.array
> (for us, but more importantly for users).
>

Dense formats such as masked arrays, or arrays with missing values encoded
as NaN, won't work for recommender datasets (unless you can fit an n_users x
n_items dense matrix in memory). So we do need to find a suitable sparse
format to work with, since the second half of Nicolas's GSoC is about
matrix completion.
In my research, I have used CSR matrices, since they can be processed very
efficiently in Cython. Since CSR matrices are indeed non-trivial to
construct, we could provide a utility function which takes an iterator of
(feature_index, feature_value) pairs and produces a CSR matrix (with
explicitly given zero values preserved as stored entries rather than dropped).
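
A minimal sketch of what such a utility could look like, assuming the input
is an outer iterable with one inner iterable of (feature_index,
feature_value) pairs per sample. The name csr_from_pairs and its parameters
are hypothetical and not an existing scikit-learn function. It relies on the
fact that building a scipy.sparse.csr_matrix directly from (data, indices,
indptr) keeps explicit zeros as stored entries, so an observed value of 0
stays distinguishable from a missing one.

```python
import numpy as np
import scipy.sparse as sp


def csr_from_pairs(rows, n_features=None, dtype=np.float64):
    """Hypothetical helper: build a CSR matrix from per-sample pairs.

    rows: iterable of iterables of (feature_index, feature_value) pairs.
    n_features: number of columns; inferred from the largest index if None.
    """
    data, indices, indptr = [], [], [0]
    for row in rows:
        for j, v in row:
            indices.append(j)
            data.append(v)  # zeros are appended like any other value
        indptr.append(len(indices))  # marks where this row ends

    if n_features is None:
        n_features = max(indices) + 1 if indices else 0

    # The (data, indices, indptr) constructor does not prune explicit zeros.
    return sp.csr_matrix(
        (np.asarray(data, dtype=dtype),
         np.asarray(indices, dtype=np.intp),
         np.asarray(indptr, dtype=np.intp)),
        shape=(len(indptr) - 1, n_features),
    )


# Usage: three users, four items; user 0 gave item 2 an observed rating of 0.
ratings = [
    [(0, 5.0), (2, 0.0)],
    [],                     # user 1 has no observed ratings
    [(1, 3.0)],
]
X = csr_from_pairs(ratings, n_features=4)
```

Here X.nnz counts the stored entries, including the explicit zero. Calling
X.eliminate_zeros() would discard that entry, so code that treats stored
entries as "observed" would need to avoid it.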

Mathieu


