Hi list,

I have a large dataset in CSV or VW [0] format that I want to load into
a sparse matrix (probably CSR).

I haven't found any utilities to do this out of the box.

It seems that `numpy.loadtxt` [1] cannot produce a sparse matrix: it always
returns a dense array.

On the other hand, we already have a utility for loading libsvm format into
CSR matrices [2].

So my question is: is there some utility or snippet to load a CSV into CSR
that I overlooked?
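
To make the question concrete, here is a rough sketch of what a hand-rolled
version could look like. The name `csv_to_csr` is hypothetical (it is not an
existing numpy, scipy or scikit-learn function), and the sketch assumes a
purely numeric CSV with no header row, where empty cells count as zeros. It
only ever stores the non-zero entries, so the dense array is never built.

```python
import csv
import io

import numpy as np
from scipy.sparse import csr_matrix


def csv_to_csr(lines, dtype=np.float64):
    """Build a CSR matrix from an iterable of CSV lines (hypothetical helper).

    Assumes numeric cells only and no header row; empty cells are zeros.
    """
    data, indices, indptr = [], [], [0]
    n_cols = 0
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        n_cols = max(n_cols, len(row))
        for j, cell in enumerate(row):
            value = float(cell) if cell.strip() else 0.0
            if value != 0.0:
                # Keep only non-zero entries, with their column index.
                data.append(value)
                indices.append(j)
        # indptr[i + 1] marks where row i ends in data/indices.
        indptr.append(len(data))
    return csr_matrix(
        (np.asarray(data, dtype=dtype), indices, indptr),
        shape=(len(indptr) - 1, n_cols),
    )


# Tiny in-memory example; an open file object would work the same way.
sample = io.StringIO("0,0,1.5\n2,0,0\n0,0,0\n")
X = csv_to_csr(sample)
```

This reads the file row by row, so memory use scales with the number of
non-zeros rather than with rows times columns.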

If not, I'm considering submitting a PR that adds a utility to read CSV and VW
formats into sparse matrices. I'm including the VW format because it has some
interesting features: it is sparse and it keeps feature names (a kind of
advanced svmlight format).
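
For the VW side, a minimal sketch of the idea follows. It covers only a
simplified subset of the input format described in [0]: every line is
assumed to start with a numeric label, followed by `|namespace` groups of
`name` or `name:value` features. Importance weights, tags and namespace
scaling are not handled. The name `vw_to_csr` and the `namespace^feature`
naming of columns are my own assumptions, not part of VW or scikit-learn.

```python
import numpy as np
from scipy.sparse import csr_matrix


def vw_to_csr(lines):
    """Parse a simplified subset of VW lines into (X, y, vocabulary).

    Hypothetical helper: assumes "label |ns feat[:value] ..." on every line.
    """
    data, indices, indptr, labels = [], [], [0], []
    vocabulary = {}  # feature name -> column index, kept for later lookup
    for line in lines:
        line = line.strip()
        if not line:
            continue
        head, *namespaces = line.split("|")
        labels.append(float(head.split()[0]))  # first token is the label
        row = {}
        for ns in namespaces:
            tokens = ns.split()
            if ns and not ns[0].isspace():
                # "|name ..." : the first token names the namespace.
                ns_name, features = tokens[0].split(":")[0], tokens[1:]
            else:
                # "| ..." : features without a namespace.
                ns_name, features = "", tokens
            for token in features:
                name, _, raw = token.partition(":")
                value = float(raw) if raw else 1.0  # a bare feature counts as 1
                key = f"{ns_name}^{name}" if ns_name else name
                col = vocabulary.setdefault(key, len(vocabulary))
                row[col] = row.get(col, 0.0) + value
        for col in sorted(row):
            indices.append(col)
            data.append(row[col])
        indptr.append(len(data))
    X = csr_matrix(
        (np.asarray(data, dtype=np.float64), indices, indptr),
        shape=(len(labels), len(vocabulary)),
    )
    return X, np.asarray(labels), vocabulary


sample = ["1 |words hello world:2", "-1 |words world |meta len:3"]
X, y, vocabulary = vw_to_csr(sample)
```

Returning the vocabulary alongside the matrix is what would preserve the
feature names, which the svmlight loader has no way to do.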

What do you think?

[0] https://github.com/JohnLangford/vowpal_wabbit/wiki/Input-format
[1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
[2] https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/svmlight_format.py#L253
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
