Hi Eustache,

Although this might be more time-consuming than necessary, I load .csv files using `read_csv` from the `pandas` library (http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html). This gives you a DataFrame, say DATAFRAME, which you can convert to a NumPy array with `np.array(DATAFRAME)`.
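
A minimal sketch of this pandas route. It assumes a small numeric CSV, inlined here with `io.StringIO` in place of a real file path, and uses `DataFrame.to_numpy()` (available since pandas 0.24), which is equivalent to `np.array(DATAFRAME)`. The result is always dense, so memory grows with rows × columns even when most entries are zero.

```python
import io

import numpy as np
import pandas as pd

# Toy CSV standing in for a file such as "data.csv" (illustrative data only).
csv_text = "a,b,c\n1,0,3\n0,5,0\n"

# read_csv parses the file into a DataFrame; the header row becomes the column names.
df = pd.read_csv(io.StringIO(csv_text))

# Convert to a dense NumPy array; np.array(df) gives the same values.
X = df.to_numpy()
```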

But I wish there were a faster way to do this; +1 for a utility that reads a .csv file directly into a dense or sparse array.

Thanks



On 7/29/2014 11:22 AM, Eustache DIEMERT wrote:
Hi list,

I've got a large dataset in CSV or VW [0] format that I want to load into a sparse matrix (probably CSR).
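
For context, CSR (compressed sparse row) stores only the non-zero values, their column indices, and one offset per row, which makes it a natural target for a loader that reads a file row by row. A toy illustration with `scipy.sparse.csr_matrix` (the matrix values are made up):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy 3x4 matrix:
# [[1, 0, 0, 2],
#  [0, 0, 3, 0],
#  [0, 0, 0, 0]]
data = np.array([1.0, 2.0, 3.0])  # non-zero values, listed row by row
indices = np.array([0, 3, 2])     # column index of each stored value
indptr = np.array([0, 2, 3, 3])   # row i spans data[indptr[i]:indptr[i + 1]]

X = csr_matrix((data, indices, indptr), shape=(3, 4))
```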

I haven't found any utilities to do this out of the box.

It seems that `numpy.loadtxt` [1] can't produce a sparse matrix: it only returns dense arrays.

On the other hand we have a utility for loading libsvm format into CSR matrices [2].
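
For comparison, the svmlight module in [2] provides `sklearn.datasets.load_svmlight_file`, which returns a CSR feature matrix plus a label array. A short usage sketch with toy in-memory data (a file path works the same way; file-like objects must be opened in binary mode):

```python
import io

from sklearn.datasets import load_svmlight_file

# Two samples in svmlight/libsvm format: "<label> <index>:<value> ...".
# Indices here are 1-based; the default zero_based="auto" detects that.
svm_text = b"1 1:0.5 3:2.0\n0 2:1.0\n"

# X is a scipy.sparse CSR matrix, y a dense array of labels.
X, y = load_svmlight_file(io.BytesIO(svm_text))
```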

So my question is: is there some utility or snippet to load a CSV into CSR that I've overlooked?
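
In the absence of a ready-made utility, one possible snippet is to stream the CSV with Python's `csv` module and accumulate the CSR arrays directly, so no dense copy is ever built. This is a sketch, not an existing API: `csv_to_csr` is a hypothetical name, it assumes every field is numeric (empty fields are read as 0), and a pure-Python loop will be much slower than a compiled parser on large files.

```python
import csv
import io

import numpy as np
from scipy.sparse import csr_matrix


def csv_to_csr(f, delimiter=",", skip_header=False, dtype=np.float64):
    """Hypothetical helper: stream a numeric CSV into a CSR matrix row by row."""
    reader = csv.reader(f, delimiter=delimiter)
    if skip_header:
        next(reader, None)  # drop the header row, if any
    data, indices, indptr = [], [], [0]
    n_cols = 0
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for blank lines
        n_cols = max(n_cols, len(row))
        for j, field in enumerate(row):
            value = float(field) if field.strip() else 0.0  # empty field -> 0
            if value != 0.0:  # store non-zeros only
                indices.append(j)
                data.append(value)
        indptr.append(len(data))  # end offset of this row
    return csr_matrix(
        (np.asarray(data, dtype=dtype),
         np.asarray(indices, dtype=np.int64),
         np.asarray(indptr, dtype=np.int64)),
        shape=(len(indptr) - 1, n_cols),
    )


# Usage with toy in-memory data; an open text file object works the same way.
X = csv_to_csr(io.StringIO("a,b,c\n1,0,3\n0,,0\n"), skip_header=True)
```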

If not, I'm considering submitting a PR that adds a utility to read CSV and VW formats into sparse matrices. I'm including the VW format because it has some interesting features: sparsity and preservation of feature names (a kind of advanced svmlight format).
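
To make the VW idea concrete, here is a rough sketch of parsing a simplified subset of the VW format [0] (`label |namespace feature[:value] ...`) into CSR while keeping the feature names. `parse_vw_lines` is a hypothetical helper: it ignores importance weights, tags and namespace scaling, and it assigns columns through a dictionary keyed as `namespace^feature` (an illustrative choice) rather than using VW's feature hashing.

```python
import numpy as np
from scipy.sparse import csr_matrix


def parse_vw_lines(lines):
    """Hypothetical sketch: parse a simplified subset of the VW input format,
    'label |namespace feature[:value] ...', into (X, y, feature_names).
    Importance weights, tags and namespace scaling are NOT handled."""
    vocab = {}  # "namespace^feature" -> column index (keeps feature names)
    labels, data, indices, indptr = [], [], [], [0]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        header, *namespaces = line.split("|")
        head = header.split()
        labels.append(float(head[0]) if head else np.nan)  # unlabeled -> NaN
        for ns in namespaces:
            tokens = ns.split()
            # A name directly after "|" is the namespace; "| feat" has none.
            if not tokens or ns[0].isspace():
                name, feats = "", tokens
            else:
                name, feats = tokens[0], tokens[1:]
            for feat in feats:
                fname, _, fval = feat.partition(":")
                value = float(fval) if fval else 1.0  # bare feature -> 1.0
                col = vocab.setdefault(f"{name}^{fname}", len(vocab))
                indices.append(col)
                data.append(value)
        indptr.append(len(data))  # end offset of this example's row
    X = csr_matrix(
        (np.asarray(data, dtype=np.float64),
         np.asarray(indices, dtype=np.int64),
         np.asarray(indptr, dtype=np.int64)),
        shape=(len(labels), len(vocab)),
    )
    feature_names = sorted(vocab, key=vocab.get)  # ordered by column index
    return X, np.asarray(labels), feature_names


X, y, names = parse_vw_lines([
    "1 |f height:1.5 weight:0.3 red",
    "-1 |f height:0.9 |g blue",
])
```

Keeping the vocabulary explicit is what preserves feature names; the cost is a dictionary that grows with the number of distinct features.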

What do you think?

[0] https://github.com/JohnLangford/vowpal_wabbit/wiki/Input-format
[1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
[2] https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/svmlight_format.py#L253


------------------------------------------------------------------------------
Infragistics Professional
Build stunning WinForms apps today!
Reboot your WinForms applications with our WinForms controls.
Build a bridge from your legacy apps to the future.
http://pubads.g.doubleclick.net/gampad/clk?id=153845071&iu=/4140/ostg.clktrk


_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
