Do you store zero entries explicitly in your CSV format? CSV doesn't strike
me as the best choice for representing sparse data...

M.
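
For reference, here is a minimal sketch of the loader with the indptr correction raised in the quoted message below. It assumes a dense, purely numeric CSV with no header row. The function name csv_to_csr and the in-memory sample are illustrative only and not part of the original snippet; plain lists are used instead of array.array for simplicity.

```python
import csv
import io

import numpy as np
from scipy.sparse import csr_matrix


def csv_to_csr(f):
    """Read a dense, numeric CSV from file object f into a CSR matrix."""
    data, indices, indptr = [], [], [0]
    n_rows = n_features = 0
    for row in csv.reader(f):
        if not row:
            continue  # csv.reader yields [] for blank lines; skip them
        values = np.array([float(x) for x in row])
        n_features = max(n_features, len(values))
        nonzero = np.flatnonzero(values)  # column indices of nonzero entries
        data.extend(values[nonzero].tolist())
        indices.extend(nonzero.tolist())
        # indptr stores the cumulative number of stored entries,
        # not the row counter.
        indptr.append(len(indices))
        n_rows += 1
    return csr_matrix((data, indices, indptr),
                      shape=(n_rows, n_features), dtype=float)


# Hypothetical sample: three rows, the second one entirely zero.
X = csv_to_csr(io.StringIO("0,0,3\n0,0,0\n1,0,2\n"))
```

Appending len(indices) is equivalent to indptr[-1] + len(nonzero); either way, a row with no nonzero entries simply repeats the previous offset.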


On Sun, Aug 31, 2014 at 5:21 PM, Eustache DIEMERT <eusta...@diemert.fr>
wrote:

> @Lars, shouldn't the last line of the for loop be
>
>   indptr.append(indptr[-1]+len(nonzero))
>
> rather than
>
>    indptr.append(i)
>
> ?
>
> FYI, here is the PR to include your snippet into the doc:
>
> https://github.com/scikit-learn/scikit-learn/pull/3610
>
> Eustache
>
>
> 2014-07-29 11:24 GMT+02:00 Lars Buitinck <larsm...@gmail.com>:
>
>> 2014-07-29 10:22 GMT+02:00 Eustache DIEMERT <eusta...@diemert.fr>:
>> > So my question is: is there some utility or snippet to load a CSV into CSR
>> > that I overlooked?
>>
>> No, but it's not that hard to write [1].
>>
>>
>> >>> import array
>> >>> import csv
>> >>> import numpy as np
>> >>> from scipy.sparse import csr_matrix
>> >>> data = array.array("f")
>> >>> indices = array.array("i")
>> >>> indptr = array.array("i", [0])
>> >>> for i, row in enumerate(csv.reader(f), 1):
>> ...     row = np.array(list(map(float, row)))
>> ...     n_features = len(row)
>> ...     nonzero = np.where(row)[0]
>> ...     data.extend(row[nonzero])
>> ...     indices.extend(nonzero)
>> ...     indptr.append(i)
>> ...
>> >>> X = csr_matrix((data, indices, indptr), dtype=float, shape=(i, n_features))
>>
>>
>> Instead of arrays, you can also use plain lists. Arrays take less
>> space, but they can be a tiny bit slower than lists.
>>
>>
>> [1] https://gist.github.com/larsmans/fe2a289818299dcb094a
>>
>>
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>