Hi all,

I have a sparse 3.4M x 3.4M adjacency matrix with nnz = 23M and wanted 
to see if CArray is an appropriate solution for storing it. Right now I 
store the data in coordinate format using the NumPy binary format and 
load the matrix with SciPy's sparse coo_matrix class. As far as I 
understand, with CArray the matrix would be written in full (zeros 
included), but a) since it is chunked, accessing it would not require 
loading the whole matrix into memory, and b) with compression enabled it 
should be possible to keep the size of the file reasonable.
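
To make the sizes involved concrete, here is a rough back-of-the-envelope 
estimate. The item sizes are my assumptions (1 byte per dense entry, 4-byte 
indices and 8-byte values for the coordinate arrays); only n and nnz come 
from the actual matrix.

```python
# Rough size estimate for the matrix described above.
n = 3_400_000          # rows == columns
nnz = 23_000_000       # stored nonzeros

# Assumed item sizes in bytes (not measured): 1-byte dense entries,
# 4-byte row/column indices and 8-byte values for the COO arrays.
dense_itemsize = 1
index_itemsize = 4
value_itemsize = 8

dense_elements = n * n
dense_bytes = dense_elements * dense_itemsize
coo_bytes = nnz * (2 * index_itemsize + value_itemsize)
density = nnz / dense_elements

print(f"dense elements:           {dense_elements:,}")
print(f"dense size, uncompressed: {dense_bytes / 1e12:.2f} TB")
print(f"COO size:                 {coo_bytes / 1e6:.0f} MB")
print(f"density:                  {density:.2e}")
```

Under these assumptions the full matrix is about 11.56 TB before 
compression, against 368 MB for the coordinate arrays, so the approach 
depends entirely on how well the zeros compress.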

If my assumptions are correct, here is my problem: writing the CArray 
to disk is very slow. I adapted the example from the documentation [1], 
and when I run the code on a 6000x6000 matrix with nnz = 17K I achieve 
a decent speed of roughly 4100 elements/s. However, when I try it on 
the full matrix the writing speed drops to 4 elements/s. Am I doing 
something wrong? Any feedback would be greatly appreciated!
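
To put the slowdown in perspective, here is a rough extrapolation of the 
two measured rates to the full matrix. It assumes that the rates count 
nonzero elements written and that they stay constant for the whole run; 
both are assumptions, not measurements.

```python
# Extrapolate the two observed write rates to all nonzeros of the full matrix.
nnz_full = 23_000_000   # nonzeros in the 3.4M x 3.4M matrix

rate_small = 4100.0     # elements/s observed on the 6000x6000 test matrix
rate_full = 4.0         # elements/s observed on the full matrix

# Assumption: the rate stays constant over the whole write.
fast_seconds = nnz_full / rate_small
slow_seconds = nnz_full / rate_full

fast_hours = fast_seconds / 3600
slow_days = slow_seconds / 86400

print(f"at {rate_small:.0f} elements/s: {fast_hours:.2f} hours")
print(f"at {rate_full:.0f} elements/s: {slow_days:.1f} days")
```

At the lower rate the write would take about two months, which is why I 
suspect something is wrong in how I am writing the data.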

Code: https://gist.github.com/junkieDolphin/5843064

Cheers,

Giovanni

[1] http://pytables.github.io/usersguide/libref/homogenous_storage.html#the-carray-class

-- 
Giovanni Luca Ciampaglia

☞ http://www.inf.usi.ch/phd/ciampaglia/
✆ (812) 287-3471
✉ glciamp...@gmail.com


_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
