I have only glanced at the problem, so I may have missed something, but my approach to a large matrix would be to realise it as a flat file and mmap it. Your program would then treat it as a memory-resident structure, and the virtual memory system of the OS would perform paging as necessary to keep a working set of the matrix in real memory.
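For example, in Python this approach is essentially a one-liner with numpy.memmap, which wraps mmap for exactly this use case. A minimal sketch, assuming a 20000x20000 float64 matrix in a file called matrix.dat (the name, dtype, and dimensions are placeholders):

    import numpy as np

    ROWS, COLS = 20000, 20000  # placeholder dimensions

    # Create (or reopen with mode='r+') the flat file as a memory-mapped matrix.
    m = np.memmap('matrix.dat', dtype=np.float64, mode='w+', shape=(ROWS, COLS))

    # Touch only what you need; the OS pages in just those bytes.
    m[0, :100] = 1.0
    first_col_sum = m[:1000, 0].sum()

    m.flush()  # push dirty pages back to disk

Only the pages you actually touch are read into memory, so the full ~3 GB file never has to fit in RAM at once.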

Andreas wrote:

Hello Benilton,

Some years ago I came across PyTables (http://www.pytables.org), a wrapper for the HDF5 format. PyTables claims to handle high data throughput very well, and it supports the matrix/array formats typically used in scientific projects. It does not provide any form of relational model, but it sounds to me that this is probably not what you need in the first place. You may be able to boost the performance of your calculations once you can load and store arrays/matrices in one piece.

I used it for document clustering and was very happy to be able to store compressed arrays generated with the Numeric/NumArray packages. The performance on ~1000 documents in the cluster was fine, though my requirements were not as critical as yours. I appreciated the ease of use and how easily metadata can be added to the dataset. A Java GUI is also available. I don't know whether your data sets and data types fit this scenario, but you may want to take a look at the FAQ [http://www.pytables.org/moin/FAQ#head-b32537aba805dac2a1bf9cd6606c4fddcd964f96].

Good luck, Andreas
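As a concrete illustration of what Andreas describes, here is a minimal sketch of storing and reading back a compressed array with PyTables. It uses the current NumPy-based API rather than Numeric/NumArray, and the file name, node name, and compression settings are placeholders:

    import numpy as np
    import tables

    # Chunked, compressed on-disk array (zlib level 5 here; blosc is another option).
    filters = tables.Filters(complevel=5, complib='zlib')

    with tables.open_file('docs.h5', mode='w') as f:
        arr = np.random.rand(1000, 500)           # stand-in for document vectors
        carray = f.create_carray(f.root, 'vectors', obj=arr, filters=filters)
        carray.attrs.source = 'clustering run 1'  # metadata travels with the dataset

    with tables.open_file('docs.h5', mode='r') as f:
        block = f.root.vectors[:100, :]  # reads only the chunks overlapping the slice

The slice in the last line decompresses only the chunks it needs, which is what makes whole-matrix storage practical even when the array is larger than RAM.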



