So I am now keeping the summaries in a separate table (three actually: two VLArrays for the attributes and dtypes, and one Table for the numeric data). Loading is about 30 times faster for a small set of tables (59 ms compared to 1.7 s) and about 50 times faster for a large set of tables (714 ms compared to 35.5 s).
Is it generally faster to keep the summaries in a separate HDF5 file, or are they fine in the same file?

Thx

David

-----Original Message-----
From: Francesc Alted [mailto:fal...@pytables.org]
Sent: Mon 13-Dec-2010 18:54
To: pytables-users@lists.sourceforge.net
Subject: Re: [Pytables-users] MemoryError on reading attributes

David, your screen captures were too large and your message bounced. I'm copying your message here. Your scripts are also attached. See my comments interspersed in your message.

On Monday 13 December 2010 10:56:04 david.bri...@ubs.com wrote:
> ;o) here's the screen captures
>
> The file I corrupted I opened for append but had only read data!!!

Uh, that's really ugly. Anyway, if you are not going to update the file, it is safer to open it in 'r'ead-only mode.

> Good to know it's a well-known limitation of HDF5 though.

Yup. Hope they fix this sooner rather than later.

> ---
>
> Ok I can now replicate without my app.
>
> Script1.py builds a large db.
>
> Script2.py opens the db and summarises it.
>
> Also enclosed are some screen shots of the task manager as script2 is
> running.
>
> My conclusions are:
> 1) pytables is not designed to safely manage memory,
> 2) I should keep any summaries in a separate table if I am to
> open the data base quickly (and without causing a
> MemoryError).

Well, your scripts were putting all the nodes in the object tree on a list, and that is the reason for the 'leak'. You don't need to put all the nodes on a list (in fact, this is strongly discouraged, for the reasons that you have seen) in order to iterate through some selected nodes; for this a generator is way better.
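As a rough illustration of that point, here is a minimal sketch in plain Python. It does not use PyTables at all: FakeNode, matches, find_as_list, find_lazily and make_nodes are hypothetical names, and the attribute comparison is only assumed to resemble the matching loop in script2.py. The list version keeps every matching node referenced until the caller drops the list, while the generator version hands out one node at a time.

```python
from typing import Dict, Iterable, Iterator, List


class FakeNode:
    """Hypothetical stand-in for a node: just a name and an attribute dict."""

    def __init__(self, name: str, attrs: Dict[str, object]) -> None:
        self.name = name
        self.attrs = attrs


def matches(node: FakeNode, criteria: Dict[str, object]) -> bool:
    # A node matches when every requested attribute has the requested value.
    return all(node.attrs.get(key) == value for key, value in criteria.items())


def find_as_list(nodes: Iterable[FakeNode], criteria: Dict[str, object]) -> List[FakeNode]:
    # Eager version: all matching nodes stay referenced by the returned list.
    answer = []
    for node in nodes:
        if matches(node, criteria):
            answer.append(node)
    return answer


def find_lazily(nodes: Iterable[FakeNode], criteria: Dict[str, object]) -> Iterator[FakeNode]:
    # Lazy version: each match is yielded and can be released by the caller
    # before the next node is even looked at.
    for node in nodes:
        if matches(node, criteria):
            yield node


def make_nodes(count: int) -> Iterator[FakeNode]:
    # Produce the nodes lazily as well, so nothing forces them all into memory.
    for i in range(count):
        yield FakeNode("table%d" % i, {"kind": "even" if i % 2 == 0 else "odd"})


# Only the names are kept here; each FakeNode can be freed once it is processed.
selected = [node.name for node in find_lazily(make_nodes(6), {"kind": "even"})]
```

Both functions select the same nodes; the difference is only in how long the nodes are kept alive, which is what matters when each node is expensive to hold in memory.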
The next patch converts the function generating the list into a generator:

"""
--- script2.py	2010-12-13 19:13:44.000000000 +0100
+++ script2-modif.py	2010-12-13 19:17:23.000000000 +0100
@@ -22,8 +22,7 @@
             if node._v_attrs.__getattr__(items[0]) <> items[1]:
                 matches = False
                 break
-        if matches: answer.append(node)
-    return answer
+        if matches: yield node
 
 def openDB(ptFilename):
"""

After applying this, script2 consumes 80 MB instead of 3.2 GB. And times are also similar (13.2 s for the patched version versus 14.2 s).

> My intention is not to criticise but to understand where the limits
> are.

No offense taken ;-)

> If the above is a fair evaluation (?) then my application should use
> pytables to manage the data on disk but not in memory. My only
> concern is how to stop pytables consuming all the memory if I need
> to access many tables.
>
> Is it possible to drop the data structures that access a given table
> from memory? Do I need to close the file occasionally or is there a
> way to say drop table xyz from cache? (I'm wondering how using
> node._f_close() affects performance?)
>
> Many thx
> Hope this helps
> David

-- 
Francesc Alted