Re: [Pytables-users] MemoryError on reading attributes

david.briant Tue, 14 Dec 2010 12:10:54 -0800

So I am now keeping the summaries in a separate table (3 actually: 2 VLArrays 
for the attributes and dtypes, and 1 Table for the numeric data). It is about 
30 times faster to load for a small set of tables (59 ms compared to 1.7 s), 
and about 50 times faster for a large set of tables (714 ms compared to 35.5 s).


Is it generally faster to keep the summaries in a separate hdf5 file or are 
they fine in the same file?

Thx

David

-----Original Message-----
From: Francesc Alted [mailto:[email protected]] 
Sent: Mon 13-Dec-2010 18:54
To: [email protected]
Subject: Re: [Pytables-users] MemoryError on reading attributes

David, your screen captures were too large and your message bounced.  
I'm copying your message here.  Your scripts are also attached.  See my 
comments interspersed in your message.

On Monday 13 December 2010 10:56:04 [email protected] wrote:
> ;o) here's the screen captures
> 
> The file I corrupted I opened for append but had only read data!!!

Uh, that's really ugly.  Anyway, if you are not going to update the 
file, it is safer to open it in 'r'ead-only mode.

> Good to know it's a well-known limitation of HDF5 though.

Yup.  Hope they fix this sooner rather than later.

> ---
> 
> Ok I can now replicate without my app.
> 
> Script1.py builds a large db.
> 
> Script2.py opens the db and summarises it.
> 
> Also enclosed are some screen shots of the task manager as script2 is
> running.
> 
> My conclusions are:
>         1) pytables is not designed to safely manage memory,
>         2) I should keep any summaries in a separate table if I am to
>         open the data base quickly (and without causing a
>         MemoryError).

Well, your scripts were putting all the nodes in the object tree on a 
list, and that is the reason for the 'leak': the list keeps every node 
alive.  You don't need to put all the nodes on a list (in fact, this is 
strongly discouraged, for the reasons that you have seen) in order to 
iterate through some selected nodes; for this a generator is way better.  
The following patch converts the function generating the list into a 
generator:

"""
--- script2.py  2010-12-13 19:13:44.000000000 +0100
+++ script2-modif.py    2010-12-13 19:17:23.000000000 +0100
@@ -22,8 +22,7 @@
                 if node._v_attrs.__getattr__(items[0]) <> items[1]:
                     matches = False
                     break
-            if matches: answer.append(node)
-    return answer
+            if matches: yield node

 def openDB(ptFilename):
"""

After applying this, script2 consumes 80 MB instead of 3.2 GB.  And 
times are also similar (13.2 s for the patched version versus 14.2 s).
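
For readers without the attached scripts, here is a minimal, self-contained 
sketch of the same idea.  It is not the original script2.py: the function 
name `iter_matching_nodes`, the `wanted` dict and the stand-in nodes are 
hypothetical, and the only thing assumed about real nodes is that they 
expose their attributes through `_v_attrs`, as in the patch above.

```python
from types import SimpleNamespace

_MISSING = object()  # sentinel: distinguishes "attribute absent" from None


def iter_matching_nodes(nodes, wanted):
    """Yield each node whose attributes equal every (name, value) in `wanted`.

    `nodes` is any iterable of objects exposing `_v_attrs`.  Nothing is
    accumulated here, so a node can be released as soon as the caller
    stops referencing it.
    """
    for node in nodes:
        attrs = node._v_attrs
        if all(getattr(attrs, name, _MISSING) == value
               for name, value in wanted.items()):
            yield node


# Stand-in nodes so the sketch runs without an HDF5 file (made-up data).
fake_nodes = [
    SimpleNamespace(
        _v_name="t%d" % i,
        _v_attrs=SimpleNamespace(kind="price" if i % 2 else "volume"),
    )
    for i in range(6)
]

# Keep only the small piece of information needed, not the nodes themselves.
matching_names = [n._v_name
                  for n in iter_matching_nodes(fake_nodes, {"kind": "price"})]
print(matching_names)
```

With a real file, the `nodes` argument would presumably be the node 
iterator of the open file (`walkNodes()` in the PyTables 2.x of this thread, 
`walk_nodes()` in current releases) rather than a list built up front.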
 
> My intention is not to criticise but to understand where the limits
> are.

No offense taken ;-)

> If the above is a fair evaluation (?) then my application should use
> pytables to manage the data on disk but not in memory. My only
> concern is how to stop pytables consuming all the memory if I need
> to access many tables.
> 
> Is it possible to drop the data structures that access a given table
> from memory? Do I need to close the file occasionally or is there a
> way to say drop table xyz from cache? (I'm wondering how using
> node._f_close() affects performance?)
> 
> Many thx
> 

Hope this helps

> David

-- 
Francesc Alted

