Hi Jacob,
This is not solely a PyTables issue. As described, the methods you mention
all involve attribute (or metadata) access, which is notoriously slow in
HDF5. Or rather, it is much slower than reading from or writing to the
datasets (Tables, Arrays) themselves. Generally, having a single table with
3E8 rows will be faster than searching through 3E3 tables with 1E5 rows
each. If there is any way you can represent your data in a sane way to have
larger tables, I would recommend that you try this.
The other option is simply to have an initialization step where you
create all of the tables and then another loop where you append to all
of them, rather than searching through 3000 tables 3000 times. For
example:
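    # Step 1: create every table once, up front.
    for i in range(3000):
        f.createTable("/", "i" + str(i), description)  # description: your row schema

    # Step 2: the tables are known to exist, so just fetch and append.
    for i in range(3000):
        tab = f.getNode("/i" + str(i))
        tab.append(...)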
In the above pseudocode, __contains__ is never called, let alone called
3 times per iteration as in your previous email. In effect, the time that
you are spending searching in your previous email is 3000 tables x 3000
loop iterations, times up to 3 if-else branches. So you are automatically
doing 9 - 27 million iterations, just by the way you have been using
__contains__.
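To make that estimate concrete, here is the arithmetic as a tiny script. It takes the rough model above at face value (each check costing on the order of one pass over all the tables); the variable names are only for illustration.

```python
# Rough cost model from the paragraph above, not a measurement.
n_tables = 3000           # tables hanging off the root group
n_loop_iterations = 3000  # iterations of the outer append loop
n_branches = 3            # membership checks per iteration (if-else branches)

# Lower bound: one membership check per loop iteration.
checks_one_branch = n_tables * n_loop_iterations

# Upper bound: all three branches perform a check.
checks_three_branches = checks_one_branch * n_branches

print(checks_one_branch)
print(checks_three_branches)
```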
I really think that pre-creating the tables so that you *know* that they
are there and just have to get the nodes will be far faster for you.
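Here is a minimal runnable sketch of that pre-creation pattern. Note the assumptions: it uses the newer PyTables 3.x method names (open_file, create_table, get_node) rather than the 2.x createTable/getNode spellings used above, the Row schema is made up, and it uses 30 tables instead of 3000 to keep it quick.

```python
import os
import tempfile

import tables


class Row(tables.IsDescription):
    # Hypothetical two-column schema, only for illustration.
    timestamp = tables.Int64Col()
    value = tables.Float64Col()


N_TABLES = 30  # stands in for the ~3000 tables in the real application

path = os.path.join(tempfile.mkdtemp(), "precreate_demo.h5")

with tables.open_file(path, mode="w") as f:
    # Initialization step: create every table once, up front.
    for i in range(N_TABLES):
        f.create_table("/", "i%d" % i, Row)

    # Append step: the tables are known to exist, so fetch each node
    # directly instead of testing for membership first.
    for i in range(N_TABLES):
        tab = f.get_node("/i%d" % i)
        tab.append([(i, float(i))])
        tab.flush()

# Reopen read-only to confirm what was written.
with tables.open_file(path, mode="r") as f:
    n_nodes = len(f.list_nodes("/"))
    rows_in_last = f.get_node("/i%d" % (N_TABLES - 1)).nrows
```

If the same tables are appended to many times, you could also keep the Table objects from the first loop in a dict and skip get_node entirely.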
Be Well
Anthony
On Wed, Jun 27, 2012 at 2:33 PM, Jacob Bennett <jacob.bennet...@gmail.com> wrote:
> Hello PyTables Users,
>
> I am asking this quick question because my application is currently
> horribly bottlenecking on these methods, all of which are called once
> before each Table.append(rows). The table writing on the other hand is
> much, much faster than the searching for the table.
>
> Any general discussion on this would be great. The current hierarchy
> consists of root leading to around 3000 nodes each of which have around
> 100000 rows.
>
> Thanks,
> Jacob
>
> --
> Jacob Bennett
> Massachusetts Institute of Technology
> Department of Electrical Engineering and Computer Science
> Class of 2014| benne...@mit.edu
>
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
>