Re: [Pytables-users] How Fast is File.__contains, File.getNode, File.createTable?

Jacob Bennett Thu, 28 Jun 2012 08:41:59 -0700

Hey Anthony,

Awesome, I think I'm going to take your advice and aim for larger
tables. Just one question, though: suppose you keep a dictionary/hashtable
that maps node identifiers (keys) to the node objects themselves (values),
filled in at node creation time, i.e.
mydict['id'] = thisFile.createTable(params). I think this could
help avoid the expensive search calls altogether. I'm still going to go with
larger tables though, since I have to read the data eventually.
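
To make the dictionary idea concrete, here is a rough sketch of what I have in mind. It is only an illustration under assumptions: TableCache, make_table and FakeFile are hypothetical names of mine, and FakeFile is a stand-in that merely counts calls so the snippet runs without an HDF5 file. With a real file, the factory would call thisFile.createTable with the actual arguments.

```python
class TableCache:
    """Maps node identifiers to table handles that were already created."""

    def __init__(self, h5file, make_table):
        self._file = h5file
        # Hypothetical factory: (h5file, node_id) -> table-like object.
        self._make_table = make_table
        self._tables = {}

    def get(self, node_id):
        try:
            # Plain in-memory dict lookup; the file hierarchy is not searched.
            return self._tables[node_id]
        except KeyError:
            # First time this identifier is seen: create the table once.
            table = self._make_table(self._file, node_id)
            self._tables[node_id] = table
            return table


class FakeFile:
    """Stand-in for an open file (an assumption, not the PyTables class)."""

    def __init__(self):
        self.create_calls = 0

    def createTable(self, where, name, description=None):
        self.create_calls += 1
        # A list plays the role of a Table here; list.append mimics Table.append.
        return []


f = FakeFile()
cache = TableCache(f, lambda h5, node_id: h5.createTable("/", node_id))

for node_id in ["a", "b", "a", "a", "b"]:
    cache.get(node_id).append((node_id, 1.0))

print(f.create_calls)
```

The point is that each identifier reaches the file only once, and every later append goes through the dictionary. One caveat I would expect: the cached handles are only good while the file stays open, so the dictionary would need to be rebuilt after reopening it.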


Thanks Again For Your Time,
Jacob

On Thu, Jun 28, 2012 at 10:16 AM, Anthony Scopatz <scop...@gmail.com> wrote:

> Hi Jacob,
>
> This is not solely a PyTables issue.  As described, the methods you mention
> all involve attribute (or metadata) access, which is notoriously slow in
> HDF5.  Or rather, it is much slower than reading from or writing to the
> datasets (Tables, Arrays) themselves.  Generally, having a single table
> with 3E8 rows will be faster than searching through 3E3 tables with 1E5
> rows each.  If there is any way you can represent your data in a sane way
> with larger tables, I would recommend that you try this.
>
> The other option is simply to have an initialization step where you
> create all of the tables, and then another loop where you append to all
> of them, rather than searching through 3000 tables 3000 times.  For
> example:
>
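> for i in range(3000):
>     # createTable is a File method; `description` is a placeholder for
>     # your table's column definition (e.g. an IsDescription subclass)
>     f.createTable(f.root, "i" + str(i), description)
>
> for i in range(3000):
>     tab = f.getNode("/i" + str(i))
>     tab.append(...)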
>
> In the above pseudocode, __contains__ is never called, let alone called
> 3 times, as in your previous email.  In effect, the time that you are
> spending searching in your previous email is 3000 tables x 3000 loop
> iterations, times up to 3 if-else branches.  That is 3000 x 3000 = 9
> million lookups at minimum and 27 million at most, just from the way you
> have been using __contains__.
>
> I really think that pre-creating the tables so that you *know* that they
> are there and just have to get the nodes will be far faster for you.
>
> Be Well
> Anthony
>
> On Wed, Jun 27, 2012 at 2:33 PM, Jacob Bennett
> <jacob.bennet...@gmail.com> wrote:
>
>> Hello PyTables Users,
>>
>> I am asking this quick question because my application is currently
>> horribly bottlenecking on these methods, all of which are called once
>> before each Table.append(rows). The table writing on the other hand is
>> much, much faster than the searching for the table.
>>
>> Any general discussion on this would be great. The current hierarchy
>> consists of root leading to around 3000 nodes, each of which has around
>> 100000 rows.
>>
>> Thanks,
>> Jacob
>>
>> --
>> Jacob Bennett
>> Massachusetts Institute of Technology
>> Department of Electrical Engineering and Computer Science
>> Class of 2014| benne...@mit.edu
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Live Security Virtual Conference
>> Exclusive live event will cover all the ways today's security and
>> threat landscape has changed and how IT managers can respond. Discussions
>> will include endpoint security, mobile security and the latest in malware
>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>> _______________________________________________
>> Pytables-users mailing list
>> Pytables-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/pytables-users
>>
>>
>
>
>
>


-- 
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014| benne...@mit.edu

