On Mon, Aug 5, 2013 at 4:11 AM, Nyirő Gergő <gergo.ny...@gmail.com> wrote:
> Hello,
>
>
> We develop a measurement evaluation tool, and we'd like to use
> pytables/hdf5 as a middle layer for signal access.
>
> We have to deal with the silly structure of the recorder device
> measurement format.
>
>
>
> The signals can be accessed via two identifiers:
>
> * device name: <source of the signal>-<channel of the
> message>-<another tag>-<yet another tag>
>
> * signal name
>
>
>
> The first identifier carries the source information of the signal and
> can be quite long.
>
> Therefore I grouped the device name into two layers:
>
> /<source of the signal>
>     /<channel of the message>...
>         /<signal name>
>
>
>
> So if you have the same message from two channels, then you will get
> /foo-device-name
>     /channel-1
>         /bar
>         /baz
>     /channel-2
>         /bar
>         /baz
>
>
>
> Besides signal loading, we have to search for a signal name as fast as
> possible, and return the shortest unique device-name part together
> with the signal name.
>
> Using the structure above, iterating over the group names is quite
> slow, so I built up a table of device and signal names.
>
> As far as I know, PyTables queries do not support string searching
> (e.g. startswith, *foo[0-9]ch*, etc.), so fetching from this table
> leads us to a pure Python loop, which is slow again.
>
> Therefore I built a Python dictionary from the table, which provides
> fast lookups compared to iterating the table, but the init time
> increased from 100 ms to 3-4 sec (we have more than 40,000 signals).
>
>
>
> Do you have any advice on how to search for group names in HDF5 with
> PyTables in an efficient way?
>
Hi Gergo,
Searching through group names, like accessing all HDF5 metadata, is slow.
For group names this is because, rather than searching through a flat
list, you are traversing a B-tree, IIRC. So you have to use the two
tricks that you already used: 1) keep another Table / Array of all the
names, 2) read it in once into a native Python data structure (a dict
here).
However, 4 sec to read in this table seems excessive for data of this size.
You are probably not reading this in properly. You should be using:
raw_grps = f.root.grp_names[:]
or similar.
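Something like the following is the idea, as a minimal sketch. The
table and column names here are made up, since your actual schema
wasn't posted:

```python
# Hypothetical sketch of the bulk-read-then-index trick. With PyTables
# you would obtain `rows` in a single HDF5 read, e.g.:
#     rows = f.root.signal_index[:]   # table/column names are illustrative
# which returns a NumPy structured array in one call, instead of
# iterating the Table row by row from Python.

def build_signal_index(rows):
    """Map each signal name to the list of devices that carry it."""
    index = {}
    for device, signal in rows:
        # setdefault keeps one list per signal name, appending each
        # device path that provides that signal
        index.setdefault(signal, []).append(device)
    return index

# Lookup is then a plain dict access:
#     index["bar"]  ->  ["foo-device-name/channel-1", ...]
```

With 40,000 rows this loop runs over in-memory data only, so it should
be well under a second.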
Maybe other people have some other ideas.
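As for the glob interface in your ps: once the names are in memory, the
standard library's fnmatch module gives you exactly that. A sketch,
operating on whatever name list you have loaded:

```python
import fnmatch

def glob_signals(names, pattern):
    """Return names matching a shell-style glob such as 'foo[0-9]ch*'.

    fnmatch translates the glob into a compiled regular expression, so
    this is one fast scan over the list -- still O(n), but far cheaper
    than walking the HDF5 group tree for every query.
    """
    return fnmatch.filter(names, pattern)
```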
Be Well
Anthony
>
> ps: I would be most happy with a glob interface.
>
>
>
> thanks in advance for your advice,
>
> gergo
>
>
> ------------------------------------------------------------------------------
> Get your SQL database under version control now!
> Version control is standard for application code, but databases haven't
> caught up. So what steps can you take to put your SQL databases under
> version control? Why should you start doing it? Read more to find out.
> http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>