On Thu, Jul 5, 2012 at 12:34 PM, Jacob Bennett <jacob.bennet...@gmail.com> wrote:

> Hello Pytables Users,
>
> I am currently having a maximum number of children error within pytables.
> I am trying to store stock updates within hdf5. My current schema is to
> have one file represent a trading day, each table represent a particular
> instrumentID (stock id) and have each record in the table belong to a
> specific update with a timestamp (where the timestamp could be considered a
> primary key).
>
> I am currently having all tables be direct descendants of root.
>
> The problem with this is that per day I have the following stats:
>
> #of tables ::= 20000
> #of Records per table ::= 250000
>
> The problem persists in that 20000 is too many children to be associated
> with a particular node. Continuing with this schema will consume
> an exorbitant amount of memory and lead to slower query times.
>
> Is there a way to redesign this schema so that it could work better with
> pytables? Or is this simply too much data?
>

It certainly isn't too much data.  HDF5 scales to petabytes ;)


> Would it help to follow with the current schema and just increase the
> depth of the tree by taking parts of the instrumentId (instrumentId is an
> int64) as nodes?
>

Yes, that approach would work.  Basically, each node in HDF5 gets only a
fixed amount of storage for its metadata, including the list of its
children.  (I believe the limit is 64 KB.  In theory you could raise it and
recompile HDF5, but files generated that way would only be compatible with
your altered version of the library.)  So if a group has so many children
that storing their names and locations takes more than 64 KB, you have run
out of room.  By adding N subgroups to the hierarchy you increase the
available metadata to N * 64 KB.
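
As a rough sketch of the subgroup idea (not the only way to do it), the
low-order part of the instrumentId can pick an intermediate group.  The
helper name bucket_path, the bucket count of 256, and the "bucket_" /
"instr_" naming are all assumptions made up for illustration; they are not
part of PyTables.  With 20000 tables and 256 buckets, each group ends up
with roughly 80 children if the IDs are spread evenly.

```python
def bucket_path(instrument_id, n_buckets=256):
    """Map an instrument ID to a (group_path, table_name) pair.

    Hypothetical helper: the bucket count and the "bucket_"/"instr_"
    naming are illustrative choices, not part of PyTables.
    """
    if instrument_id < 0:
        raise ValueError("this sketch assumes non-negative instrument IDs")
    # The low-order part of the ID selects the intermediate group.
    bucket = instrument_id % n_buckets
    # Names start with a letter so they remain valid Python identifiers,
    # which keeps PyTables' natural naming usable.
    return "/bucket_%03d" % bucket, "instr_%d" % instrument_id
```

The returned pair can be passed as the where and name arguments when
creating a table.  Passing createparents=True to the table-creation call
(createTable in PyTables 2.x, create_table in 3.x) should create the
intermediate group on demand, so the groups need not be set up in advance.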

This is probably the easiest thing to do given your current setup.
Anything else would require you to change the table description.  There are
probably some natural groupings within your instrumentIDs (e.g., all
commodities go in one group) that you could use.

Be Well
Anthony


>
> Thanks,
> Jacob
>
> --
> Jacob Bennett
> Massachusetts Institute of Technology
> Department of Electrical Engineering and Computer Science
> Class of 2014| benne...@mit.edu
>
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
>