[Pytables-users] new user: advice on how to structure files

Andre' Walker-Loud Thu, 17 Nov 2011 16:21:15 -0800

Hi All,

I just stumbled upon PyTables and have been playing around with converting my 
data files to HDF5 with it.  I am wondering about strategies for structuring 
the data files.


I have created a file with the following group structure

root
  corr_name
    src_type
      snk_type
        config
          data

data = a 1 x 48 array of floats
config = the set which is to be averaged over; in this particular case 1000, 
1010, ..., 20100 (1911 in all)
The other three groups just collect metadata describing the data below them and 
provide a natural way to build matrices of data files, allowing users (my 
collaborators) to pick and choose various combinations of srcs and snks (instead 
of taking them all).
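
To make the bookkeeping concrete, here is a minimal plain-Python sketch of the counts implied by this layout. The names `config_labels`, `N_T` and `BYTES_PER_FLOAT` are only illustrative, and 8 bytes per float assumes double precision, which is an assumption rather than something checked against the file.

```python
# Configurations to be averaged over: 1000, 1010, ..., 20100
config_numbers = list(range(1000, 20101, 10))
config_labels = ['c' + str(no) for no in config_numbers]

N_T = 48             # length of each data array
BYTES_PER_FLOAT = 8  # assumption: double precision

n_cfg = len(config_labels)
# each config is one group holding one array, i.e. two HDF5 nodes per config
nodes_per_snk = 2 * n_cfg
# raw payload stored under a single snk_type group
payload_bytes = n_cfg * N_T * BYTES_PER_FLOAT

print(n_cfg, nodes_per_snk, payload_bytes)
```

Under these assumptions each src/snk combination holds 1911 configs, 3822 nodes and 733824 bytes of raw numbers, so the payload per combination is well under 1 MB but is spread over a few thousand small nodes.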

This structure arises naturally (to me) from the type of data files I am 
storing/analyzing, but I imagine there are better ways to build the file.  Also, 
when I make my file this way it is only 105 MB, yet HDFView fails to open it 
with an OutOfMemory error.  I would appreciate any advice on how 
to do this better.

Below is the relevant Python script that creates my file.

Thanks,

Andre

import tables as pyt
import personal_calls_to_numpy as pc
import os

corrs = ['name1','name2',...]
# Group names for the configurations: c1000, c1010, ..., c20100 (1911 in all).
# The 'c' prefix is needed because a bare str(no) gives a NaturalNaming error.
dirs = ['c'+str(no) for no in range(1000,20101,10)]

f = pyt.openFile('nplqcd_iso_old.h5','w')
root = f.root
# NOTE: `tag` is not defined anywhere in this excerpt; it must be set before
# the loop runs.
for corr in corrs:
    cg = f.createGroup(root,corr.split('_')[-1])
    src = f.createGroup(cg,'Src_GaussSmeared')
    for s in ['S','P']:
        fname = 'concatonated/'+corr+'_'+tag+'_'+s+'.dat'
        if os.path.exists(fname):
            print('adding '+corr+'_'+tag+'_'+s+'.dat')
            # h is the header (Ncfg, NT, ...); c is indexed by configuration
            h,c = pc.read_corr(fname)
            Ncfg = int(h[0]); NT = int(h[1])
            snk = f.createGroup(src,'Snk_'+s)
            #data = f.createArray(snk,'real',c)
            # one group per configuration, each holding a single 'real' array
            for cfg in range(Ncfg):
                gc = f.createGroup(snk,dirs[cfg])
                data = f.createArray(gc,'real',c[cfg])
        else:
            print(fname+' DOES NOT EXIST')
f.close()
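
A note on the commented-out `str(no)` line in the script above: PyTables' natural naming exposes nodes as Python attributes (for example `f.root.name1`), so it presumably objects to a purely numeric name such as `1000` because that is not a valid Python identifier. The check below is a plain-Python approximation of that rule, not PyTables' own code, and `is_natural_name` is a hypothetical helper introduced only for illustration.

```python
import keyword

def is_natural_name(name):
    """Approximate check: a valid Python identifier that is not a keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)

# Try a few candidate node names, including the two variants from the script
for candidate in ['1000', 'c1000', 'Snk_S', 'class']:
    print(candidate, is_natural_name(candidate))
```

With this approximation the prefixed name `c1000` passes while the bare `1000` does not, which matches the behaviour described in the script's comment.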


