On Thu, Nov 17, 2011 at 6:20 PM, Andre' Walker-Loud <walksl...@gmail.com> wrote:

> Hi All,
>
> I just stumbled upon pytables, and have been playing around with
> converting my data files into hdf5 using pytables.  I am wondering about
> strategies to create data files.
>
> I have created a file with the following group structure
>
> root
>  corr_name
>    src_type
>      snk_type
>        config
>          data
>
> the data is a 1 x 48 array of floats
> config = a set which is to be averaged over, in this particular case,
> 1000, 1010, ..., 20100 (1911 in all)
> the other three groups just collect metadata describing the data
> below, and provide a natural way to build matrices of data files, allowing
> the user (my collaborators) to pick and choose various combinations of srcs
> and snks (instead of taking them all).
>

This seems pretty reasonable.

You could also try rearranging your data into a shallower hierarchy, with
everything stored in Tables that have src_type, corr_name, etc. columns that
you then search through.  The reason for doing this is to avoid the overhead
of a deep hierarchy (not only the space on disk but also the speed of
traversal).  But what you have definitely works.
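
For concreteness, here is a minimal sketch of what I mean (the column names,
string sizes, and the 'nplqcd_iso_flat.h5' filename are just my guesses from
the layout you described, not anything taken from your script):

import tables as pyt
import numpy as np

# Row layout -- column names and string sizes are guesses based on the
# hierarchy above (corr_name / src_type / snk_type / config / data).
class Corr(pyt.IsDescription):
    corr_name = pyt.StringCol(32)             # e.g. 'name1'
    src_type  = pyt.StringCol(32)             # e.g. 'GaussSmeared'
    snk_type  = pyt.StringCol(8)              # e.g. 'S' or 'P'
    config    = pyt.Int32Col()                # 1000, 1010, ..., 20100
    data      = pyt.Float64Col(shape=(48,))   # the 1 x 48 correlator

f = pyt.openFile('nplqcd_iso_flat.h5', 'w')
table = f.createTable(f.root, 'corrs', Corr, 'all correlators')

# filling the table replaces the nested createGroup/createArray loops,
# roughly:
#   row = table.row
#   row['corr_name'] = corr
#   row['src_type']  = 'GaussSmeared'
#   row['snk_type']  = s
#   row['config']    = no
#   row['data']      = c[cfg]
#   row.append()
# ...and then table.flush()

# later: pick one corr/snk combination with an in-kernel query and
# average the 48-point correlator over all configs
rows = table.readWhere('(corr_name == "name1") & (snk_type == "S")')
avg = np.mean(rows['data'], axis=0) if len(rows) else None

f.close()

A single Table like this keeps the whole file flat, and the in-kernel query
plus a numpy mean over the 'data' column gives you the average over configs
for any combination of srcs and snks your collaborators want.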


>
> This structure arises naturally (to me) from the type of data files I am
> storing/analyzing, but I imagine there are better ways to build the file
> (also, when I make my file this way, it is only 105 MB, but it causes
> HDFViewer to fail to open with an OutOfMemory error).  I would appreciate
> any advice on how to do this better.
>

I use ViTables to view much larger files than that.  I would recommend
checking it out.

Be Well
Anthony


>
> Below is the relevant python script which creates my file.
>
> Thanks,
>
> Andre
>
> import tables as pyt
> import personal_calls_to_numpy as pc
> import os
>
> corrs = ['name1','name2',...]
> dirs = []
> for no in range(1000,20101,10):
>     # group names must be valid Python identifiers for natural naming
>     dirs.append('c'+str(no))
>     #dirs.append(str(no))  # this gives a NaturalNaming error
>
> f = pyt.openFile('nplqcd_iso_old.h5','w')
> root = f.root
> for corr in corrs:
>     cg = f.createGroup(root,corr.split('_')[-1])
>     src = f.createGroup(cg,'Src_GaussSmeared')
>     for s in ['S','P']:
>         # NOTE: 'tag' is not defined in this snippet; it is set elsewhere
>         if os.path.exists('concatonated/'+corr+'_'+tag+'_'+s+'.dat'):
>             print('adding '+corr+'_'+tag+'_'+s+'.dat')
>             h,c = pc.read_corr('concatonated/'+corr+'_'+tag+'_'+s+'.dat')
>             Ncfg = int(h[0]); NT = int(h[1])
>             snk = f.createGroup(src,'Snk_'+s)
>             #data = f.createArray(snk,'real',c)
>             for cfg in range(Ncfg):
>                 gc = f.createGroup(snk,dirs[cfg])
>                 data = f.createArray(gc,'real',c[cfg])
>         else:
>             print('concatonated/'+corr+'_'+tag+'_'+s+'.dat DOES NOT EXIST')
> f.close()