On Monday 18 May 2009 10:31:47 Francesc Alted wrote:
> On Sunday 17 May 2009 15:31:00 Robert Ferrell wrote:
> > I have an elementary question.
> >
> > I have a dictionary with about 10,000 keys.  The keys are (shortish)
> > strings.  Each value is a time series of structured arrays (record
> > arrays) with 5 fields.  Each value totals about 100,000 bytes, so the
> > total data size isn't huge, about 1GB.
> >
> > What would be a good way to store this in PyTables?  I've been
> > creating a group for each key, but that is a bad idea (since it's very
> > slow).
> >
> > I have very little knowledge/experience with either databases or
> > PyTables, so I'm pretty sure I'm just missing a basic concept.
>
> Mmh, there are several ways to implement what you want.  However, provided
> that your values are structured arrays, the easiest (and probably one of
> the fastest) way is to implement the dictionary as a monolithic table.

Er, this is the fastest, if you have PyTables Pro and you index the key field, 
of course ;)
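
For readers who want to see what the monolithic-table layout might look like, here is a minimal sketch. It is written against the newer snake_case PyTables API (open_file, create_table, read_where) and Python 3, not the API in use when this message was written, and it assumes a PyTables release in which column indexing is available in the regular distribution. The column names a, b, c and the helper names build_monolithic and fetch_monolithic are hypothetical, chosen only for illustration.

```python
import numpy as np
import tables as tb


class Sample(tb.IsDescription):
    # One row per (key, record); the key is repeated on every record.
    key = tb.StringCol(itemsize=10, pos=0)
    a = tb.Int32Col(pos=1)     # hypothetical field names
    b = tb.Float64Col(pos=2)
    c = tb.BoolCol(pos=3)


def build_monolithic(path, n_keys=1000, m=5):
    """Write n_keys * m rows into a single table and index the key column."""
    with tb.open_file(path, "w") as f:
        t = f.create_table(f.root, "data", Sample, expectedrows=n_keys * m)
        row = t.row
        for i in range(n_keys):
            for j in range(m):
                row["key"] = str(i).encode("ascii")  # string columns hold bytes
                row["a"] = j
                row["b"] = i * j
                row["c"] = i < m
                row.append()
        t.flush()
        # Indexing the key column is what makes the lookups fast.
        t.cols.key.create_index()


def fetch_monolithic(path, key):
    """Return all records stored under `key` as a structured array."""
    with tb.open_file(path, "r") as f:
        t = f.get_node("/data")
        # Passing the key through condvars avoids quoting it in the condition.
        return t.read_where("key == target",
                            condvars={"target": key.encode("ascii")})
```

A call such as fetch_monolithic("/tmp/mono.h5", "3") would return the records whose key column equals "3"; a key that is absent yields an empty array rather than an error.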

Another solution, in case you don't want to buy Pro, is to set up a VLArray of 
ObjectAtom atoms and save each recarray in a single row.  Then build a table 
with two fields: 'key', which holds your key, and 'vrow', which holds the row 
number of the corresponding value in the VLArray.  With this, you can fetch a 
value quickly by using an idiom like:

print 'key == "2" -->', vlarray[keys.readWhere('key == "2"')['vrow'][0]]
print 'key == "1001" -->', vlarray[keys.readWhere('key == "1001"')['vrow'][0]]

I'm attaching a new script based on this approach.

Cheers,

-- 
Francesc Alted

import numpy as np
import tables as tb

N = 10000    # number of keys
M = 5        # number of records per key
array_dtype = 'int32,float64,bool' # the dtype of your recarray

class Record(tb.IsDescription):
    key = tb.StringCol(itemsize=10)
    vrow = tb.Int64Col()

f = tb.openFile("/tmp/test.h5", "w")
k = f.createTable(f.root, 'keys', Record, expectedrows=N)
v = f.createVLArray(f.root, 'values', tb.ObjectAtom())

# Feed the table and vlarray with some info
row = k.row
for i in xrange(N):
    row['key'] = str(i)
    row['vrow'] = i
    row.append()
    value = []
    for j in xrange(M):
        value.append((j, i*j, i < M))
    v.append(np.array(value, dtype=array_dtype))
k.flush()
v.flush()

# Now, do some selections:
print "Result of fetches:"
print 'key == "2" -->', v[k.readWhere('key == "2"')['vrow'][0]]
print 'key == "1001" -->', v[k.readWhere('key == "1001"')['vrow'][0]]

f.close()
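
The script above targets Python 2 and the camelCase PyTables API of the time. As a rough sketch of how the same key table plus VLArray layout might be written with Python 3 and the snake_case API (open_file, create_table, create_vlarray, read_where), something like the following should work. The helper names build and fetch, the smaller default number of keys, and the use of condvars to pass the key as bytes are choices made for this sketch, not part of the original script.

```python
import os
import tempfile

import numpy as np
import tables as tb

ARRAY_DTYPE = "int32,float64,bool"  # the dtype of each stored recarray


class Record(tb.IsDescription):
    key = tb.StringCol(itemsize=10)
    vrow = tb.Int64Col()


def build(path, n_keys=1000, m=5):
    """Store one recarray per key in a VLArray, plus a key -> row table."""
    with tb.open_file(path, "w") as f:
        k = f.create_table(f.root, "keys", Record, expectedrows=n_keys)
        v = f.create_vlarray(f.root, "values", tb.ObjectAtom())
        row = k.row
        for i in range(n_keys):
            row["key"] = str(i).encode("ascii")  # string columns hold bytes
            row["vrow"] = i                      # row of the value in the VLArray
            row.append()
            value = [(j, i * j, i < m) for j in range(m)]
            v.append(np.array(value, dtype=ARRAY_DTYPE))
        k.flush()
        v.flush()


def fetch(path, key):
    """Look up `key` in the table, then read its value from the VLArray."""
    with tb.open_file(path, "r") as f:
        hits = f.get_node("/keys").read_where(
            "key == target", condvars={"target": key.encode("ascii")})
        if len(hits) == 0:
            raise KeyError(key)
        return f.get_node("/values")[int(hits["vrow"][0])]


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "test.h5")
    build(demo_path)
    print('key == "2" -->', fetch(demo_path, "2"))
    print('key == "999" -->', fetch(demo_path, "999"))
```

Because ObjectAtom pickles each value, the stored recarrays can have different lengths per key, at the cost of not being able to query inside them from PyTables.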
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
