On Sunday 17 May 2009 15:31:00 Robert Ferrell wrote:
> I have an elementary question.
>
> I have a dictionary with about 10,000 keys.  The keys are (shortish)
> strings.  Each value is a time series of structured arrays (record
> arrays) with 5 fields.  Each value totals about 100,000 bytes, so the
> total data size isn't huge, about 1GB.
>
> What would be a good way to store this in PyTables?  I've been
> creating a group for each key, but that is a bad idea (since it's very
> slow).
>
> I have very little knowledge/experience with either databases or
> PyTables, so I'm pretty sure I'm just missing a basic concept.

Mmh, there are several ways to implement what you want.  However, given 
that your values are structured arrays, the easiest (and probably one of the 
fastest) ways is to implement the dictionary as a single monolithic table.  
One of the fields (say, the first one) would be the key, and the others would 
be the fields that make up your structured array.  With this setup, you can 
easily fetch your arrays by key.  For example:

print 'key == "2" -->', t.readWhere('key == "2"')
print 'key == "1001" -->', t.readWhere('key == "1001"')

or, if you want to get rid of the key field (array_dtype being the dtype of 
your values without the key, e.g. 'int32,float64,bool'):

l = [r[1:] for r in t.where('key == "2"')]
print 'key == "2" -->', np.array(l, dtype=array_dtype)
l = [r[1:] for r in t.where('key == "1001"')]
print 'key == "1001" -->', np.array(l, dtype=array_dtype)
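
To make the layout concrete, here is a minimal NumPy-only sketch of how a 
dictionary of structured arrays could be flattened into one array with a key 
field, and how the values for one key can be recovered without that field.  
The names value_dtype, table_dtype, flatten and lookup are made up for 
illustration, the three value fields stand in for your five, and the 10-byte 
key width is an assumption.  An array shaped like this should be something 
you can hand to the table's append() method, provided its dtype matches the 
table description.

```python
import numpy as np

# Hypothetical value dtype (3 fields here; the real arrays have 5).
value_dtype = np.dtype([("f1", "i4"), ("f2", "f8"), ("f3", "?")])
# Same fields, preceded by a fixed-width string key (10 bytes assumed).
table_dtype = np.dtype([("key", "S10")] + value_dtype.descr)


def flatten(data):
    """Stack a dict of structured arrays into one array with a key field."""
    total = sum(len(v) for v in data.values())
    out = np.empty(total, dtype=table_dtype)
    pos = 0
    for key, value in data.items():
        chunk = out[pos:pos + len(value)]   # view into 'out'
        chunk["key"] = key.encode("ascii")  # same key on every row
        for name in value_dtype.names:
            chunk[name] = value[name]
        pos += len(value)
    return out


def lookup(flat, key):
    """Return the rows stored under 'key', without the key field."""
    rows = flat[flat["key"] == key.encode("ascii")]
    out = np.empty(len(rows), dtype=value_dtype)
    for name in value_dtype.names:
        out[name] = rows[name]
    return out


# Tiny made-up dictionary, just to exercise the two helpers.
data = {
    "2": np.array([(0, 0.0, True), (1, 2.0, True)], dtype=value_dtype),
    "1001": np.array([(0, 0.0, True)], dtype=value_dtype),
}
flat = flatten(data)
print('key == "2" -->', lookup(flat, "2"))
```

The boolean mask in lookup() plays the role that the 'key == "2"' condition 
plays in the table queries above.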

I'm attaching a self-contained example.  It is fairly simple, but I think 
it illustrates the approach for your case pretty well.

Hope this helps,

-- 
Francesc Alted

import numpy as np
import tables as tb

class Record(tb.IsDescription):
    key = tb.StringCol(itemsize=10, pos=0)
    f1 = tb.Int32Col(pos=1)
    f2 = tb.Float64Col(pos=2)
    f3 = tb.BoolCol(pos=3)

f = tb.openFile("/tmp/test.h5", "w")
t = f.createTable(f.root, 'table', Record)

# Feed the table with some info
row = t.row
for i in xrange(10000):
    for j in xrange(5):
        row['key'] = str(i)
        row['f1'] = j
        row['f2'] = i*j
        row['f3'] = j < 2
        row.append()
t.flush()

# Now, do some selections:
print "Selections with key field:"
print 'key == "2" -->', t.readWhere('key == "2"')
print 'key == "1001" -->', t.readWhere('key == "1001"')

# Get a recarray without the 'keys' field (#0):
print "Selections without key field:"
l = [r[1:] for r in t.where('key == "2"')]
print 'key == "2" -->', np.array(l, dtype='int32,float64,bool')
l = [r[1:] for r in t.where('key == "1001"')]
print 'key == "1001" -->', np.array(l, dtype='int32,float64,bool')

f.close()
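
The script above is written for Python 2 and the camelCase PyTables 2.x API.  
For readers on Python 3 and PyTables 3.x, the following is a sketch of the 
same idea, assuming the snake_case names (open_file, create_table, 
read_where).  The temporary file path, the reduced number of keys and the use 
of condvars to pass the key as bytes are choices made for this sketch, not 
part of the original attachment.

```python
import os
import tempfile

import tables as tb


class Record(tb.IsDescription):
    key = tb.StringCol(itemsize=10, pos=0)
    f1 = tb.Int32Col(pos=1)
    f2 = tb.Float64Col(pos=2)
    f3 = tb.BoolCol(pos=3)


# Temporary location chosen for this sketch only.
path = os.path.join(tempfile.mkdtemp(), "test.h5")

with tb.open_file(path, "w") as f:
    t = f.create_table(f.root, "table", Record)

    # Feed the table with some info (fewer keys than the original script).
    row = t.row
    for i in range(1500):
        for j in range(5):
            row["key"] = str(i).encode("ascii")  # string columns hold bytes
            row["f1"] = j
            row["f2"] = i * j
            row["f3"] = j < 2
            row.append()
    t.flush()

    # Selection with the key field; the key is passed in as a bytes value.
    sel_2 = t.read_where("key == k", condvars={"k": b"2"})
    sel_1001 = t.read_where("key == k", condvars={"k": b"1001"})

# Selection without the key field: keep only the value columns.
values_2 = sel_2[["f1", "f2", "f3"]]

print('key == "2" -->', sel_2)
print('key == "1001" -->', sel_1001)
print('key == "2", no key field -->', values_2)
```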
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users
