On May 19, 2009, at 2:42 AM, Francesc Alted wrote:

> On Tuesday 19 May 2009 05:03:48, you wrote:
>> On May 18, 2009, at 3:06 AM, Francesc Alted wrote:
>>> On Monday 18 May 2009 10:31:47, Francesc Alted wrote:
>>>> On Sunday 17 May 2009 15:31:00, Robert Ferrell wrote:
>>>>> I have an elementary question.
>>>>>
>>>>> I have a dictionary with about 10,000 keys. The keys are (shortish)
>>>>> strings. Each value is a time series of structured arrays (record
>>>>> arrays) with 5 fields. Each value totals about 100,000 bytes, so
>>>>> the total data size isn't huge, about 1 GB.
>>>>>
>>>>> What would be a good way to store this in PyTables? I've been
>>>>> creating a group for each key, but that is a bad idea (since it's
>>>>> very slow).
>>>>>
>>>>> I have very little knowledge/experience with either databases or
>>>>> PyTables, so I'm pretty sure I'm just missing a basic concept.
>>>>
>>>> Mmh, there are several ways to implement what you want. However,
>>>> provided that your values are structured arrays, the easiest (and
>>>> probably one of the fastest) is to implement the dictionary as a
>>>> monolithic table.
>>>
>>> Er, this is the fastest if you have PyTables Pro and you index the
>>> key field, of course ;)
>>>
>>> Another solution, in case you don't want to buy Pro, is to set up a
>>> VLArray of ObjectAtom atoms and save every recarray in a single row.
>>> Then build a table with two fields: 'key', where you save your key,
>>> and 'vrow', where you save the row location of your value in the
>>> VLArray. With this, you can fetch the value quickly by using an
>>> idiom like:
>>>
>>> print 'key == "2" -->', vlarray[keys.readWhere('key == "2"')['vrow'][0]]
>>> print 'key == "1001" -->', vlarray[keys.readWhere('key == "1001"')['vrow'][0]]
>>>
>>> I'm attaching a new script based on this approach.
>>
>> Thanks for your quick response. I'll try this out.
>> I neglected to mention that the time series vary somewhat in length.
>> I'm thinking that makes the VLArray desirable. In any case, I get the
>> idea of putting the keys in the table. That's a step forward in my
>> understanding.
>
> Yet another solution is to use a single table for keeping the time
> series and another one where you keep the key, the starting row of a
> specific time series, and the length of that time series. Something
> like:
>
> class Record(tb.IsDescription):
>     key = tb.StringCol(itemsize=10, pos=0)
>     srow = tb.Int64Col(pos=1)  # start row in recarray table
>     rlen = tb.Int64Col(pos=2)  # length of recarray in recarray table
>
> With this, the queries would be:
>
> (_, srow, rlen) = k.readWhere('key == "2"')[0]
> print 'key == "2" -->', v[srow:srow+rlen]
> (_, srow, rlen) = k.readWhere('key == "1001"')[0]
> print 'key == "1001" -->', v[srow:srow+rlen]
>
> Attached is a simple example of this.
>
> As I said before, there are many possibilities :)
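[For later readers of the archive: Francesc's first suggestion — a VLArray of ObjectAtom atoms plus a small (key, vrow) lookup table — can be sketched end-to-end as below. This is a minimal sketch, not the attached script; it uses today's PEP 8 PyTables names (`open_file`, `create_vlarray`, `read_where`) rather than the 2009-era camelCase quoted above, and the file name `dict.h5`, the 5-field dtype, and the sample keys are all made up for illustration.]

```python
import numpy as np
import tables as tb

# Illustrative structured dtype standing in for one time series (5 fields).
dt = np.dtype([('t', 'i8'), ('open', 'f8'), ('high', 'f8'),
               ('low', 'f8'), ('close', 'f8')])

class KeyRow(tb.IsDescription):
    key = tb.StringCol(16)   # the dictionary key
    vrow = tb.Int64Col()     # row of the value inside the VLArray

with tb.open_file('dict.h5', 'w') as h5:
    # One VLArray row per dictionary value; ObjectAtom pickles each recarray,
    # so variable-length series are fine.
    vlarray = h5.create_vlarray(h5.root, 'values', tb.ObjectAtom())
    keys = h5.create_table(h5.root, 'keys', KeyRow)

    data = {'2': np.zeros(3, dtype=dt), '1001': np.ones(5, dtype=dt)}
    for k, series in data.items():
        row = keys.row
        row['key'] = k
        row['vrow'] = vlarray.nrows   # row the append() below will occupy
        row.append()
        vlarray.append(series)
    keys.flush()

    # The lookup idiom from the thread (read_where instead of readWhere;
    # string columns compare against bytes under Python 3).
    vrow = keys.read_where('key == b"1001"')['vrow'][0]
    print(vlarray[vrow])             # the 5-row series stored under '1001'
```

Note that ObjectAtom round-trips each value through pickle, so the stored series are opaque to in-kernel queries; only the small keys table is searchable.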
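[The second layout — one big values table holding all series concatenated, plus a (key, srow, rlen) index table — might look like the following. Again a hedged sketch rather than the attached example: modern PyTables names, and `series.h5` plus the 2-field dtype are invented for brevity.]

```python
import numpy as np
import tables as tb

dt = np.dtype([('t', 'i8'), ('price', 'f8')])  # illustrative series dtype

class Record(tb.IsDescription):
    key = tb.StringCol(itemsize=10, pos=0)
    srow = tb.Int64Col(pos=1)  # start row in the values table
    rlen = tb.Int64Col(pos=2)  # number of rows in this series

with tb.open_file('series.h5', 'w') as h5:
    v = h5.create_table(h5.root, 'values', dt)   # all series, back to back
    k = h5.create_table(h5.root, 'keys', Record)

    for key, series in [('2', np.zeros(3, dtype=dt)),
                        ('1001', np.ones(4, dtype=dt))]:
        k.append([(key, v.nrows, len(series))])  # record where it starts
        v.append(series)
    k.flush(); v.flush()

    # The query idiom from the thread, with read_where instead of readWhere:
    (_, srow, rlen) = k.read_where('key == b"1001"')[0]
    print(v[srow:srow + rlen])
```

Unlike the ObjectAtom variant, here the values live in a real table, so they stay queryable and compressible; the trade-off is that appending to an existing series means rewriting or relocating its rows.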
I'm still trying to wrap my head around the concepts. I tried making one of the columns a TimeSeriesTable, but that didn't get me anywhere:

import tables as tb
import scikits.timeseries as ts
import scikits.timeseries.lib.tstables as tsTb

class MyTable(tb.IsDescription):
    key = tb.StringCol(16)
    tsTable = tsTb.TimeSeriesTable

Is this even close to something that might work?

-robert

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users