On May 19, 2009, at 2:42 AM, Francesc Alted wrote:

> On Tuesday 19 May 2009 05:03:48 you wrote:
>> On May 18, 2009, at 3:06 AM, Francesc Alted wrote:
>>> On Monday 18 May 2009 10:31:47 Francesc Alted wrote:
>>>> On Sunday 17 May 2009 15:31:00 Robert Ferrell wrote:
>>>>> I have an elementary question.
>>>>>
>>>>> I have a dictionary with about 10,000 keys.  The keys are (shortish)
>>>>> strings.  Each value is a time series of structured arrays (record
>>>>> arrays) with 5 fields.  Each value totals about 100,000 bytes, so the
>>>>> total data size isn't huge, about 1GB.
>>>>>
>>>>> What would be a good way to store this in PyTables?  I've been
>>>>> creating a group for each key, but that is a bad idea (since it's
>>>>> very slow).
>>>>>
>>>>> I have very little knowledge/experience with either databases or
>>>>> PyTables, so I'm pretty sure I'm just missing a basic concept.
>>>>
>>>> Mmh, there are several ways to implement what you want.  However,
>>>> provided that your values are structured arrays, the easiest (and
>>>> probably one of the fastest) way is to implement the dictionary as a
>>>> monolithic table.
>>>
>>> Er, this is the fastest, if you have PyTables Pro and you index the
>>> key field, of course ;)
>>>
>>> Another solution in case you don't want to buy Pro is to set up a
>>> VLArray of ObjectAtom atoms and save every recarray in a single row.
>>> Then, build a table with two fields: 'key' where you save your key and
>>> 'vrow' where you save the row location of your value in the VLArray.
>>> With this, you can fetch the value quickly by using an idiom like:
>>>
>>> print 'key == "2" -->', vlarray[keys.readWhere('key == "2"')['vrow'][0]]
>>> print 'key == "1001" -->', vlarray[keys.readWhere('key == "1001"')['vrow'][0]]
>>>
>>> I'm attaching a new script based on this approach.
>>
>> Thanks for your quick response.  I'll try this out.  I neglected to
>> mention that the time series vary somewhat in length.  I'm thinking
>> that makes the VLArray desirable.  In any case, I get the idea of
>> putting the keys in the table.  That's a step forward in my
>> understanding.
>
> Yet another solution is to use a single table for keeping the time
> series and another one where you keep the key, starting row for a
> specific time series and the length of this time series.  Something like:
>
> class Record(tb.IsDescription):
>    key = tb.StringCol(itemsize=10, pos=0)
>    srow = tb.Int64Col(pos=1)   # start row in recarray table
>    rlen = tb.Int64Col(pos=2)   # length of recarray in recarray table
>
>
> With this the queries would be:
>
> (_, srow, rlen) = k.readWhere('key == "2"')[0]
> print 'key == "2" -->', v[srow:srow+rlen]
> (_, srow, rlen) = k.readWhere('key == "1001"')[0]
> print 'key == "1001" -->', v[srow:srow+rlen]
>
> Attached is a simple example of this.
>
> As I said before, there are many possibilities :)

I'm still trying to wrap my head around the concepts.  I tried making  
one of the columns a TimeSeriesTable, but that didn't get me anywhere.

import tables as tb
import scikits.timeseries as ts
import scikits.timeseries.lib.tstables as tsTb

class MyTable(tb.IsDescription):
        key = tb.StringCol(16)
        tsTable = tsTb.TimeSeriesTable

Is this even close to something that might work?

-robert
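
The key/start-row/length layout quoted above can be sketched in memory
with plain NumPy structured arrays. This is only an illustration of the
bookkeeping, not PyTables code: the two-field value_dtype, the helper
names pack, lookup and make_series, and the sample data are all
hypothetical, and Python 3 is assumed. The index dtype mirrors the
Record description (key, srow, rlen) from the quoted message.

```python
import numpy as np

# Hypothetical 2-field record type standing in for the real 5-field records.
value_dtype = np.dtype([("t", "i8"), ("price", "f8")])
# Mirrors the quoted Record description: key, start row, length.
index_dtype = np.dtype([("key", "S10"), ("srow", "i8"), ("rlen", "i8")])


def pack(series_by_key):
    """Concatenate variable-length series and build the (key, srow, rlen) index."""
    index_rows, chunks, srow = [], [], 0
    for key, series in series_by_key.items():
        index_rows.append((key.encode("ascii"), srow, len(series)))
        chunks.append(series)
        srow += len(series)  # next series starts where this one ends
    k = np.array(index_rows, dtype=index_dtype)
    v = np.concatenate(chunks) if chunks else np.empty(0, dtype=value_dtype)
    return k, v


def lookup(k, v, key):
    """Return the series stored under key as a slice of the values array."""
    hits = k[k["key"] == key.encode("ascii")]
    if len(hits) == 0:
        raise KeyError(key)
    srow = int(hits["srow"][0])
    rlen = int(hits["rlen"][0])
    return v[srow:srow + rlen]


def make_series(n, start):
    """Build a made-up series of n records for the demonstration."""
    s = np.zeros(n, dtype=value_dtype)
    s["t"] = np.arange(start, start + n)
    s["price"] = 0.5 * s["t"]
    return s


# Two series of different lengths, as in the variable-length case discussed.
data = {"2": make_series(3, 0), "1001": make_series(5, 100)}
k, v = pack(data)
print('key == "1001" -->', lookup(k, v, "1001"))
```

In the PyTables version described in the thread, k and v would be two
on-disk tables and the boolean mask would be replaced by a readWhere
query on the key column; the slice v[srow:srow+rlen] stays the same.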


_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users