Re: [Pytables-users] Converting Well-Defined Python Dictionary to pytables rows

Dav Clark Mon, 13 Apr 2009 11:33:12 -0700

Interesting.  I think this is worth mentioning on the list...

In brief - is there a reasonable upper limit on the number of arrays and/or tables that one should put in a single group in PyTables?


DC

On Apr 13, 2009, at 7:51 AM, Mark Fenner wrote:

So, it turns out that there are simply too many individual keys to do
a "many table" solution.  The creation process with a single table
(many rows) takes about 7 minutes.  I stopped the many table version
after about an hour.  Of course, the lookups (in the many table) would
have been about immediate (I assume).  The lookups in the single table
take about 10 seconds.  So, that's tolerable for now.

I guess I could also go to a "file hierarchy" structured "database"
... I forget the standard linux directory file limit.  But, I'm pretty
happy with pytables.

Thanks again,
Mark

On Sun, Apr 12, 2009 at 12:47 PM, Dav Clark <d...@alum.mit.edu> wrote:

Always happy to help... I get a lot more than I give!
DC
On Apr 12, 2009, at 5:09 AM, Mark Fenner wrote:

Dav,

Great idea!!  I wouldn't have thought of it that way.  I did come up
with this (for the mono-table solution, your relational model):

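import tables as PT

# `code` is the enumeration of 3-character codes, defined elsewhere.
class Key(PT.IsDescription):
    name = PT.StringCol(16)
    class Values(PT.IsDescription):  # nested column group
        innervalue = PT.UInt32Col()
        innercode = PT.EnumCol(code, 'ND', base='uint8')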

Which actually worked pretty well.  However, I'm going to give your
poly-table idea a try.

Thanks!
Mark

On Sat, Apr 11, 2009 at 6:12 PM, Dav Clark <d...@alum.mit.edu> wrote:

Some ideas below...

On Apr 11, 2009, at 11:03 AM, Mark Fenner wrote:

I have data (lots of it) that looks like this:

mydict['key_string'] = [(int1, 'str1'), (int2, 'str2'), ... (intn, 'strn')]

The ints are 7 digits max (unsigned 24 bits max); the strings are a
3-character code (it could be replaced with a 4-bit number -- possibly
an Enum?).

This probably won't matter much if you turn compression on... but others
may know more about that.
It would also be possible to structure the data like this, if it would
help matters:

mydict['key_string'] = [(int1, int2, int3, ...., intn), ('str1', 'str2', 'str3', ..., 'strn')]
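
The two layouts above carry the same information, so switching between them is cheap. A minimal sketch in plain Python, assuming every key holds (int, str) pairs; the helper names to_columns and to_pairs are made up for illustration and do not come from the thread:

```python
def to_columns(pairs):
    """Turn [(int1, 'str1'), ...] into [(int1, ...), ('str1', ...)]."""
    if not pairs:
        # zip(*[]) yields nothing, so handle the empty list explicitly.
        return [(), ()]
    ints, codes = zip(*pairs)
    return [ints, codes]


def to_pairs(columns):
    """Inverse of to_columns: rebuild the list of (int, str) pairs."""
    ints, codes = columns
    return list(zip(ints, codes))


pairs = [(1234567, 'abc'), (42, 'xyz')]
columns = to_columns(pairs)
```

The column layout maps more directly onto one homogeneous array per field, while the pair layout maps onto one table row per pair.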

I should note that while there are _many_ keys, there are relatively
tame entries per key (say a maximum of 10? maybe 20 in a very rare
instance). The overall database is about 600MB which I currently wrote
out to disk as a text python dictionary (by hand, it crashed cPickle)
... the data I scraped out amounted to about 300MB. Even reading that in
with execfile was a bad idea. I had to resort to reading subsets and
appending them to the in-memory dictionary. Needless to say, these
options aren't going to work.  I don't mind 20 minutes to build the
datastructure, but another 20 to load it isn't going to work very well.
And, I typically only need some entries, not all of them.

Assuming that I want to be able to quickly look up a 'key_string' and
return the list of tuples (or equivalent structure), how should I
structure a pytable to hold this data? In particular, I'm puzzling out
what my "row" class should look like.  Of course, I'd like to avoid
extraneous rows if possible. But, maybe I'm not thinking about "rows" in
the right way. Since each entry (a row?) has a list of things associated
with it and b/c those things are uniform types, I was thinking of using
an array within a row, but I don't think that is possible.
It sounds like you are doing single-key lookups. If that's the case, I
don't see what you gain by using a single table. Would it work for your
application to just make each key the name of a simple table (int, str)?

If you do want a big table, as far as I know, you still can't have
"ragged" tables (i.e., tables with columns of differing size per row). If
you know a maximum size ahead of time, you could make your column large
enough to store the largest array. Or, you could think more
relationally... have three columns (hash-key, int, str). If order is
important, you may need to add another column for that. Not sure
though... pytables may maintain row order in query results.
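
The relational layout described above might look roughly like the sketch below. It is only an illustration under several assumptions: it uses the PyTables 3.x names (open_file, create_table, read_where), which differ from the camelCase names in the releases current when this thread was written; the Entry class, the build and lookup helpers, and the "entries" node name are invented for the example; and the explicit seq column is there so that order within a key does not depend on how query results happen to be returned.

```python
import os
import tempfile

import tables as tb


class Entry(tb.IsDescription):
    key = tb.StringCol(16)   # dictionary key (longer keys would be truncated)
    seq = tb.UInt8Col()      # position of the pair within its key's list
    value = tb.UInt32Col()   # the integer (fits easily in 32 bits)
    code = tb.StringCol(3)   # the 3-character code


def build(path, mydict):
    """Write one row per (key, pair) and index the key column."""
    with tb.open_file(path, "w") as h5:
        table = h5.create_table("/", "entries", Entry)
        row = table.row
        for key, pairs in mydict.items():
            for seq, (value, code) in enumerate(pairs):
                row["key"] = key.encode()
                row["seq"] = seq
                row["value"] = value
                row["code"] = code.encode()
                row.append()
        table.flush()
        # An index on the key column lets lookups avoid a full table scan.
        table.cols.key.create_index()


def lookup(path, key):
    """Return the list of (int, str) pairs stored under key, in order."""
    with tb.open_file(path, "r") as h5:
        table = h5.root.entries
        hits = table.read_where("key == k", condvars={"k": key.encode()})
    hits.sort(order="seq")
    return [(int(r["value"]), r["code"].decode()) for r in hits]


if __name__ == "__main__":
    sample = {"alpha": [(1234567, "abc"), (42, "xyz")], "beta": [(7, "def")]}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "entries.h5")
        build(path, sample)
        print(lookup(path, "alpha"))
```

Whether this beats one table per key depends on the number of keys; the single-table version keeps the node count in the file small at the cost of a query per lookup.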

Cheers,

Dav

