On Thu, 11 May 2006, Michael Cohen wrote:
> Hi Chris,
> You probably need to balance the hash table. Normally a dict works by hashing
> the key into an array of a certain size. The array is indexed by the hash
> value and a linked list of values is attached to each slot in the array.
>
> If you load the table too much, there will be many hash collisions, which
> will cause the linked lists to get very long. This degrades lookups from
> roughly constant time toward linear time.
>
> Apart from not using a tuple for a dict key (I'm not sure how efficient the
> algorithm for getting a hash value from a Python tuple struct is), I would
> recommend splitting the single dict into a dict of dicts. This will have the
> effect of spreading the load on the hash table:
Good thinking, but alas, the overhead of creating an inner dictionary for
each outer key was too much:
             dictionary   metakit   sqlite[1]   hash-in-hash
             ----------   -------   ---------   ------------
1M numbers         8.8s     22.2s       89.6s          21.9s
2M numbers        24.0s     43.7s      190.0s          56.8s
5M numbers       115.3s    105.4s         N/A      > 185s[2]
[1] pysqlite V1 & sqlite V3.
[2] I had to kill the process because it chewed up all 1Gig of RAM :-(
Thanks for the suggestion, but no go.
--
Chris Foote <[EMAIL PROTECTED]>
Inetd Pty Ltd T/A HostExpress
Web: http://www.hostexpress.com.au
Blog: http://www.hostexpress.com.au/drupal/chris
Phone: (08) 8410 4566
_______________________________________________
sapug mailing list
[email protected]
http://mail.python.org/mailman/listinfo/sapug