[CVS ci] hash-utf8 benchmark

Leopold Toetsch Sun, 26 Oct 2003 03:04:05 -0800


When the encodings of a hash and a lookup key don't match, we get a
huge penalty on hash lookup:
0.702336
6.634617
...
DOD runs = 6052
Collect runs = 378
Collect memory = 17557328

It takes roughly 10 times longer, and we have considerable DOD/GC stress.

Do we really need to transcode strings for string_compare, or can we
just do something like:

- store the key->encoding in the hash header
- as long as all keys are ascii do nothing
- for the first non-ascii key: convert all keys to utf8 (this doesn't
  change the hash value)
- then always convert new keys to utf8 (for hash_put), and convert the
  lookup key to utf8, if non-ascii
- then compare:

   if (string_length(s1) != string_length(s2))
      return 1; // we don't care what's greater
   for (i=0; i < string_length(s1); ++i)
      if (string_index(s1, i) != string_index(s2, i))
         return 1;
   return 0;
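
For illustration only, here is a rough standalone sketch of the scheme
above. Nothing in it is Parrot's real API: every sketch_* name is made
up, keys are plain byte buffers, and it assumes non-ascii keys arrive as
Latin-1. It shows why ascii keys keep their hash value (their bytes are
unchanged in utf8) and how the comparison reduces to a length check plus
a byte compare once both sides are utf8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins, not Parrot's real types. */
typedef enum { SKETCH_ASCII, SKETCH_UTF8 } sketch_encoding;

typedef struct {
    sketch_encoding key_encoding; /* encoding shared by every stored key */
} sketch_hash_header;

/* Return 1 if all bytes are 7-bit ASCII, else 0. */
int sketch_is_ascii(const unsigned char *s, size_t len)
{
    size_t i;
    for (i = 0; i < len; ++i)
        if (s[i] >= 0x80)
            return 0;
    return 1;
}

/* Assumption: non-ASCII input is Latin-1. Writes UTF-8 into dst, which
 * must hold at least 2 * len bytes, and returns the bytes written.
 * ASCII bytes are copied unchanged, so ASCII keys keep their hash value. */
size_t sketch_latin1_to_utf8(const unsigned char *src, size_t len,
                             unsigned char *dst)
{
    size_t i, n = 0;
    assert(src != NULL && dst != NULL);
    for (i = 0; i < len; ++i) {
        if (src[i] < 0x80) {
            dst[n++] = src[i];
        } else {
            dst[n++] = (unsigned char)(0xC0 | (src[i] >> 6));
            dst[n++] = (unsigned char)(0x80 | (src[i] & 0x3F));
        }
    }
    return n;
}

/* Record a key being stored: the first non-ASCII key flips the header
 * to UTF-8. Re-encoding the stored keys is omitted here because the
 * ASCII keys already present are byte-identical in UTF-8. */
void sketch_note_key(sketch_hash_header *h,
                     const unsigned char *key, size_t len)
{
    if (h->key_encoding == SKETCH_ASCII && !sketch_is_ascii(key, len))
        h->key_encoding = SKETCH_UTF8;
}

/* Return 0 if the keys are equal, 1 otherwise (ordering is ignored). */
int sketch_keys_differ(const unsigned char *a, size_t alen,
                       const unsigned char *b, size_t blen)
{
    if (alen != blen)
        return 1;
    return memcmp(a, b, alen) != 0;
}
```

With this, a lookup key would be passed through sketch_latin1_to_utf8
only when it is non-ascii, and then compared with sketch_keys_differ, so
no transcoding happens inside the compare itself.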

leo
