When the encoding of a hash and a lookup key doesn't match, we get a huge penalty on hash lookup:

   0.702336
   6.634617
   ...
   DOD runs = 6052
   Collect runs = 378
   Collect memory = 17557328

It takes roughly 10 times longer, and we have considerable DOD/GC stress.

Do we really need to transcode strings for string_compare, or can we just do something like:
- store the key->encoding in the hash header
- as long as all keys are ascii do nothing
- for the first non-ascii key:
    convert all keys to utf8 (this doesn't change the hash value)
- then always:
    convert new key to utf8 (for hash_put)
    convert lookup key to utf8, if non-ascii
- then compare:


   if (string_length(s1) != string_length(s2))
      return 1; // we don't care what's greater
   for (i=0; i < string_length(s1); ++i)
      if (string_index(s1, i) != string_index(s2, i))
         return 1;
   return 0;
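
To make the steps above concrete, here is a rough, self-contained sketch. It is not Parrot code: Str, Table, str_make, is_ascii, to_utf8, table_put and table_get are all made-up stand-ins, the "hash" is just a linear array with no duplicate handling, and the only non-utf8 encoding I assume is Latin-1. It only shows where the flag in the header lives and that the final comparison needs no transcoding of the stored keys.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins, not Parrot's real STRING or Hash types. */
typedef enum { ENC_LATIN1, ENC_UTF8 } Encoding;

typedef struct {
    Encoding enc;
    size_t len;              /* length in bytes */
    unsigned char *buf;
} Str;

#define MAX_KEYS 16

typedef struct {
    int keys_utf8;           /* header flag: set at the first non-ascii key */
    size_t count;
    Str keys[MAX_KEYS];      /* toy storage: linear array, no buckets */
    int values[MAX_KEYS];
} Table;

static Str str_make(Encoding enc, const char *bytes)
{
    Str s;
    s.enc = enc;
    s.len = strlen(bytes);
    s.buf = malloc(s.len + 1);
    memcpy(s.buf, bytes, s.len + 1);
    return s;
}

static int is_ascii(const Str *s)
{
    size_t i;
    for (i = 0; i < s->len; ++i)
        if (s->buf[i] >= 0x80)
            return 0;
    return 1;
}

/* Return a fresh utf8 copy of s; Latin-1 bytes >= 0x80 become two bytes. */
static Str to_utf8(const Str *s)
{
    Str out;
    size_t i, n = 0;
    out.enc = ENC_UTF8;
    out.buf = malloc(2 * s->len + 1);
    for (i = 0; i < s->len; ++i) {
        unsigned char c = s->buf[i];
        if (s->enc == ENC_UTF8 || c < 0x80) {
            out.buf[n++] = c;
        } else {
            out.buf[n++] = (unsigned char)(0xC0 | (c >> 6));
            out.buf[n++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    out.buf[n] = '\0';
    out.len = n;
    return out;
}

/* Store a key; ascii keys are copied unchanged, others are converted. */
static void table_put(Table *t, const Str *key, int value)
{
    assert(t->count < MAX_KEYS);
    if (!is_ascii(key))
        t->keys_utf8 = 1;    /* stored ascii keys are already valid utf8 */
    t->keys[t->count] = to_utf8(key);
    t->values[t->count] = value;
    t->count++;
}

/* Returns 1 and stores the value in *out if found, else 0. */
static int table_get(const Table *t, const Str *key, int *out)
{
    Str tmp = { ENC_UTF8, 0, NULL };
    const Str *probe = key;  /* ascii lookup keys are used as they are */
    size_t i;
    int found = 0;

    if (!is_ascii(key)) {
        if (!t->keys_utf8)
            return 0;        /* only ascii keys stored: no match possible */
        tmp = to_utf8(key);
        probe = &tmp;
    }
    for (i = 0; i < t->count && !found; ++i) {
        const Str *k = &t->keys[i];
        if (k->len == probe->len && memcmp(k->buf, probe->buf, k->len) == 0) {
            *out = t->values[i];
            found = 1;
        }
    }
    free(tmp.buf);           /* free(NULL) is a no-op */
    return found;
}
```

With this, a key stored as Latin-1 "caf\xE9" is found by a utf8 lookup key "caf\xC3\xA9", and pure ascii lookups never allocate or convert anything.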

leo


