Re: [xml] Better hash function for dict.c

Daniel Veillard Wed, 06 Aug 2008 01:12:29 -0700

On Wed, Aug 06, 2008 at 08:25:14AM +0200, Stefan Behnel wrote:
> Hmm, was that in my patch? Out of the top of my head, shouldn't the last line 
> read
> 
>     xmlDictComputeBigKey(prefix, -1, xmlDictComputeBigKey(name, len, len))))
> 
> or something in that line? This looks like a copy&paste error to me...


  err, yes, actually we have

#define xmlDictComputeKey(dict, name, len)             \
    (((dict)->size == MIN_DICT_SIZE) ?                 \
     xmlDictComputeFastKey(name, len) :        \
     xmlDictComputeBigKey(name, len, len))             \

#define xmlDictComputeQKey(dict, prefix, name, len)                  \
    (((prefix) == NULL) ?                                            \
      (xmlDictComputeKey(dict, name, len)) :                         \
      (((dict)->size == MIN_DICT_SIZE) ?                             \
       xmlDictComputeFastQKey(prefix, name, len) :                   \
       xmlDictComputeBigKey(prefix, xmlStrlen(prefix),               \
                            xmlDictComputeBigKey(name, len, len))))

> Anyway:
> 
> >     the problem is that basically if you compute the key for a QName
> > as "a:b" you can get 2 different answers, one if you accessed it using
> > "a:b" directly and hence xmlDictComputeKey() or if using "a" prefix and
> > "b" name, given the algorithm the key are not the same, and it is a key
> > property of the dictionary to always return the same exact pointer for
> > the same string. This breaks that property.
> 
> True, I didn't know about this property. And the 4-byte-at-once property will
> really make this very hard to achieve.

  yes, that's the core of the problem

> A way I see to fix this is to search the string for the first ':' and always
> calculate the hash separately for the part before and after the ':', not
> including the ':' itself. That should not break hashing namespace URIs either
> (AFAIR, at the least the XML namespace gets hashed at some point). Something 
> like
> 
>     int len = strlen(s)
>     char* prefix_end = strchr(s, ':')
>     if (prefix_end)
>         h = xmlDictComputeBigKey(s, prefix_end-s,
>               xmlDictComputeBigKey(prefix_end+1, len-(prefix_end-s-1),
>                                                  len-(prefix_end-s-1)))
>     else
>         h = xmlDictComputeBigKey(s, len, len)

  Another option I looked at is the 'One-at-a-Time Hash' from
http://burtleburtle.net/bob/hash/doobs.html , looking at the criterias
and the results it looks like a good hash too, not too expensive and 
should work well. I will try to make a patch using this this morning,
if you have a bit of time then, maybe you can rerun your initial tests
with that one, is that possible ?

Daniel

-- 
Red Hat Virtualization group http://redhat.com/virtualization/
Daniel Veillard      | virtualization library  http://libvirt.org/
[EMAIL PROTECTED]  | libxml GNOME XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine  http://rpmfind.net/
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
[email protected]
http://mail.gnome.org/mailman/listinfo/xml

Re: [xml] Better hash function for dict.c

Reply via email to