Robert Roessler:

I probably misunderstood your response a couple back in this thread:

"You are probably thinking of PropSet which should probably be
turned into a real hash table as the data sets it is used on are much
larger than it was originally written for."

Improving the hashing aspect of PropSet may be worthwhile but it hasn't been a significant issue to me. Perhaps it is for others with slower machines or more complex configurations. The small fixed size of PropSet::props is more likely to be the limiting factor in its hashing performance rather than the particular hash calculation performed.

The original HashString function body is 7 lines (too? is Python where this came from?)...

No, I think the current one came from an '80s text book. Python's is, as paraphrased in SinkWorld with UTF32 strings:

    static int HashString(const int *s, int len) {
        // Use the Python string hash algorithm
        int ret = s[0] << 7;
        for (int i=0; i<len; i++) {
            ret *= 1000003;
            ret ^= s[i];
        }
        ret ^= len;
        return ret;
    }

> 3 obvious responses:

1) is line-count a good determinant in evaluating hash algos?
2) source code size/layout may not be a good indicator of the length of runtime code paths

Complexity for the code reader should be avoided. My reaction to the Hsieh code was:
1) What's this preprocessor stuff? Is it alignment trickery?
2) It is reading 16 bit blocks - no, it is 32 bits at a time.
3) Is it reading past the end due to the blocking?
4) I hope those typedefs aren't duplicating stuff from system headers leading to warnings when this is combined with other code.

After a while you can work the issues out but its not immediately obvious. Extra complexity can be justified by performance and it would be much more convincing if there was a measurement of either an overall performance improvement (PropSet::Get 20% faster by changing hash function) or something less direct (fewer empty buckets or shorter chains on the distributed properties files or someone's particularly large configuration).

   Neil
_______________________________________________
Scite-interest mailing list
[email protected]
http://mailman.lyra.org/mailman/listinfo/scite-interest

Reply via email to