Teodor Sigaev <teo...@sigaev.ru> writes: >> -hash_ctl.hash = oid_hash; /* a bit more efficient than tag_hash */ >> +hash_ctl.hash = tag_hash; /* a bit more efficient than tag_hash */ >> >> I think the comment may need removed here.
> Thank you, fixed I looked at this patch. It's not right at all here: + if (hashp->hash == sizeof(int32) && info->hash == tag_hash) + hashp->hash = int32_hash; + else + hashp->hash = info->hash; hashp->hash is not defined at this point, and if it were defined it would not be the size of the key, and replacing that with info->keysize wouldn't be exactly the right thing either since info->keysize isn't necessarily set (there is a default, though I suspect it's mostly unused). I hacked up a solution to that and was about to commit when I had second thoughts about the whole approach. The thing that's worrying me about this is that it depends on the assumption that the passed-in info->hash pointer is identical to what dynahash.c thinks is the address of tag_hash. I'm not convinced that that's necessarily true on all platforms; in particular, I'm worried that a call from a loadable module (such as plperl) might be passing the address of a trampoline or thunk function that is a jump to tag_hash and not tag_hash itself. I don't see this happening on my Linux box but it might well happen on, say, Windows. There are already some comparisons of the hash function pointer in dynahash.c, so you might think this concern is moot, but if you look closely they're just comparisons to the address of string_hash --- which, in normal use-cases, would be supplied by dynahash.c itself. So I don't think the existing coding proves this will work. Now, we could do it like this anyway, and just ignore the fact that we might not get the optimization on every platform for hash tables created by loadable modules. However, there's another way we could attack this, which is to invent a new hash option flag bit that says "pick a suitable hash function for me, assuming that all bits of the specified key size are significant". So instead of ctl.keysize = sizeof(...); ctl.entrysize = sizeof(...); ctl.hash = tag_hash; tab = hash_create("...", ..., &ctl, HASH_ELEM | HASH_FUNCTION); you'd write ctl.keysize = sizeof(...); ctl.entrysize = sizeof(...); tab = hash_create("...", ..., &ctl, HASH_ELEM | HASH_BLOBS); which would save some code space and is arguably cleaner than the current approach of specifying some support functions and not others. This would be more invasive than the current patch, since we'd have to run around and touch code that currently mentions tag_hash as well as the places that currently mention oid_hash. But the patch is already pretty invasive just doing the latter. Thoughts? Anyone have a decent suggestion for what to call the flag bit? I'm not enamored of "HASH_BLOBS", it's just the first thing that came to mind. Also, depending on how much we want to annoy third-party authors, we could consider making a clean break from the Old Way here and say say that callers must specify all or none of the hash support functions, thus letting us get rid of the very ad-hoc logic in hash_create() that fills in defaults for some functions based on what you said for others. So the rule would be something like: No flag mentioned: use functions suitable for null-terminated strings HASH_BLOBS: use functions suitable for arbitrary fixed-size tags HASH_FUNCTIONS: caller must supply all three support functions and we'd remove the existing flag bits HASH_FUNCTION, HASH_COMPARE, HASH_KEYCOPY. But this would just be in the line of making the API simpler/more logical, it wouldn't buy us performance as such. I'm not sure whether it's a good idea to go this far. Comments? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers