On Tue, Oct 2, 2012 at 8:22 PM, Peter Geoghegan <pe...@2ndquadrant.com> wrote: > On 2 October 2012 18:16, Tom Lane <t...@sss.pgh.pa.us> wrote >> 1. Why isn't something like md5() on the reported query text an equally >> good solution for users who want a query hash? > > Because that does not uniquely identify the entry. The very first > thing that the docs say on search_path is "Qualified names are tedious > to write, and it's often best not to wire a particular schema name > into applications anyway". Presumably, the reason it's best not to > wire schema names into apps is because it might be useful to modify > search_path in a way that dynamically made the same queries in some > application reference what are technically distinct relations. If > anyone does this, and it seems likely that many do for various > reasons, they will be out of luck when using some kind of > pg_stat_statements aggregation. > > This was the behaviour that I intended for pg_stat_statements all > along, and I think it's better than a solution that matches query > strings. > >> 2. If people are going to accumulate stats on queries over a long period >> of time, is a 32-bit hash really good enough for the purpose? If I'm >> doing the math right, the chance of collision is already greater than 1% >> at 10000 queries, and rises to about 70% for 100000 queries; see >> http://en.wikipedia.org/wiki/Birthday_paradox >> We discussed this issue and decided it was okay for pg_stat_statements's >> internal hash table, but it's not at all clear to me that it's sensible >> to use 32-bit hashes for external accumulation of query stats. > > Well, forgive me for pointing this out, but I did propose that the > hash be a 64-bit value (which would have necessitated adopting > hash_any() to produce 64-bit values), but you rejected the proposal. I > arrived at the same probability for a collision as you did and posted > in to the list, in discussion shortly after the normalisation stuff > was committed.
What was the argument for rejecting it? Just that it required hash_any() to be adapted? Now that we have a very clear use case where this would help, perhaps it's time to re-visit this proposal? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers