[HACKERS] pg_stat_statements: Query normalisation may fail during stats reset

2014-05-06 Thread Michael Renner
Hi,

when regularly collecting  resetting query information from pg_stat_statements 
it’s possible to trigger a situation where unnormalised queries are stored.


I think what happens is the following:

pgss_post_parse_analyse calls pgss_store with a non-null jstate which will 
cause the query string to be normalised and stored if the query id doesn’t 
exist in the hash table.


pgss_ExecutorEnd calls pgss_store with a null jstate which will cause the 
statistics to be stored if the query id exists.

If the query id does not exist (because the hash table has been reset between 
post_parse_analyse and ExecutorEnd) it hits the entry creation path which in 
turn will create an entry with the unnormalised query string because jstate is 
null.


This is a bit counterintuitive if you rely on the query to be normalised, e.g. 
for privacy reasons where you don’t want to leak query constants like password 
hashes or usernames.


Is this something that should be fixed or is this intentional behavior? The 
documentation doesn’t make any strong claims on the state of the query string, 
so this might be debatable. [1]


best,
Michael

[1] 
http://www.postgresql.org/message-id/e1uwztj-00075u...@wrigleys.postgresql.org 
is a similar situation where constants remain unnormalised.


signature.asc
Description: Message signed with OpenPGP using GPGMail


Re: [HACKERS] pg_stat_statements: Query normalisation may fail during stats reset

2014-05-06 Thread Robert Haas
On Tue, May 6, 2014 at 12:26 PM, Michael Renner
michael.ren...@amd.co.at wrote:
 when regularly collecting  resetting query information from
 pg_stat_statements it’s possible to trigger a situation where unnormalised
 queries are stored.

 I think what happens is the following:

 pgss_post_parse_analyse calls pgss_store with a non-null jstate which will
 cause the query string to be normalised and stored if the query id doesn’t
 exist in the hash table.

 pgss_ExecutorEnd calls pgss_store with a null jstate which will cause the
 statistics to be stored if the query id exists.

 If the query id does not exist (because the hash table has been reset
 between post_parse_analyse and ExecutorEnd) it hits the entry creation path
 which in turn will create an entry with the unnormalised query string
 because jstate is null.

 This is a bit counterintuitive if you rely on the query to be normalised,
 e.g. for privacy reasons where you don’t want to leak query constants like
 password hashes or usernames.


 Is this something that should be fixed or is this intentional behavior? The
 documentation doesn’t make any strong claims on the state of the query
 string, so this might be debatable. [1]

It sounds pretty wonky to me, but then, so does the behavior in the
email to which you linked.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements: Query normalisation may fail during stats reset

2014-05-06 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, May 6, 2014 at 12:26 PM, Michael Renner
 when regularly collecting  resetting query information from
 pg_stat_statements it’s possible to trigger a situation where unnormalised
 queries are stored.
 
 Is this something that should be fixed or is this intentional behavior? The
 documentation doesn’t make any strong claims on the state of the query
 string, so this might be debatable. [1]

 It sounds pretty wonky to me, but then, so does the behavior in the
 email to which you linked.

The source code says that query strings are normalized on a best effort
basis, so perhaps we ought to say the same in the documentation.

It would be rather expensive to provide a guarantee of normalization:
basically, we'd have to compute the normalized query string during parsing
*even when the hashtable entry already exists*, and then store it
somewhere where it'd survive till ExecutorEnd (but, preferably, not be
leaked if we never get to ExecutorEnd; which makes this hard).  I think
most people would find that a bad tradeoff.

One cheap-and-dirty solution is to throw away the execution stats if we
get to the end and find the hash table entry no longer exists, rather than
make a new entry with a not-normalized string.  Not sure if that cure is
better than the disease or not.

Another thought, though it's not too relevant to this particular scenario
of intentional resets, is that we could raise the priority of entries
for statements-in-progress even further.  I notice for example that if
entry_alloc finds an existing hashtable entry, it does nothing to raise
the usage count of that entry.

 This is a bit counterintuitive if you rely on the query to be normalised,
 e.g. for privacy reasons where you don’t want to leak query constants like
 password hashes or usernames.

The bigger picture here is that relying on query normalization for privacy
doesn't seem like a bright idea.  Consider making sure that
security-relevant values are passed as parameters rather than being
embedded in the query text in the first place.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements: Query normalisation may fail during stats reset

2014-05-06 Thread Peter Geoghegan
On Tue, May 6, 2014 at 11:31 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 The source code says that query strings are normalized on a best effort
 basis, so perhaps we ought to say the same in the documentation.

Perhaps.

 It would be rather expensive to provide a guarantee of normalization:
 basically, we'd have to compute the normalized query string during parsing
 *even when the hashtable entry already exists*, and then store it
 somewhere where it'd survive till ExecutorEnd (but, preferably, not be
 leaked if we never get to ExecutorEnd; which makes this hard).  I think
 most people would find that a bad tradeoff.

I certainly would.

 One cheap-and-dirty solution is to throw away the execution stats if we
 get to the end and find the hash table entry no longer exists, rather than
 make a new entry with a not-normalized string.  Not sure if that cure is
 better than the disease or not.

I am certain that it is. Consider long running queries that don't
manage to get the benefit of the aggressive decay for stick entries
technique, because there is consistent contention.

 Another thought, though it's not too relevant to this particular scenario
 of intentional resets, is that we could raise the priority of entries
 for statements-in-progress even further.  I notice for example that if
 entry_alloc finds an existing hashtable entry, it does nothing to raise
 the usage count of that entry.

To do otherwise would create an artificial prejudice against prepared
queries, though.

 This is a bit counterintuitive if you rely on the query to be normalised,
 e.g. for privacy reasons where you don’t want to leak query constants like
 password hashes or usernames.

 The bigger picture here is that relying on query normalization for privacy
 doesn't seem like a bright idea.  Consider making sure that
 security-relevant values are passed as parameters rather than being
 embedded in the query text in the first place.

I agree.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers