Re: [HACKERS] pg_stat_statements query jumbling question

2015-11-15 Thread Michael Paquier
On Sun, Nov 15, 2015 at 1:34 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> On Sun, Nov 1, 2015 at 2:03 AM, Julien Rouhaud
>>  wrote:
>>> I'm also rather sceptical about this change.
>
>> Hm. Thinking a bit about this patch, it presents the advantage to be
>> able to track the same queries easily across systems even if those
>> tables have been created with a different OID.
>
> That argument would only hold if *every* use of OIDs in the jumbles
> were replaced by names --- not only tables, but types, operators,
> functions, etc.  I'm already concerned that the additional name
> lookups will cost a lot of performance, and I think adding so many
> more would probably be disastrous.

Yeah, I was thinking about a GUC switch to change from one mode to
another yesterday night before sleeping. Now if there was a patch
actually implementing that, and proving that this has no performance
impact, well I think that this may be worth considering for
integration. But we're far from that

>> Also, isn't this patch actually broken if we rename relations and/or
>> its namespace?
>
> Well, it would mean that the query would no longer be considered "the
> same".  You could argue either way whether that's good or bad.  But
> yeah, this approach will break one set of use-cases to enable another
> set.
>
> On the whole, I think my vote is to reject this patch.

Agreed. It's clear that the patch as-is is not enough.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements query jumbling question

2015-11-14 Thread Tom Lane
Michael Paquier  writes:
> On Sun, Nov 1, 2015 at 2:03 AM, Julien Rouhaud
>  wrote:
>> I'm also rather sceptical about this change.

> Hm. Thinking a bit about this patch, it presents the advantage to be
> able to track the same queries easily across systems even if those
> tables have been created with a different OID.

That argument would only hold if *every* use of OIDs in the jumbles
were replaced by names --- not only tables, but types, operators,
functions, etc.  I'm already concerned that the additional name
lookups will cost a lot of performance, and I think adding so many
more would probably be disastrous.

> Also, isn't this patch actually broken if we rename relations and/or
> its namespace?

Well, it would mean that the query would no longer be considered "the
same".  You could argue either way whether that's good or bad.  But
yeah, this approach will break one set of use-cases to enable another
set.

On the whole, I think my vote is to reject this patch.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements query jumbling question

2015-11-14 Thread Michael Paquier
On Sun, Nov 1, 2015 at 2:03 AM, Julien Rouhaud
 wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
>
> Hello,
>
> On 10/10/2015 08:46, Satoshi Nagayasu wrote:
>> On 2015/10/03 6:18, Peter Geoghegan wrote:
>>> On Wed, Sep 2, 2015 at 7:41 PM, Satoshi Nagayasu
>>>  wrote:
 I know this still needs to be discussed, but I would like to
 submit a patch for further discussion at the next CF, 2015-11.
>>>
>>> I think I already expressed this in my explanation of the
>>> current behavior, but to be clear: -1 from me to this proposal.
>>
>> I would like to introduce queryId to pgaudit and sql_firewall
>> extensions to determine query groups. queryId could be useful if
>> available in those modules.
>>
>> I think users may want to do that based on object names, because
>> they issue queries with the object names, not the internal object
>> ids.
>>
>> Changing queryId after re-creating the same table may make the
>> user gets confused, because the query string the user issue is not
>> changed.
>>
>> At least, I would like to give some options to be chosen by the
>> user. Is it possible and/or reasonable?
>>
>
> I'm also rather sceptical about this change.

Hm. Thinking a bit about this patch, it presents the advantage to be
able to track the same queries easily across systems even if those
tables have been created with a different OID. Say for example a
production server and a test server, the test server having a bunch of
other relations that messed up with OID consistency. So you could
compare more easily statistics across those servers with postgres_fdw
for example, and I guess that's what Nagayasu-san had in mind.

> In any case, the change you propose in this patch is not good, as you
> only consider the relation name and ignore the relation's schema. You
> need to also add the namespace name in the jumble state.

Yep, pg_class ensures the uniqueness of relations using (relname,
relnamespace), so we would need as well the namespace. Now, looking at
the code of pg_stat_statements the database OID is not used to
generate the queryid, right, hence don't we need to add as well the
database name of this relation in the set?

Also, isn't this patch actually broken if we rename relations and/or
its namespace? It seems that we would begin to generate inconsistent
query IDs that happens.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements query jumbling question

2015-11-03 Thread Peter Geoghegan
On Sat, Oct 31, 2015 at 10:03 AM, Julien Rouhaud
 wrote:
>> At least, I would like to give some options to be chosen by the
>> user. Is it possible and/or reasonable?
>>
>
> I'm also rather sceptical about this change.

Is anyone willing to argue for it, apart from Satoshi?


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements query jumbling question

2015-10-31 Thread Julien Rouhaud
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hello,

On 10/10/2015 08:46, Satoshi Nagayasu wrote:
> On 2015/10/03 6:18, Peter Geoghegan wrote:
>> On Wed, Sep 2, 2015 at 7:41 PM, Satoshi Nagayasu
>>  wrote:
>>> I know this still needs to be discussed, but I would like to
>>> submit a patch for further discussion at the next CF, 2015-11.
>> 
>> I think I already expressed this in my explanation of the
>> current behavior, but to be clear: -1 from me to this proposal.
> 
> I would like to introduce queryId to pgaudit and sql_firewall
> extensions to determine query groups. queryId could be useful if
> available in those modules.
> 
> I think users may want to do that based on object names, because
> they issue queries with the object names, not the internal object
> ids.
> 
> Changing queryId after re-creating the same table may make the
> user gets confused, because the query string the user issue is not
> changed.
> 
> At least, I would like to give some options to be chosen by the
> user. Is it possible and/or reasonable?
> 

I'm also rather sceptical about this change.

In any case, the change you propose in this patch is not good, as you
only consider the relation name and ignore the relation's schema. You
need to also add the namespace name in the jumble state.

Also, the documentation would need to be updated, at least on this part:

"As a rule of thumb, queryid values can be assumed to be stable and
comparable only so long as the underlying server version and catalog
metadata details stay exactly the same. Two servers participating in
replication based on physical WAL replay can be expected to have
identical queryid values for the same query. However, logical
replication schemes do not promise to keep replicas identical in all
relevant details, so queryid will not be a useful identifier for
accumulating costs across a set of logical replicas. If in doubt,
direct testing is recommended."

Regards.

> Regards,


- -- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.17 (GNU/Linux)

iQEcBAEBAgAGBQJWNPR+AAoJELGaJ8vfEpOqP3AH/j8Eob8jaXnhXNGsjJ7NyBkc
91pn87iW+FgyRGNH8K7wvOyPnBl3JWjlaIVShdImoph6agn6HhyXqbEceKXKKBS5
7B2fhY0xjhzFrKbQ9NpwxAvnPKA4ChOFsFmN/YtJMZzBEUHMDBcZpixhoGXy6iKB
Q44blhmZswXtJ744p3oslUNJX0eKoIZuqxRihVvUJYo/7Ogh0kWje5W4btA+CT/k
/IkQcMgUZj8jhyNYqBTaxlgNRJvTmk+iSuieAL6Fjjwq/VhR9QyD+4viKC+q0Nf6
puL74qPvwLzvy3+MXGc0dg+8m+PwZWcDg/pZSka+5EwJaHUv5bjouDZqimHH978=
=jlQP
-END PGP SIGNATURE-


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements query jumbling question

2015-10-10 Thread Satoshi Nagayasu

On 2015/10/03 6:18, Peter Geoghegan wrote:

On Wed, Sep 2, 2015 at 7:41 PM, Satoshi Nagayasu  wrote:

I know this still needs to be discussed, but I would like to submit
a patch for further discussion at the next CF, 2015-11.


I think I already expressed this in my explanation of the current
behavior, but to be clear: -1 from me to this proposal.


I would like to introduce queryId to pgaudit and sql_firewall extensions
to determine query groups. queryId could be useful if available in those
modules.

I think users may want to do that based on object names, because they
issue queries with the object names, not the internal object ids.

Changing queryId after re-creating the same table may make the user
gets confused, because the query string the user issue is not changed.

At least, I would like to give some options to be chosen by the user.
Is it possible and/or reasonable?

Regards,
--
NAGAYASU Satoshi 


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements query jumbling question

2015-10-02 Thread Peter Geoghegan
On Wed, Sep 2, 2015 at 7:41 PM, Satoshi Nagayasu  wrote:
> I know this still needs to be discussed, but I would like to submit
> a patch for further discussion at the next CF, 2015-11.

I think I already expressed this in my explanation of the current
behavior, but to be clear: -1 from me to this proposal.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements query jumbling question

2015-09-02 Thread Satoshi Nagayasu
On 2015/09/01 14:39, Satoshi Nagayasu wrote:
> On 2015/09/01 14:01, Tom Lane wrote:
>> Satoshi Nagayasu  writes:
>>> On 2015/09/01 13:41, Peter Geoghegan wrote:
 If you want to use the queryId field directly, which I recall you
 mentioning before, then that's harder. There is simply no contract
 among extensions for "owning" a queryId. But when the fingerprinting
 code is moved into core, then I think at that point queryId may cease
 to be even a thing that pg_stat_statements theoretically has the right
 to write into. Rather, it just asks the core system to do the
 fingerprinting, and finds it within queryId. At the same time, other
 extensions may do the same, and don't need to care about each other.

 Does that work for you?
>>
>>> Yes. I think so.
>>
>>> I need some query fingerprint to determine query group. I want queryid
>>> to keep the same value when query strings are the same (except literal
>>> values).
>>
>> The problem I've got with this is the unquestioned assumption that every
>> application for query IDs will have exactly the same requirements for
>> what the ID should include or ignore.
> 
> I'm not confident about that too, but at least, I think we will be able
> to collect most common use cases as of today. (aka best guess. :)
> 
> And IMHO it would be ok to change the spec in future release.

I know this still needs to be discussed, but I would like to submit
a patch for further discussion at the next CF, 2015-11.

Regards,
-- 
NAGAYASU Satoshi 
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c 
b/contrib/pg_stat_statements/pg_stat_statements.c
index 59b8a2e..2f6fb99 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -76,6 +76,8 @@
 #include "tcop/utility.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
+#include "utils/rel.h"
+#include "utils/relcache.h"
 
 PG_MODULE_MAGIC;
 
@@ -2286,6 +2288,7 @@ static void
 JumbleRangeTable(pgssJumbleState *jstate, List *rtable)
 {
ListCell   *lc;
+   Relation   rel;
 
foreach(lc, rtable)
{
@@ -2296,7 +2299,9 @@ JumbleRangeTable(pgssJumbleState *jstate, List *rtable)
switch (rte->rtekind)
{
case RTE_RELATION:
-   APP_JUMB(rte->relid);
+   rel = RelationIdGetRelation(rte->relid);
+   APP_JUMB_STRING(RelationGetRelationName(rel));
+   RelationClose(rel);
JumbleExpr(jstate, (Node *) rte->tablesample);
break;
case RTE_SUBQUERY:

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements query jumbling question

2015-08-31 Thread Peter Geoghegan
On Mon, Aug 31, 2015 at 8:32 PM, Satoshi Nagayasu  wrote:
> Why don't we use relation name (with looking up the catalog)
> on query jumbling? For performance reason?

I think that there is a good case for preferring this behavior. While
it is a little confusing that pg_stat_statements does not change the
representative query string, renaming a table does not make it a
substantively different table.

There is, IIRC, one case where a string is jumbled directly (CTE
name). It's usually not the right thing, IMV.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements query jumbling question

2015-08-31 Thread Peter Geoghegan
On Mon, Aug 31, 2015 at 9:29 PM, Satoshi Nagayasu  wrote:
> BTW, I'm interested in improving the queryid portability now because
> I'd like to use it in other extensions. :)
> That's the reason why I'm looking at query jumbling here.

Are you interested in having the query fingerprinting/jumbling
infrastructure available to all backend code? That seems like a good
idea to me generally. I would like to be able to put queryId in
log_line_prefix, or to display it within EXPLAIN, and have it
available everywhere. I like the idea of per-query
log_min_duration_statement settings.

If you want to use the queryId field directly, which I recall you
mentioning before, then that's harder. There is simply no contract
among extensions for "owning" a queryId. But when the fingerprinting
code is moved into core, then I think at that point queryId may cease
to be even a thing that pg_stat_statements theoretically has the right
to write into. Rather, it just asks the core system to do the
fingerprinting, and finds it within queryId. At the same time, other
extensions may do the same, and don't need to care about each other.

Does that work for you?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements query jumbling question

2015-08-31 Thread Satoshi Nagayasu

On 2015/09/01 13:41, Peter Geoghegan wrote:

On Mon, Aug 31, 2015 at 9:29 PM, Satoshi Nagayasu  wrote:

BTW, I'm interested in improving the queryid portability now because
I'd like to use it in other extensions. :)
That's the reason why I'm looking at query jumbling here.


Are you interested in having the query fingerprinting/jumbling
infrastructure available to all backend code? That seems like a good
idea to me generally.


Yes. I've been working on the sql_firewall extension[1], which is
totally built on top of the pg_stat_statements.

[1] http://pgsnaga.blogspot.jp/2015/08/postgresql-sql-firewall.html

As of today, sql_firewall has duplicated code for query jumbling.
But if it goes into the core, it looks fantastic.


I would like to be able to put queryId in
log_line_prefix, or to display it within EXPLAIN, and have it
available everywhere. I like the idea of per-query
log_min_duration_statement settings.


Sounds cool. :)


If you want to use the queryId field directly, which I recall you
mentioning before, then that's harder. There is simply no contract
among extensions for "owning" a queryId. But when the fingerprinting
code is moved into core, then I think at that point queryId may cease
to be even a thing that pg_stat_statements theoretically has the right
to write into. Rather, it just asks the core system to do the
fingerprinting, and finds it within queryId. At the same time, other
extensions may do the same, and don't need to care about each other.

Does that work for you?


Yes. I think so.

I need some query fingerprint to determine query group. I want queryid
to keep the same value when query strings are the same (except literal 
values).


Another reason is just because I need to import/export query ids.

Regards,
--
NAGAYASU Satoshi 


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements query jumbling question

2015-08-31 Thread Satoshi Nagayasu
On 2015/09/01 14:01, Tom Lane wrote:
> Satoshi Nagayasu  writes:
>> On 2015/09/01 13:41, Peter Geoghegan wrote:
>>> If you want to use the queryId field directly, which I recall you
>>> mentioning before, then that's harder. There is simply no contract
>>> among extensions for "owning" a queryId. But when the fingerprinting
>>> code is moved into core, then I think at that point queryId may cease
>>> to be even a thing that pg_stat_statements theoretically has the right
>>> to write into. Rather, it just asks the core system to do the
>>> fingerprinting, and finds it within queryId. At the same time, other
>>> extensions may do the same, and don't need to care about each other.
>>>
>>> Does that work for you?
> 
>> Yes. I think so.
> 
>> I need some query fingerprint to determine query group. I want queryid
>> to keep the same value when query strings are the same (except literal
>> values).
> 
> The problem I've got with this is the unquestioned assumption that every
> application for query IDs will have exactly the same requirements for
> what the ID should include or ignore.

I'm not confident about that too, but at least, I think we will be able
to collect most common use cases as of today. (aka best guess. :)

And IMHO it would be ok to change the spec in future release.

Regards,
-- 
NAGAYASU Satoshi 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements query jumbling question

2015-08-31 Thread Satoshi Nagayasu



On 2015/09/01 12:36, Peter Geoghegan wrote:

On Mon, Aug 31, 2015 at 8:32 PM, Satoshi Nagayasu  wrote:

Why don't we use relation name (with looking up the catalog)
on query jumbling? For performance reason?


I think that there is a good case for preferring this behavior. While
it is a little confusing that pg_stat_statements does not change the
representative query string, renaming a table does not make it a
substantively different table.

There is, IIRC, one case where a string is jumbled directly (CTE
name). It's usually not the right thing, IMV.



Thanks for the comment. I've never considered that. Interesting.

From the users point of view, IMHO, it would be better to avoid
confusing if queryid is determined by only visible values -- userid,
dbid and query string itself.

BTW, I'm interested in improving the queryid portability now because
I'd like to use it in other extensions. :)
That's the reason why I'm looking at query jumbling here.

Thoughts?

Regards,
--
NAGAYASU Satoshi 


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements query jumbling question

2015-08-31 Thread Tom Lane
Satoshi Nagayasu  writes:
> On 2015/09/01 13:41, Peter Geoghegan wrote:
>> If you want to use the queryId field directly, which I recall you
>> mentioning before, then that's harder. There is simply no contract
>> among extensions for "owning" a queryId. But when the fingerprinting
>> code is moved into core, then I think at that point queryId may cease
>> to be even a thing that pg_stat_statements theoretically has the right
>> to write into. Rather, it just asks the core system to do the
>> fingerprinting, and finds it within queryId. At the same time, other
>> extensions may do the same, and don't need to care about each other.
>> 
>> Does that work for you?

> Yes. I think so.

> I need some query fingerprint to determine query group. I want queryid
> to keep the same value when query strings are the same (except literal 
> values).

The problem I've got with this is the unquestioned assumption that every
application for query IDs will have exactly the same requirements for
what the ID should include or ignore.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers