Re: [HACKERS] pg_stat_statements and non default search_path

2016-10-16 Thread Julien Rouhaud
On 16/10/2016 11:21, Craig Ringer wrote:
> On 16 Oct. 2016 14:31, Julien Rouhaud wrote:
>>
>> On 16/10/2016 02:38, Jim Nasby wrote:
>> > On 10/10/16 12:58 AM, Julien Rouhaud wrote:
>> >> Unless you mean deparsing the Query instead of using raw source
> text?  I
>> >> think that would solve this issue (and also the other issue when
>> >> multiple queries are submitted at once, you get the normalized version
>> >> of all the queries multiple time), but AFAIK ruleutils.c doesn't expose
>> >> enough to do it (like get_query_def()), and exposing it isn't an
> option.
>> >
>> > Why couldn't we expose it?
> 
> I'm interested in that too, for the purpose of passing the correct
> substring of a multi-statement to ProcessUtility_hook.

I have another use case for this: being able to easily print what is the
query (or queries) that are actually executed (ie. what's get out of the
rewriter).  That's useful when trying to figure out / optimize nested
views hell or abuse of rules.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements and non default search_path

2016-10-16 Thread Craig Ringer
On 16 Oct. 2016 14:31, "Julien Rouhaud"  wrote:
>
> On 16/10/2016 02:38, Jim Nasby wrote:
> > On 10/10/16 12:58 AM, Julien Rouhaud wrote:
> >> Unless you mean deparsing the Query instead of using raw source text?
I
> >> think that would solve this issue (and also the other issue when
> >> multiple queries are submitted at once, you get the normalized version
> >> of all the queries multiple time), but AFAIK ruleutils.c doesn't expose
> >> enough to do it (like get_query_def()), and exposing it isn't an
option.
> >
> > Why couldn't we expose it?

I'm interested in that too, for the purpose of passing the correct
substring of a multi-statement to ProcessUtility_hook. Perhaps by using
parser position data to generate a start pointer and length within the
querystring already passed.

For the problem in this thread one could always implement plan-to-query
transforms (deparse). But you'd get back something pretty far from the
input query after function inlining, subquery flattering, condition
pushdown/pullup, join elimination, etc etc.

I think pg_stat_plans attempted a middle ground here, capturing explain
style plans rather than trying to report the SQL.

I do think there is utility to storing search_path and optionally
specialising stats according to it. The most obvious case is multi-tenant
and schema-sharded models where per-schema stats would be very handy. You'd
want to use the active search path though so you eliminate references to
nonexistent schemas. Otherwise $user would screw things up.


Re: [HACKERS] pg_stat_statements and non default search_path

2016-10-16 Thread Julien Rouhaud
On 16/10/2016 10:47, Lukas Fittl wrote:
> Can somebody chime in if it would be feasible to store this in the
> out-of-band query text file, and whether a patch for this would be
> considered acceptable?

FWIW I already have a quick and dirty patch for this, I don't think
there's any major difficulty here.  I still hope we can find a way to
store the fully qualified objects name in the normalized query instead.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements and non default search_path

2016-10-16 Thread Lukas Fittl
On Sat, Oct 15, 2016 at 11:29 PM, Julien Rouhaud 
wrote:
>
> > BTW, after thinking about it some more, I don't see how storing the
> > search_path would help at all... it's not like you can do anything with
> > it unless you have a huge chunk of the parser available.
> >
>
> My use case is not really to know the fully qualified name of each
> identifier, but being able to optimize a problematic query found with
> pg_stat_statements.  I can already "unjumble" it automatically, but the
> search_path is needed to be able to get an explain or just execute the
> query.
>

I'd also find having the search_path available to be a helpful benefit -
I've run into the same problem as Julien where two identical queries
couldn't be identified correctly because of the missing search_path.

In particular this situation might happen if you shard different tenants
using schemas (one schema for each tenant), and use the search_path to set
the current tenant.

In my own setup I combine pg_stat_statements with a slightly hacked up copy
of the Postgres parser, so having the correct search_path + query text
available is enough to find the objects a query references (most of the
time, there are a few other edge cases).

My assumption thus far has been that adding another field like this might
not be considered because of performance considerations.

Can somebody chime in if it would be feasible to store this in the
out-of-band query text file, and whether a patch for this would be
considered acceptable?

Best,
Lukas

-- 
Lukas Fittl

Skype: lfittl
Phone: +1 415 321 0630


Re: [HACKERS] pg_stat_statements and non default search_path

2016-10-16 Thread Julien Rouhaud
On 16/10/2016 02:38, Jim Nasby wrote:
> On 10/10/16 12:58 AM, Julien Rouhaud wrote:
>> Unless you mean deparsing the Query instead of using raw source text?  I
>> think that would solve this issue (and also the other issue when
>> multiple queries are submitted at once, you get the normalized version
>> of all the queries multiple time), but AFAIK ruleutils.c doesn't expose
>> enough to do it (like get_query_def()), and exposing it isn't an option.
> 
> Why couldn't we expose it?
> 

I'm not really sure.  The only thread I could find on this topic didn't
get any answer:
https://www.postgresql.org/message-id/flat/5e5bb0f8-b05e-47c2-8706-f85e70a6d...@citusdata.com

> BTW, after thinking about it some more, I don't see how storing the
> search_path would help at all... it's not like you can do anything with
> it unless you have a huge chunk of the parser available.
> 

My use case is not really to know the fully qualified name of each
identifier, but being able to optimize a problematic query found with
pg_stat_statements.  I can already "unjumble" it automatically, but the
search_path is needed to be able to get an explain or just execute the
query.

Of course, I'd prefer that pgss can generate a fully qualified
normalized query instead of storing the search_path.

> BTW, this issue isn't limited to just tables; it affects almost every
> object identifier you can put in a query, like functions, operators,
> types, etc.

Yes, indeed.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements and non default search_path

2016-10-15 Thread Jim Nasby

On 10/10/16 12:58 AM, Julien Rouhaud wrote:

> Would another option be to temporarily set search_path to '' when
> getting the query text? ISTM that would be the best option.

I don't think that possible.  The query text used is raw query text
provided by the post_parse_analyse hook (ParseState->p_sourcetext).


Oh, hrm. :/


Unless you mean deparsing the Query instead of using raw source text?  I
think that would solve this issue (and also the other issue when
multiple queries are submitted at once, you get the normalized version
of all the queries multiple time), but AFAIK ruleutils.c doesn't expose
enough to do it (like get_query_def()), and exposing it isn't an option.


Why couldn't we expose it?

BTW, after thinking about it some more, I don't see how storing the 
search_path would help at all... it's not like you can do anything with 
it unless you have a huge chunk of the parser available.


BTW, this issue isn't limited to just tables; it affects almost every 
object identifier you can put in a query, like functions, operators, 
types, etc.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements and non default search_path

2016-10-09 Thread Julien Rouhaud
On 10/10/2016 05:00, Jim Nasby wrote:
> On 10/7/16 4:39 AM, Julien Rouhaud wrote:
>> I see two ways of fixing this. First one would be to store normalized
>> queries while fully qualifiying the relation's names. After a quick look
>> this can't be done without storing at least a token location in
>> RangeTblEntry nodes, which sounds like a bad idea.
>>
>> The other way would be to store the value of search_path with each pgss
>> entry (either in shared_memory or in the external file).
>>
>> Is it something that we should fix, and if yes is any of this
>> acceptable, or does someone see another way to solve this problem?
> 
> Would another option be to temporarily set search_path to '' when
> getting the query text? ISTM that would be the best option.

I don't think that possible.  The query text used is raw query text
provided by the post_parse_analyse hook (ParseState->p_sourcetext).

Unless you mean deparsing the Query instead of using raw source text?  I
think that would solve this issue (and also the other issue when
multiple queries are submitted at once, you get the normalized version
of all the queries multiple time), but AFAIK ruleutils.c doesn't expose
enough to do it (like get_query_def()), and exposing it isn't an option.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stat_statements and non default search_path

2016-10-09 Thread Jim Nasby

On 10/7/16 4:39 AM, Julien Rouhaud wrote:

I see two ways of fixing this. First one would be to store normalized
queries while fully qualifiying the relation's names. After a quick look
this can't be done without storing at least a token location in
RangeTblEntry nodes, which sounds like a bad idea.

The other way would be to store the value of search_path with each pgss
entry (either in shared_memory or in the external file).

Is it something that we should fix, and if yes is any of this
acceptable, or does someone see another way to solve this problem?


Would another option be to temporarily set search_path to '' when 
getting the query text? ISTM that would be the best option.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_stat_statements and non default search_path

2016-10-07 Thread Julien Rouhaud
Hello,

I've faced multiple times a painful limitation with pg_stat_statements.
Queries are normalized based on the textual SQL statements, and the
queryid is computed using objects' oids. So for instance with different
search_path settings, we can end up with the same normalized query but
different queryids, and there's no way to know what are the actual
relations each query is using. It also means that it can be almost
impossible to use the normalized query to replay a query.

I see two ways of fixing this. First one would be to store normalized
queries while fully qualifiying the relation's names. After a quick look
this can't be done without storing at least a token location in
RangeTblEntry nodes, which sounds like a bad idea.

The other way would be to store the value of search_path with each pgss
entry (either in shared_memory or in the external file).

Is it something that we should fix, and if yes is any of this
acceptable, or does someone see another way to solve this problem?

Regards.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers