Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries

Kyotaro HORIGUCHI Wed, 21 Dec 2016 23:04:17 -0800

At Wed, 21 Dec 2016 09:28:58 +0100 (CET), Fabien COELHO <coe...@cri.ensmp.fr> 
wrote in <alpine.DEB.2.20.1612210841370.3892@lancre>
> 
> Hello Robert & Kyotaro,
> 
> >> I'm a little doubtful about your proposed fix.  However, I haven't
> >> studied the code, so I don't know what other approach might be better.
> 
> That is indeed the question!
> 
> The key point is that the parser parses several statements from a
> string, but currently there is no clue about where the queries was
> found in the string. Only a parser may know about what is being parsed
> so can generate the location information. Now the possible solutions I
> see are:
> 
>  - the string is split per query before parsing, but this requires at
>    least a lexer... it looks pretty uneffective to have another lexing
>    phase involved, even if an existing lexer is reused.


I don't see this is promising. Apparently a waste of CPU cycles.

>  - the parser processes one query at a time and keep the "remaining
>    unparse string" in some state that can be queried to check up to
>    where it proceeded, but then the result must be stored somewhere
>    and it would make sense that it would be in the statement anyway,
>    just the management of the location information would be outside
>    the parser. Also that would add the cost of relaunching the parser,
>    not sure how bad or insignificant this is. This is (2) below.

I raised this as a spoiler, I see this is somewhat too invasive
for the problem to be solved.

>  - the parser still generates a List<Node> but keep track of the location
>    of statements doing so, somehow... propably as I outlined.

Yeah, this seems most reasonable so far. It seems to me to be
better that the statement location is conveyed as a part of a
parse tree so as not to need need a side channel for location.

I'd like to rename debug_query_string to more reasonable name if
we are allowed but perhaps not.

> The query string information can be at some way of pointing in the
> initial
> string, or the substring itself that could be extracted at some point.
> I initially suggested the former because this is already what the
> parser does for some nodes, and because it seems simpler to do so.
> 
> Extracting the string instead would suggest that the location of
> tokens
> within this statement are relative to this string rather than the
> initial one, but that involves a lot more changes and it is easier to
> miss something doing so.

Agreed that copying statement string would be too much. But the
new *Stmt node should somehow have also the end location of the
statement since the user of a parse tree cannot look into another
one.


> > That will work and doable, but the more fundamental issue here
> > seems to be that post_parse_analyze_hook or other similar hooks
> > are called with a Query incosistent with query_debug_string.
> 
> Sure.
> 
> > It is very conveniently used but the name seems to me to be suggesting
> > that such usage is out of purpose. I'm not sure, though.
> >
> > A similar behavior is shown in error messages but this harms far
> > less than the case of pg_stat_statements.
> >
> > ERROR:  column "b" does not exist
> > LINE 1: ...elect * from t where a = 1; select * from t where b = 2;
> > com...
> >                                                             ^
> 
> Ah, I wrote this piece of code a long time ago:-) The location is
> relative to the full string, see comment above, changing that would
> involve much
> more changes, and I'm not sure whether it is desirable.
> 
> Also, think of:
> 
> SELECT * FROM foo; DROP TABLE foo; SELECT * FROM foo;
> 
> Maybe the context to know which "SELECT * FROM foo" generates an error
> should be kept.
> 
> > 1. Let all of the parse node have location in
> >  debug_query_string. (The original proposal)
> 
> Yep.
> 
> > 2. Let the parser return once for each statement (like psql
> >   parser)
> 
> I'm not sure it does... "\;" generates ";" in the output and the psql
> lexer keeps on lexing.
> 
> > and corresponding query string is stored in a
> >   varialble other than debug_query_string.
> 
> I think that would involve many changes because of the way postgres is
> written, the list is expected and returned by quite a few
> functions. Moreover query rewriting may generate several queries out
> of one anyway, so the list would be kept.
> 
> So I'm rather still in favor of my initial proposal, that is extend
> the existing location information to statements, not only some tokens.

I thought that it's too much to let all types of parse node have
location but grepping the gram.y with "makeNode" pursuaded me to
agree with you. After changing all *Stmt nodes, only several
types of nodes seems missing it.
 
regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] BUG: pg_stat_statements query normalization issues with combined queries

Reply via email to