Hello Robert & Kyotaro,

I'm a little doubtful about your proposed fix. However, I haven't studied the code, so I don't know what other approach might be better.

That is indeed the question!

The key point is that the parser parses several statements from a string, but currently there is no clue about where the queries was found in the string. Only a parser may know about what is being parsed so can generate the location information. Now the possible solutions I see are:

 - the string is split per query before parsing, but this requires at
   least a lexer... it looks pretty uneffective to have another lexing
   phase involved, even if an existing lexer is reused.

 - the parser processes one query at a time and keep the "remaining
   unparse string" in some state that can be queried to check up to
   where it proceeded, but then the result must be stored somewhere
   and it would make sense that it would be in the statement anyway,
   just the management of the location information would be outside
   the parser. Also that would add the cost of relaunching the parser,
   not sure how bad or insignificant this is. This is (2) below.

 - the parser still generates a List<Node> but keep track of the location
   of statements doing so, somehow... propably as I outlined.

The query string information can be at some way of pointing in the initial
string, or the substring itself that could be extracted at some point.
I initially suggested the former because this is already what the parser does for some nodes, and because it seems simpler to do so.

Extracting the string instead would suggest that the location of tokens
within this statement are relative to this string rather than the initial one, but that involves a lot more changes and it is easier to miss something doing so.

That will work and doable, but the more fundamental issue here
seems to be that post_parse_analyze_hook or other similar hooks
are called with a Query incosistent with query_debug_string.


It is very conveniently used but the name seems to me to be suggesting that such usage is out of purpose. I'm not sure, though.

A similar behavior is shown in error messages but this harms far
less than the case of pg_stat_statements.

ERROR:  column "b" does not exist
LINE 1: ...elect * from t where a = 1; select * from t where b = 2; com...

Ah, I wrote this piece of code a long time ago:-) The location is relative to the full string, see comment above, changing that would involve much
more changes, and I'm not sure whether it is desirable.

Also, think of:


Maybe the context to know which "SELECT * FROM foo" generates an error should be kept.

1. Let all of the parse node have location in
 debug_query_string. (The original proposal)


2. Let the parser return once for each statement (like psql

I'm not sure it does... "\;" generates ";" in the output and the psql lexer keeps on lexing.

and corresponding query string is stored in a
  varialble other than debug_query_string.

I think that would involve many changes because of the way postgres is written, the list is expected and returned by quite a few functions. Moreover query rewriting may generate several queries out of one anyway, so the list would be kept.

So I'm rather still in favor of my initial proposal, that is extend the existing location information to statements, not only some tokens.


Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to