At Wed, 21 Dec 2016 09:28:58 +0100 (CET), Fabien COELHO <coe...@cri.ensmp.fr> wrote in <alpine.DEB.2.20.1612210841370.3892@lancre> > > Hello Robert & Kyotaro, > > >> I'm a little doubtful about your proposed fix. However, I haven't > >> studied the code, so I don't know what other approach might be better. > > That is indeed the question! > > The key point is that the parser parses several statements from a > string, but currently there is no clue about where the queries was > found in the string. Only a parser may know about what is being parsed > so can generate the location information. Now the possible solutions I > see are: > > - the string is split per query before parsing, but this requires at > least a lexer... it looks pretty uneffective to have another lexing > phase involved, even if an existing lexer is reused.
I don't see this is promising. Apparently a waste of CPU cycles. > - the parser processes one query at a time and keep the "remaining > unparse string" in some state that can be queried to check up to > where it proceeded, but then the result must be stored somewhere > and it would make sense that it would be in the statement anyway, > just the management of the location information would be outside > the parser. Also that would add the cost of relaunching the parser, > not sure how bad or insignificant this is. This is (2) below. I raised this as a spoiler, I see this is somewhat too invasive for the problem to be solved. > - the parser still generates a List<Node> but keep track of the location > of statements doing so, somehow... propably as I outlined. Yeah, this seems most reasonable so far. It seems to me to be better that the statement location is conveyed as a part of a parse tree so as not to need need a side channel for location. I'd like to rename debug_query_string to more reasonable name if we are allowed but perhaps not. > The query string information can be at some way of pointing in the > initial > string, or the substring itself that could be extracted at some point. > I initially suggested the former because this is already what the > parser does for some nodes, and because it seems simpler to do so. > > Extracting the string instead would suggest that the location of > tokens > within this statement are relative to this string rather than the > initial one, but that involves a lot more changes and it is easier to > miss something doing so. Agreed that copying statement string would be too much. But the new *Stmt node should somehow have also the end location of the statement since the user of a parse tree cannot look into another one. > > That will work and doable, but the more fundamental issue here > > seems to be that post_parse_analyze_hook or other similar hooks > > are called with a Query incosistent with query_debug_string. > > Sure. > > > It is very conveniently used but the name seems to me to be suggesting > > that such usage is out of purpose. I'm not sure, though. > > > > A similar behavior is shown in error messages but this harms far > > less than the case of pg_stat_statements. > > > > ERROR: column "b" does not exist > > LINE 1: ...elect * from t where a = 1; select * from t where b = 2; > > com... > > ^ > > Ah, I wrote this piece of code a long time ago:-) The location is > relative to the full string, see comment above, changing that would > involve much > more changes, and I'm not sure whether it is desirable. > > Also, think of: > > SELECT * FROM foo; DROP TABLE foo; SELECT * FROM foo; > > Maybe the context to know which "SELECT * FROM foo" generates an error > should be kept. > > > 1. Let all of the parse node have location in > > debug_query_string. (The original proposal) > > Yep. > > > 2. Let the parser return once for each statement (like psql > > parser) > > I'm not sure it does... "\;" generates ";" in the output and the psql > lexer keeps on lexing. > > > and corresponding query string is stored in a > > varialble other than debug_query_string. > > I think that would involve many changes because of the way postgres is > written, the list is expected and returned by quite a few > functions. Moreover query rewriting may generate several queries out > of one anyway, so the list would be kept. > > So I'm rather still in favor of my initial proposal, that is extend > the existing location information to statements, not only some tokens. I thought that it's too much to let all types of parse node have location but grepping the gram.y with "makeNode" pursuaded me to agree with you. After changing all *Stmt nodes, only several types of nodes seems missing it. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers