On Mon, Oct 21, 2024 at 10:35:17AM +0200, Anthonin Bonnefoy wrote: > I've updated 0001 to only use ORDER BY query. The query strings are > not exact doublons, as the nested statement has the additional ending > ';' due to using the whole string instead of just the RawStmt. Thus, > the other sort expressions will never be used since there's no > equality. There's also the possibility of breaking down each statement > in individual blocks, with pgss reset and fetch for each one. However, > I feel it's gonna add a lot of noise in the test file.
I've looked at 0001, and finished by splitting the case of all-level tracking with the multi-statements as the resulting table was feeling heavily bloated, particularly because of MERGE that spawned in multiple lines even if there were less entries. The rest, except for some styling inconsistencies, was feeling OK. One of the multi-statement tests includes this output for HEAD, and that's on two PGSS entries: EXPLAIN (COSTS OFF) SELECT $1, $2 UNION SELECT $3, $4; EXPLAIN (COSTS OFF) (SELECT 1, 2, 3) UNION SELECT 3, 4, 5; EXPLAIN (COSTS OFF) SELECT 1, 2 UNION SELECT 3, 4; EXPLAIN (COSTS OFF) (SELECT $1, $2, $3) UNION SELECT $4, $5, $6; I did not notice that first, but that's really something! Normalization is only applied partially to a portion of the string, so we have a bunch of bloat for non-top queries that has existed for years. + ParseLoc stmt_location; /* start location, or -1 if unknown */ + ParseLoc stmt_len; /* length in bytes; 0 means "rest of string" */ I'm OK with this approach after considering a few things, mostly in terms of consistency with the existing node structures. The existing business with YYLLOC_DEFAULT() counts here. -Query * -transformTopLevelStmt(ParseState *pstate, RawStmt *parseTree) What's the advantage of removing this routine? Is that because you're pushing the initialization of stmt_location and stmt_len into transformOptionalSelectInto(), making it mostly moot? Something that worries me a bit is that this changes makes the code less clean, by having a SELECT-INTO specific routine called in the parse-analyze paths, while adding three individual paths in charge of setting pstate->p_stmt_len and p_stmt_location. + n->stmt_len = @3 - @2; This specific case deserves a comment. I don't have the best understanding of this area yet (need more analysis), but With the existing updateRawStmtEnd() and RawStmt also tracking this information, I am wondering if we could be smarter with less paths manipulating the start locations and lengths. And your patch adds a new setQueryStmtLen() that does the same kind of job. Hmm. FWIW, I don't feel strongly about 0004 that tries to make the REFRESH handling smarter. I am not sure that it even makes sense as-is by hacking into a wrapper of pg_get_viewdef_worker() to get the query string, leading it to not be normalized. This has also a small footprint in 0003. I think that the bits in ExecRefreshMatView() should be discarded from 0003, as a result. -- Michael
signature.asc
Description: PGP signature