Hi, A few comments. + "After this number of duplicating constants start to merge them.",
duplicating -> duplicate + foreach(lc, (List *) expr) + { + Node * subExpr = (Node *) lfirst(lc); + + if (!IsA(subExpr, Const)) + { + allConst = false; + break; + } + } It seems the above foreach loop (within foreach(temp, (List *) node)) can be preceded with a check that allConst is true. Otherwise the loop can be skipped. + if (currentExprIdx == pgss_merge_threshold - 1) + { + JumbleExpr(jstate, expr); + + /* + * A const expr is already found, so JumbleExpr must + * record it. Mark it as merged, it will be the first + * merged but still present in the statement query. + */ + Assert(jstate->clocations_count > 0); + jstate->clocations[jstate->clocations_count - 1].merged = true; + currentExprIdx++; + } The above snippet occurs a few times. Maybe extract into a helper method. Cheers On Sat, Dec 26, 2020 at 2:45 AM Dmitry Dolgov <9erthali...@gmail.com> wrote: > > On Wed, Nov 18, 2020 at 05:04:32PM +0100, Dmitry Dolgov wrote: > > > On Wed, Aug 12, 2020 at 06:19:02PM +0200, Dmitry Dolgov wrote: > > > > > > I would like to start another thread to follow up on [1], mostly to > bump up the > > > topic. Just to remind, it's about how pg_stat_statements jumbling > ArrayExpr in > > > queries like: > > > > > > SELECT something FROM table WHERE col IN (1, 2, 3, ...) > > > > > > The current implementation produces different jumble hash for every > different > > > number of arguments for essentially the same query. Unfortunately a > lot of ORMs > > > like to generate these types of queries, which in turn leads to > > > pg_stat_statements pollution. Ideally we want to prevent this and have > only one > > > record for such a query. > > > > > > As the result of [1] I've identified two highlighted approaches to > improve this > > > situation: > > > > > > * Reduce the generated ArrayExpr to an array Const immediately, in > cases where > > > all the inputs are Consts. > > > > > > * Make repeating Const to contribute nothing to the resulting hash. > > > > > > I've tried to prototype both approaches to find out pros/cons and be > more > > > specific. Attached patches could not be considered a completed piece > of work, > > > but they seem to work, mostly pass the tests and demonstrate the > point. I would > > > like to get some high level input about them and ideally make it clear > what is > > > the preferred solution to continue with. > > > > I've implemented the second approach mentioned above, this version was > > tested on our test clusters for some time without visible issues. Will > > create a CF item and would appreciate any feedback. > > After more testing I found couple of things that could be improved, > namely in the presence of non-reducible constants one part of the query > was not copied into the normalized version, and this approach also could > be extended for Params. These are incorporated in the attached patch. >