Re: [HACKERS] Custom compression methods
> On 25 Dec 2020, at 14:34, Dilip Kumar wrote:

Maybe add Lz4/Zlib WAL FPI compression on top of this patchset? I'm not insisting on anything, it just would be so cool to have it... BTW, currently there are Oid collisions in the original patchset.

Best regards, Andrey Borodin.

v16-0007-Add-Lz4-compression-to-WAL-FPIs.patch Description: Binary data
possible replacement for llvm JIT
Hi,

Here is a possibly interesting project for decreasing the overhead of JIT: https://developers.redhat.com/blog/2020/01/20/mir-a-lightweight-jit-compiler-project/

Regards
Pavel
Re: Wrong comment in tuptable.h
Hi,

On 2020-12-26 18:00:49 -0800, Jeff Davis wrote:
> /*
>  * Return a copy of heap tuple representing the contents of the slot. The
>  * copy needs to be palloc'd in the current memory context. The slot
>  * itself is expected to remain unaffected. It is *not* expected to have
>  * meaningful "system columns" in the copy. The copy is not be "owned" by
>  * the slot i.e. the caller has to take responsibility to free memory
>  * consumed by the slot.
>  */
> HeapTuple (*copy_heap_tuple) (TupleTableSlot *slot);
>
> But acquire_sample_rows() calls ExecCopySlotHeapTuple(), and then
> subsequently sorts the rows by TID. Is acquire_sample_rows() doing
> something it shouldn't, or is the comment mistaken?

I think the comment is too vague and thinking of system columns as xmin/xmax/cmin/cmax.

Greetings, Andres Freund
Wrong comment in tuptable.h
/*
 * Return a copy of heap tuple representing the contents of the slot. The
 * copy needs to be palloc'd in the current memory context. The slot
 * itself is expected to remain unaffected. It is *not* expected to have
 * meaningful "system columns" in the copy. The copy is not be "owned" by
 * the slot i.e. the caller has to take responsibility to free memory
 * consumed by the slot.
 */
HeapTuple (*copy_heap_tuple) (TupleTableSlot *slot);

But acquire_sample_rows() calls ExecCopySlotHeapTuple(), and then subsequently sorts the rows by TID. Is acquire_sample_rows() doing something it shouldn't, or is the comment mistaken?

Regards, Jeff Davis
Re: SQL/JSON: JSON_TABLE
For new files introduced in the patches:

+ * Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group

2021 is a few days ahead. It would be good to update the year range.

For transformJsonTableColumn:

+ jfexpr->op =
+     jtc->coltype == JTC_REGULAR ? IS_JSON_VALUE :
+     jtc->coltype == JTC_EXISTS ? IS_JSON_EXISTS : IS_JSON_QUERY;

Should IS_JSON_EXISTS be mentioned in the comment preceding the method?

For JsonTableDestroyOpaque():

+ state->opaque = NULL;

Should the memory pointed to by opaque be freed?

+ l2 = list_head(tf->coltypes);
+ l3 = list_head(tf->coltypmods);
+ l4 = list_head(tf->colvalexprs);

Maybe the ListCell pointer variables can be named corresponding to the lists they iterate so that the code is easier to understand.

+typedef enum JsonTablePlanJoinType
+{
+     JSTP_INNER = 0x01,
+     JSTP_OUTER = 0x02,
+     JSTP_CROSS = 0x04,

Since the plan type enum starts with JSTP_, I think the plan join type should start with JSTPJ_ or JSTJ_. Otherwise the following would be a bit hard to read:

+ else if (plan->plan_type == JSTP_JOINED)
+ {
+     if (plan->join_type == JSTP_INNER ||
+         plan->join_type == JSTP_OUTER)

since different fields are checked in the two if statements but the prefixes don't convey that.

+ Even though the path names are not incuded into the PLAN DEFAULT

Typo: incuded -> included

+ int nchilds = 0;

nchilds -> nchildren

+#if 0 /* XXX it' unclear from the standard whether root path name is mandatory or not */
+ if (plan && plan->plan_type != JSTP_DEFAULT && !rootPathName)

Do you plan to drop the if block?

Cheers

On Fri, Dec 25, 2020 at 12:32 PM Nikita Glukhov wrote:
>
> On 03.08.2020 10:55, Michael Paquier wrote:
> > On Sun, Jul 05, 2020 at 12:15:58PM -0500, Justin Pryzby wrote:
> > It looks like this needs to be additionally rebased - I will set cfbot to "Waiting".
> >
> > ... Something that has not happened in four weeks, so this is marked
> > as returned with feedback. Please feel free to resubmit once a rebase
> > is done.
> --
> Michael
>
> Attached 44th version of the patches rebased onto current master
> (#0001 corresponds to v51 of SQL/JSON patches).
>
> --
> Nikita Glukhov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
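To make the enum-prefix review point above concrete, here is a hedged sketch of what the suggested rename might look like (JSTPJ_ is the reviewer's proposed prefix; the bit-flag values are taken from the quoted patch hunk, and is_left_or_inner_join is a hypothetical helper for illustration):

```c
#include <assert.h>

/* Sketch of the reviewer's suggestion: give the join-type enum its own
 * JSTPJ_ prefix so it cannot be confused with the JSTP_ plan-type enum.
 * The values are the bit flags from the quoted patch hunk. */
typedef enum JsonTablePlanJoinType
{
	JSTPJ_INNER = 0x01,
	JSTPJ_OUTER = 0x02,
	JSTPJ_CROSS = 0x04
} JsonTablePlanJoinType;

/* With distinct prefixes, a check like the one quoted in the review now
 * reads unambiguously: plan_type comparisons use JSTP_, join_type uses
 * JSTPJ_. (Hypothetical helper, not from the patch.) */
static int
is_left_or_inner_join(JsonTablePlanJoinType jt)
{
	return (jt & (JSTPJ_INNER | JSTPJ_OUTER)) != 0;
}
```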
Re: Better client reporting for "immediate stop" shutdowns
Hi,

On 2020-12-26 13:37:15 -0500, Tom Lane wrote:
> I suppose a compromise position could be to let the postmaster export its
> "Shutdown" state variable, and then let backends assume that SIGTERM means
> fast shutdown or pg_terminate_backend depending on the state of that one
> global variable. But it'd be a bit imprecise so I don't really feel it's
> more useful than what we have.

Fair enough, I think.

> > I'd like to not log all these repeated messages into the server
> > log. It's quite annoying to have to dig through thousands of lines of
> > repeated "terminating connection..."
>
> Hm. That's an orthogonal issue, but certainly worth considering.
> There are a couple of levels we could consider:
>
> 1. Just make the logged messages less verbose (they certainly don't
> need the DETAIL and HINT lines).
>
> 2. Suppress the log entries altogether.
>
> I would have been against #2 before this patch, because it'd mean
> that a rogue SIGQUIT leaves no clear trace in the log. But with
> this patch, we can be fairly sure that we know whether SIGQUIT came
> from the postmaster, and then it might be all right to suppress the
> log entry altogether when it did.
>
> I'd be happy to write up a patch for either of these, but let's
> decide what we want first.

My vote would be #2, with the same reasoning as yours.

Greetings, Andres Freund
Re: SQL/JSON: functions
Hi,

For ExecEvalJsonExprSubtrans(), if you check !subtrans first,

+ /* No need to use subtransactions. */
+ return func(op, econtext, res, resnull, p, error);

the return statement would allow omitting the else keyword and left-indenting the code in the current if block.

For ExecEvalJsonExpr():

+ *resnull = !DatumGetPointer(res);
+ if (error && *error)
+     return (Datum) 0;

Suppose *resnull is false and *error is true; 0 would be returned with *resnull as false. Should *resnull be consistent with the actual return value?

For ExecEvalJson():

+ Assert(*op->resnull);
+ *op->resnull = true;

I am not sure of the purpose of the assignment, since *op->resnull should be true by the assertion.

For raw_expression_tree_walker:

+ if (walker(jfe->on_empty, context))
+     return true;

Should the if condition check jfe->on_empty prior to walking?

nit: for contain_mutable_functions_walker, if !IsA(jexpr->path_spec, Const) is checked first (and returns), the current if block can be left-indented.

For JsonPathDatatypeStatus,

+ jpdsDateTime, /* unknown datetime type */

Should the enum be named jpdsUnknownDateTime so that its meaning is clear to people reading the code?

For get_json_behavior(), I wonder if mapping from behavior->btype to the string form would shorten the body of the switch statement, e.g.:

char *map[] = { " NULL", " ERROR", " EMPTY", ... };

Cheers

On Fri, Dec 25, 2020 at 5:19 PM Zhihong Yu wrote:

> For 0001-Common-SQL-JSON-clauses-v51.patch :
>
> + /* | implementation_defined_JSON_representation_option (BSON, AVRO etc) */
>
> I don't find implementation_defined_JSON_representation_option in the
> patchset. Maybe rephrase the above as a comment
> without implementation_defined_JSON_representation_option ?
>
> For getJsonEncodingConst(), should the method error out for the default
> case of switch (encoding) ?
> 0002-SQL-JSON-constructors-v51.patch :
>
> + Assert(!OidIsValid(collation)); /* result is always an json[b] type */
>
> an json -> a json
>
> + /* XXX TEXTOID is default by standard */
> + returning->typid = JSONOID;
>
> Comment doesn't seem to match the assignment.
>
> For json_object_agg_transfn :
>
> + if (out->len > 2)
> +     appendStringInfoString(out, ", ");
>
> Why does the length need to be at least 3 (instead of 2)?
>
> Cheers
>
> On Fri, Dec 25, 2020 at 12:26 PM Nikita Glukhov wrote:
>
>> On 17.09.2020 08:41, Michael Paquier wrote:
>>
>> On Sat, Jul 18, 2020 at 09:24:11AM -0400, Andrew Dunstan wrote:
>>
>> I think patches 5 and 6 need to be submitted to the next commitfest.
>> This is far too much scope creep to be snuck in under the current CF item.
>>
>> I'll look at patches 1-4.
>>
>> Even with that, the patch set has been waiting on author for the last
>> six weeks, so I am marking it as RwF for now. Please feel free to
>> resubmit.
>>
>> Attached 51st version of the patches rebased onto current master.
>>
>> There were some shift/reduce conflicts in the SQL grammar that appeared
>> after the "expr AS keyword" refactoring in 06a7c3154f. I'm not sure if I
>> resolved them correctly. The JSON TEXT pseudotype, introduced in #0006,
>> caused a lot of grammar conflicts, so it was replaced with a simple
>> explicit pg_catalog.json.
>>
>> Also a new CoercionForm COERCE_SQL_SYNTAX was introduced, and this
>> reminds me of the custom function formats that I used in earlier versions
>> of the patches for deparsing of SQL/JSON constructor expressions based on
>> raw json[b] function calls. These custom function formats were replaced
>> in v43 with dedicated executor nodes for SQL/JSON constructors. So, I'm
>> not sure whether it is worth trying to replace the nodes back with the
>> new COERCE_SQL_SYNTAX.
>>
>> --
>> Nikita Glukhov
>> Postgres Professional: http://www.postgrespro.com
>> The Russian Postgres Company
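The early-return suggestion for ExecEvalJsonExprSubtrans() above is a generic guard-clause refactor. A minimal, self-contained sketch of the before/after shape (dummy types and a dummy heavy_path(), not the real executor signatures):

```c
#include <assert.h>
#include <stdbool.h>

/* Dummy stand-in for the expensive, subtransaction-wrapped evaluation. */
static int
heavy_path(int x)
{
	return x + 1;
}

/* Shape of the code as reviewed: the simple case sits in an if, and the
 * heavy path is indented one extra level inside the else. */
static int
eval_with_else(bool subtrans, int x)
{
	if (!subtrans)
	{
		/* No need to use subtransactions. */
		return heavy_path(x);
	}
	else
	{
		/* heavy path, wrapped in a subtransaction in the real code */
		return heavy_path(x);
	}
}

/* Suggested shape: handle the simple case first and return, so the heavy
 * path needs no else keyword and loses an indent level. */
static int
eval_guard_clause(bool subtrans, int x)
{
	if (!subtrans)
		return heavy_path(x);	/* No need to use subtransactions. */

	/* heavy path, now at the top indent level */
	return heavy_path(x);
}
```

Behavior is identical; only the control-flow layout changes.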
Re: Better client reporting for "immediate stop" shutdowns
Andres Freund writes:
> On 2020-12-21 16:43:33 -0500, Tom Lane wrote:
>> * I chose to report the same message for immediate shutdown as we
>> already use for SIGTERM (fast shutdown or pg_terminate_backend()).
>> Should it be different, and if so what?

[ per upthread, I did already change the SIGQUIT message to specify "immediate shutdown" ]

> To do better I think we'd have to distinguish the different cases? An
> error message like
> "terminating connection due to {fast shutdown,immediate shutdown,connection termination} administrator command"
> or such could be helpful, but I don't think your patch adds *quite*
> enough state?

Well, if you want to distinguish different causes for SIGTERM then you'd need additional mechanism for that. I think we'd have to have a per-child termination-reason field, since SIGTERM might be sent to just an individual backend rather than the whole flotilla at once. I didn't think it was quite worth the trouble --- "administrator command" seems close enough for both fast shutdown and pg_terminate_backend() --- but you could certainly argue differently.

I suppose a compromise position could be to let the postmaster export its "Shutdown" state variable, and then let backends assume that SIGTERM means fast shutdown or pg_terminate_backend depending on the state of that one global variable. But it'd be a bit imprecise so I don't really feel it's more useful than what we have.

> I'd like to not log all these repeated messages into the server
> log. It's quite annoying to have to dig through thousands of lines of
> repeated "terminating connection..."

Hm. That's an orthogonal issue, but certainly worth considering. There are a couple of levels we could consider:

1. Just make the logged messages less verbose (they certainly don't need the DETAIL and HINT lines).

2. Suppress the log entries altogether.

I would have been against #2 before this patch, because it'd mean that a rogue SIGQUIT leaves no clear trace in the log. But with this patch, we can be fairly sure that we know whether SIGQUIT came from the postmaster, and then it might be all right to suppress the log entry altogether when it did.

I'd be happy to write up a patch for either of these, but let's decide what we want first.

regards, tom lane
Re: [HACKERS] [PATCH] Generic type subscripting
Dmitry Dolgov <9erthali...@gmail.com> writes:
>> On Tue, Dec 22, 2020 at 02:21:22PM -0500, Tom Lane wrote:
>> We do have precedent for this, it's the rules about resolving argument
>> types for overloaded functions. But the conclusion that that precedent
>> leads to is that we should check whether the subscript expression can
>> be *implicitly* coerced to either integer or text, and fail if neither
>> coercion or both coercions succeed.

> I'm not sure I completely follow and can't immediately find the relevant
> code for overloaded functions, so I need to do a perception check.
> Following this design, in jsonb_subscripting_transform we try to coerce
> the subscript expression to both integer and text (and maybe even to
> jsonpath), and based on which coercion has succeeded choose
> different logic to handle it, right?

Right, with the important proviso that the coercion strength is COERCION_IMPLICIT not COERCION_ASSIGNMENT.

> And just for me to understand: in the above example of the overloaded
> function, with the integer we can coerce it only to text (since the
> original type of the expression is integer), and with the bigint it
> could be coerced both to integer and text, and that's why it fails,
> isn't it?

No, there are no such IMPLICIT-level casts. Coercing bigint down to int is only allowed at ASSIGNMENT or higher coercion strength. In a case like jsonpath['...'], the initially UNKNOWN-type literal could in theory be coerced to any of these types, so you'd have to resolve that case manually. The overloaded-function code has an internal preference that makes it choose TEXT if it has a choice of TEXT or some other target type for an UNKNOWN input (cf parse_func.c starting about line 1150), but if you ask can_coerce_type() it's going to say TRUE for all three cases.

Roughly speaking, then, I think what you want to do is:

1. If input type is UNKNOWNOID, choose result type TEXT.

2. Otherwise, apply can_coerce_type() to see if the input type can be coerced to int4, text, or jsonpath. If it succeeds for none or more than one of these, throw error. Otherwise choose the single successful type.

3. Apply coerce_type() to coerce to the chosen result type.

4. At runtime, examine exprType() of the input to figure out what to do.

regards, tom lane
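Steps 1 and 2 of the resolution rule above can be sketched as a self-contained toy model. The type ids and the can_coerce_implicit() table below are stand-ins invented for illustration (the real code consults pg_cast via can_coerce_type() at COERCION_IMPLICIT strength); the point is the shape of the decision, including bigint failing because int8 -> int4 is ASSIGNMENT-level only:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for PostgreSQL type Oids. */
typedef enum
{
	UNKNOWNOID, INT2OID, INT4OID, INT8OID, TEXTOID, JSONPATHOID, INVALIDOID
} TypeId;

/* Toy model of can_coerce_type(..., COERCION_IMPLICIT): identity always
 * works, an unknown literal coerces to anything, int2 promotes to int4.
 * Notably int8 -> int4 is absent: that cast is ASSIGNMENT-level only. */
static bool
can_coerce_implicit(TypeId from, TypeId to)
{
	if (from == to)
		return true;
	if (from == UNKNOWNOID)
		return true;
	if (from == INT2OID && to == INT4OID)
		return true;
	return false;
}

/* Steps 1-2 from the message: an unknown input resolves to text;
 * otherwise exactly one of {int4, text, jsonpath} must be implicitly
 * reachable, and anything else is an error (INVALIDOID here). */
static TypeId
choose_subscript_type(TypeId input)
{
	TypeId		candidates[] = {INT4OID, TEXTOID, JSONPATHOID};
	TypeId		chosen = INVALIDOID;
	int			nmatches = 0;

	if (input == UNKNOWNOID)
		return TEXTOID;

	for (int i = 0; i < 3; i++)
	{
		if (can_coerce_implicit(input, candidates[i]))
		{
			chosen = candidates[i];
			nmatches++;
		}
	}
	return (nmatches == 1) ? chosen : INVALIDOID;
}
```

Steps 3 and 4 (actually coercing with coerce_type() and dispatching at runtime on exprType()) are not modeled here.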
Re: pglz compression performance, take two
Tomas Vondra writes: > On 12/26/20 8:06 AM, Andrey Borodin wrote: >> I'm still in doubt should I register this patch on CF or not. I'm willing to >> work on this, but it's not clear will it hit PGv14. And I hope for PGv15 we >> will have lz4 or something better for WAL compression. > I'd suggest registering it, otherwise people are much less likely to > give you feedback. I don't see why it couldn't land in PG14. Even if lz4 or something else shows up, the existing code will remain important for TOAST purposes. It would be years before we lose interest in it. regards, tom lane
Re: Rethinking plpgsql's assignment implementation
Hi

I repeated the tests. I wrote a set of simple functions. It is a synthetic test, but I think it can identify potential problems well. I calculated the average of 3 cycles and checked the performance of each function. I didn't find any problem. The total execution time is good too. The patched code is about 11% faster than master (14 sec vs 15.8 sec). So there is new important functionality with nice performance benefits.

make check-world passed

Regards
Pavel

plpgsql-perftest.sql Description: application/sql
Re: pg_stat_statements and "IN" conditions
Hi,

A few comments.

+ "After this number of duplicating constants start to merge them.",

duplicating -> duplicate

+ foreach(lc, (List *) expr)
+ {
+     Node * subExpr = (Node *) lfirst(lc);
+
+     if (!IsA(subExpr, Const))
+     {
+         allConst = false;
+         break;
+     }
+ }

It seems the above foreach loop (within foreach(temp, (List *) node)) can be preceded with a check that allConst is true. Otherwise the loop can be skipped.

+ if (currentExprIdx == pgss_merge_threshold - 1)
+ {
+     JumbleExpr(jstate, expr);
+
+     /*
+      * A const expr is already found, so JumbleExpr must
+      * record it. Mark it as merged, it will be the first
+      * merged but still present in the statement query.
+      */
+     Assert(jstate->clocations_count > 0);
+     jstate->clocations[jstate->clocations_count - 1].merged = true;
+     currentExprIdx++;
+ }

The above snippet occurs a few times. Maybe extract it into a helper method.

Cheers

On Sat, Dec 26, 2020 at 2:45 AM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> On Wed, Nov 18, 2020 at 05:04:32PM +0100, Dmitry Dolgov wrote:
> > On Wed, Aug 12, 2020 at 06:19:02PM +0200, Dmitry Dolgov wrote:
> > >
> > > I would like to start another thread to follow up on [1], mostly to bump up the
> > > topic. Just to remind, it's about how pg_stat_statements jumbling ArrayExpr in
> > > queries like:
> > >
> > > SELECT something FROM table WHERE col IN (1, 2, 3, ...)
> > >
> > > The current implementation produces different jumble hash for every different
> > > number of arguments for essentially the same query. Unfortunately a lot of ORMs
> > > like to generate these types of queries, which in turn leads to
> > > pg_stat_statements pollution. Ideally we want to prevent this and have only one
> > > record for such a query.
> > >
> > > As the result of [1] I've identified two highlighted approaches to improve this
> > > situation:
> > >
> > > * Reduce the generated ArrayExpr to an array Const immediately, in cases where
> > > all the inputs are Consts.
> > > > > > * Make repeating Const to contribute nothing to the resulting hash. > > > > > > I've tried to prototype both approaches to find out pros/cons and be > more > > > specific. Attached patches could not be considered a completed piece > of work, > > > but they seem to work, mostly pass the tests and demonstrate the > point. I would > > > like to get some high level input about them and ideally make it clear > what is > > > the preferred solution to continue with. > > > > I've implemented the second approach mentioned above, this version was > > tested on our test clusters for some time without visible issues. Will > > create a CF item and would appreciate any feedback. > > After more testing I found couple of things that could be improved, > namely in the presence of non-reducible constants one part of the query > was not copied into the normalized version, and this approach also could > be extended for Params. These are incorporated in the attached patch. >
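The user-visible effect of the merging approach can be sketched with a toy normalizer, modeled on the expected output in the attached patch (where merge_threshold = 5 renders long IN lists as "($1, $2, $3, $4, ...)"). Note the real patch works on the jumble hash and constant locations, not on query strings; this function only illustrates why any list at or beyond the threshold collapses to one pg_stat_statements entry:

```c
#include <stdio.h>
#include <string.h>

/* Render the normalized placeholder list for an IN clause with nconsts
 * constants. Below the threshold every constant keeps its own $n; at or
 * above it, the tail collapses to "..." so any longer list normalizes to
 * the same text (and hence the same entry). */
static void
normalize_in_list(int nconsts, int threshold, char *buf, size_t cap)
{
	int			shown = (nconsts >= threshold) ? threshold - 1 : nconsts;
	size_t		used = 0;

	used += snprintf(buf + used, cap - used, "(");
	for (int i = 1; i <= shown; i++)
		used += snprintf(buf + used, cap - used, i == 1 ? "$%d" : ", $%d", i);
	if (nconsts >= threshold)
		used += snprintf(buf + used, cap - used, ", ...");
	snprintf(buf + used, cap - used, ")");
}
```

With threshold 5, a 5-element and a 50-element IN list normalize to the identical string, while a 3-element list keeps all its placeholders.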
Re: Parallel Inserts in CREATE TABLE AS
On Thu, Dec 24, 2020 at 1:07 PM Bharath Rupireddy wrote: > > On Thu, Dec 24, 2020 at 10:25 AM vignesh C wrote: > > You could change intoclause_len = strlen(intoclausestr) to > > strlen(intoclausestr) + 1 and use intoclause_len in the remaining > > places. We can avoid the +1 in the other places. > > + /* Estimate space for into clause for CTAS. */ > > + if (IS_CTAS(intoclause) && OidIsValid(objectid)) > > + { > > + intoclausestr = nodeToString(intoclause); > > + intoclause_len = strlen(intoclausestr); > > + shm_toc_estimate_chunk(>estimator, intoclause_len + > > 1); > > + shm_toc_estimate_keys(>estimator, 1); > > + } > > Done. > > > Can we use node->nworkers_launched == 0 in place of > > node->need_to_scan_locally, that way the setting and resetting of > > node->need_to_scan_locally can be removed. Unless need_to_scan_locally > > is needed in any of the functions that gets called. > > + /* Enable leader to insert in case no parallel workers were > > launched. */ > > + if (node->nworkers_launched == 0) > > + node->need_to_scan_locally = true; > > + > > + /* > > +* By now, for parallel workers (if launched any), would have > > started their > > +* work i.e. insertion to target table. In case the leader is > > chosen to > > +* participate for parallel inserts in CTAS, then finish its > > share before > > +* going to wait for the parallel workers to finish. > > +*/ > > + if (node->need_to_scan_locally) > > + { > > need_to_scan_locally is being set in ExecGather() even if > nworkers_launched > 0 it can still be true, so I think we can not > remove need_to_scan_locally in ExecParallelInsertInCTAS. > > Attaching v15 patch set for further review. Note that the change is > only in 0001 patch, other patches remain unchanged from v14. 
>

+-- parallel inserts must occur
+select explain_pictas(
+'create table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;

We can change the comment "parallel inserts must occur" to "parallel insert must be selected for CTAS on normal table".

+-- parallel inserts must occur
+select explain_pictas(
+'create unlogged table parallel_write as select length(stringu1) from tenk1;');
+select count(*) from parallel_write;
+drop table parallel_write;

We can change the comment "parallel inserts must occur" to "parallel insert must be selected for CTAS on unlogged table". Similar comments need to be handled in other places also.

+create function explain_pictas(text) returns setof text
+language plpgsql as
+$$
+declare
+    ln text;
+begin
+    for ln in
+        execute format('explain (analyze, costs off, summary off, timing off) %s',
+            $1)
+    loop
+        ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
+        ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
+        ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
+        return next ln;
+    end loop;
+end;
+$$;

The above function is the same as the function present in partition_prune.sql:

create function explain_parallel_append(text) returns setof text
language plpgsql as
$$
declare
    ln text;
begin
    for ln in
        execute format('explain (analyze, costs off, summary off, timing off) %s',
            $1)
    loop
        ln := regexp_replace(ln, 'Workers Launched: \d+', 'Workers Launched: N');
        ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+', 'actual rows=N loops=N');
        ln := regexp_replace(ln, 'Rows Removed by Filter: \d+', 'Rows Removed by Filter: N');
        return next ln;
    end loop;
end;
$$;

If possible, try to make a common function for both and use it.

+ if (intoclausestr && OidIsValid(objectid))
+     fpes->objectid = objectid;
+ else
+     fpes->objectid = InvalidOid;

Here the OidIsValid(objectid) check is not required; intoclausestr will be set only if objectid is valid.
Regards, Vignesh EnterpriseDB: http://www.enterprisedb.com
Re: Parallel copy
On Wed, Dec 23, 2020 at 3:05 PM Hou, Zhijie wrote:
>
> Hi
>
> > Yes this optimization can be done, I will handle this in the next patch
> > set.
>
> I have a suggestion for the parallel safety-check.
>
> As designed, the leader does not participate in the insertion of data.
> If the user uses (PARALLEL 1), there is only one worker process which will do the insertion.
>
> IMO, we can skip some of the safety-checks in this case, because the safety-check is to limit parallel insert.
> (except temporary table or ...)
>
> So, how about checking (PARALLEL 1) separately?
> Although it looks a bit complicated, (PARALLEL 1) does have a good performance improvement.

Thanks for the comments Hou Zhijie, I will run a few tests with 1 worker and try to include this in the next patch set.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Re: Added missing copy related data structures to typedefs.list
On Thu, Dec 17, 2020 at 4:28 AM Bruce Momjian wrote:
>
> On Mon, Dec 7, 2020 at 01:56:50PM +0530, vignesh C wrote:
> > Hi,
> >
> > Added missing copy related data structures to typedefs.list; these
> > data structures were added when the copy files were split during a
> > recent commit. I found this while running pgindent for the parallel
> > copy patches. The attached patch has the changes for the same.
> > Thoughts?
>
> Uh, we usually only update the typedefs file before we run pgindent on
> the master branch.

Ok, thanks for the clarification. I was not sure, as in a few of the enhancements it was included as part of the patches.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Re: pglz compression performance, take two
On 12/26/20 8:06 AM, Andrey Borodin wrote:
> On 12 Dec 2020, at 22:47, Andrey Borodin wrote:
>
> I've cleaned up the comments, checked that the memory alignment stuff actually makes sense for 32-bit ARM (according to Godbolt) and did some more code cleanup. PFA v2 patch.
>
> I'm still in doubt whether I should register this patch on CF or not. I'm willing to work on this, but it's not clear whether it will hit PGv14. And I hope for PGv15 we will have lz4 or something better for WAL compression.

I'd suggest registering it, otherwise people are much less likely to give you feedback. I don't see why it couldn't land in PG14.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: pgsql: Add key management system
On Sat, Dec 26, 2020 at 02:38:40PM +, Shinoda, Noriyoshi (PN Japan FSIP) wrote: > Hi, > > Thank you for developing a great feature. I tested it immediately. > The attached patch changes the value of the --file-encryption-keylen option > of the initdb command to mandatory. No value is currently required. > I also changed the help message and the manual. Thank you, applied. -- Bruce Momjian https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
RE: pgsql: Add key management system
Hi,

Thank you for developing a great feature. I tested it immediately. The attached patch makes the value of the --file-encryption-keylen option of the initdb command mandatory; currently no value is required. I also changed the help message and the manual.

Regards,
Noriyoshi Shinoda

-----Original Message-----
From: Bruce Momjian [mailto:br...@momjian.us]
Sent: Saturday, December 26, 2020 4:01 AM
To: Erik Rijkers
Cc: pgsql-hackers@lists.postgresql.org
Subject: Re: pgsql: Add key management system

On Fri, Dec 25, 2020 at 07:30:01PM +0100, Erik Rijkers wrote:
> On 2020-12-25 16:19, Bruce Momjian wrote:
> > Add key management system
> > doc/src/sgml/database-encryption.sgml | 97 +
>
> Attached are a few typos.
>
> I also noticed that this option does not occur in the initdb --help:
>
> -u --copy-encryption-keys
>
> Was that deliberate?

No. :-( Attached patch applied. Thanks.

--
Bruce Momjian https://momjian.us
EnterpriseDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

keylength.diff Description: keylength.diff
Re: pg_stat_statements and "IN" conditions
> On Wed, Nov 18, 2020 at 05:04:32PM +0100, Dmitry Dolgov wrote: > > On Wed, Aug 12, 2020 at 06:19:02PM +0200, Dmitry Dolgov wrote: > > > > I would like to start another thread to follow up on [1], mostly to bump up > > the > > topic. Just to remind, it's about how pg_stat_statements jumbling ArrayExpr > > in > > queries like: > > > > SELECT something FROM table WHERE col IN (1, 2, 3, ...) > > > > The current implementation produces different jumble hash for every > > different > > number of arguments for essentially the same query. Unfortunately a lot of > > ORMs > > like to generate these types of queries, which in turn leads to > > pg_stat_statements pollution. Ideally we want to prevent this and have only > > one > > record for such a query. > > > > As the result of [1] I've identified two highlighted approaches to improve > > this > > situation: > > > > * Reduce the generated ArrayExpr to an array Const immediately, in cases > > where > > all the inputs are Consts. > > > > * Make repeating Const to contribute nothing to the resulting hash. > > > > I've tried to prototype both approaches to find out pros/cons and be more > > specific. Attached patches could not be considered a completed piece of > > work, > > but they seem to work, mostly pass the tests and demonstrate the point. I > > would > > like to get some high level input about them and ideally make it clear what > > is > > the preferred solution to continue with. > > I've implemented the second approach mentioned above, this version was > tested on our test clusters for some time without visible issues. Will > create a CF item and would appreciate any feedback. After more testing I found couple of things that could be improved, namely in the presence of non-reducible constants one part of the query was not copied into the normalized version, and this approach also could be extended for Params. These are incorporated in the attached patch. 
>From a93824799eda63391989e8845393f0b773508e18 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Tue, 17 Nov 2020 16:18:08 +0100
Subject: [PATCH v2] Prevent jumbling of every element in ArrayExpr

pg_stat_statements produces multiple entries for queries like

SELECT something FROM table WHERE col IN (1, 2, 3, ...)

depending on number of parameters, because every element of ArrayExpr is
jumbled. Make Consts contribute nothing to the jumble hash if they're
part of a series and at position further that specified threshold. Do
the same for similar queries with VALUES as well.
---
 .../expected/pg_stat_statements.out          | 657 +-
 .../pg_stat_statements/pg_stat_statements.c  | 262 ++-
 .../sql/pg_stat_statements.sql               | 129
 3 files changed, 1034 insertions(+), 14 deletions(-)

diff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 2a303a7f07..6978e37ca7 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -205,7 +205,7 @@ SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
  INSERT INTO test VALUES(generate_series($1, $2), $3)                          | 1 | 10
  SELECT * FROM test ORDER BY a                                                 | 1 | 12
  SELECT * FROM test WHERE a > $1 ORDER BY a                                    | 2 | 4
- SELECT * FROM test WHERE a IN ($1, $2, $3, $4, $5)                            | 1 | 8
+ SELECT * FROM test WHERE a IN ($1, $2, $3, $4, ...)                           | 1 | 8
  SELECT pg_stat_statements_reset()                                             | 1 | 1
  SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"  | 0 | 0
  UPDATE test SET b = $1 WHERE a = $2                                           | 6 | 6
@@ -861,4 +861,659 @@ SELECT query, plans, calls, rows FROM pg_stat_statements ORDER BY query COLLATE
  SELECT query, plans, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C" | 1 | 0 | 0
 (6 rows)
 
+--
+-- Consts merging
+--
+SET pg_stat_statements.merge_threshold = 5;
+CREATE TABLE test_merge (id int, data int);
+-- IN queries
+-- Normal
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset
+--------------------------
+
+(1 row)
+
+SELECT * FROM test_merge WHERE id IN (1, 2, 3);
+ id | data
+----+------
+(0 rows)
+
+SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C";
+                                  query                                  | calls
+-------------------------------------------------------------------------+-------
+ SELECT * FROM test_merge WHERE id IN ($1, $2, $3)                       | 1
+ SELECT pg_stat_statements_reset()                                       | 1
+ SELECT query, calls FROM pg_stat_statements ORDER BY query COLLATE "C"  | 0