Re: bgwrite process is too lazy
Hi Andres Freund,

Thank you for the explanation.

> I doubt that slowdown is caused by bgwriter not being active enough. I suspect
> what you're seeing is one or more of:
> a) The overhead of doing full page writes (due to increasing the WAL
>    volume). You could verify whether that's the case by turning
>    full_page_writes off (but note that that's not generally safe!) or see if
>    the overhead shrinks if you set wal_compression=zstd or wal_compression=lz4
>    (don't use pglz, it's too slow).
> b) The overhead of renaming WAL segments during recycling. You could see if
>    this is related by specifying --wal-segsize 512 or such during initdb.

I am aware of these optimizations, but they only mitigate the impact. I deliberately left WAL compression off during the stress test:

shared_buffers = '32GB'
bgwriter_delay = '10ms'
bgwriter_lru_maxpages = '8192'
bgwriter_lru_multiplier = '10.0'
wal_buffers = '64MB'
checkpoint_completion_target = '0.999'
checkpoint_timeout = '600'
max_wal_size = '32GB'
min_wal_size = '16GB'

I agree that in business scenarios with many reads and few writes, it is desirable to keep as many dirty pages in memory as possible. However, in scenarios such as push systems and task-scheduling systems, which have heavy reads and writes, the impact of checkpoints is much more visible. An adaptive bgwriter, or a bgwriter triggered when dirty pages reach a certain watermark, would eliminate the performance jitter that checkpoints cause. From what I understand, quite a few commercial databases based on PostgreSQL have added an adaptive dirty-page flushing feature, and from their internal reports the whole stress-testing run was very smooth! Since it's a trade secret, I don't know how they implemented this feature.
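As a quick way to check whether the bgwriter is actually keeping up with the settings above, the cumulative statistics views show how buffer writes are split between the background writer and the checkpointer. A sketch (assumes PostgreSQL 17+ catalogs; on older releases the checkpointer counters live in pg_stat_bgwriter itself):

```sql
-- Buffers written by the background writer, and how often it gave up
-- because it hit bgwriter_lru_maxpages within one round.
SELECT buffers_clean,
       maxwritten_clean
FROM pg_stat_bgwriter;

-- Buffers written by the checkpointer, and what triggered the checkpoints.
SELECT num_timed,        -- triggered by checkpoint_timeout
       num_requested,    -- triggered by WAL volume or explicit request
       buffers_written
FROM pg_stat_checkpointer;
```

A high buffers_written relative to buffers_clean suggests most dirty pages are being deferred to checkpoint time, which is consistent with the jitter described above.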
Re: Psql meta-command conninfo+
Hi David,

Thank you for your feedback.

On Fri, Oct 4, 2024 at 11:56 AM David G. Johnston <david.g.johns...@gmail.com> wrote:
> It seems to me a more useful definition for what this command should print
> out is basically the entire contents of:
>
> https://www.postgresql.org/docs/current/libpq-status.html
>
> That page has three sections:
> Connection Invariants
> Current Status
> Encryption (TLS)
>
> I would suggest that we thus produce three tables - one for each. In the
> case of SSL, a message saying “not used” instead of a table full of blanks
> probably suffices, though I’d lean to print all of what is available at all
> times.

We can try this approach. I would also like to have others' opinions on it. Most of the functions are already used, while some are not required (IMO). I have listed all the functions from the doc link you provided, along with my brief comments based on the latest patch (v35):

PQdb - already used
PQuser - already used
PQpass - no need
PQhost - already used
PQhostaddr - already used
PQport - already used
PQtty - no need
PQoptions - can be used
PQstatus - no need
PQtransactionStatus - can be used
PQparameterStatus - already used
PQprotocolVersion - already used
PQserverVersion - no need
PQerrorMessage - no need
PQsocket - no need
PQbackendPID - already used
PQconnectionNeedsPassword - no need
PQconnectionUsedPassword - can be used
PQconnectionUsedGSSAPI - already used
PQsslInUse - already used
PQsslAttribute - only the key_bits attribute is not used
PQsslAttributeNames - no need
PQsslStruct - no need
PQgetssl - no need

For PQparameterStatus, some parameters are already used. server_version and application_name were already discussed and removed in v12 and v29 respectively. Do we need other parameters?

> Within that framework having \conninfo[+[CSE][…]] be the command -
> printing out only the table specified would be the behavior (specifying no
> suffix letters prints all three) - would be an option.

3 separate tables without suffix?
If others are okay with this, I can work on this approach and will provide a patch before the next CF. Regards, Hunaid Sohail
Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes
Le lun. 7 oct. 2024 à 02:41, Michael Paquier a écrit :
> On Mon, Oct 07, 2024 at 12:43:18AM +0300, Alena Rybakina wrote:
> > Maybe I'm not aware of the whole context of the thread and maybe my
> > questions will seem a bit stupid, but honestly it's not entirely clear
> > to me how these statistics will help to adjust the number of parallel
> > workers. We may have situations where, due to an overestimation of the
> > cardinality during query optimization, several parallel workers were
> > unjustifiably generated and vice versa - due to a high workload only a
> > few workers were generated. How do we identify such cases so as not to
> > increase or decrease the number of parallel workers when it is not
> > necessary?
>
> Well. For spiky workloads, only these numbers are not going to help.
> If you can map them with the number of times a query related to these
> tables has been called, something that pg_stat_statements would be
> able to show more information about.
>
> FWIW, I have doubts that these numbers attached to this portion of the
> system are always useful. For OLTP workloads, parallel workers would
> unlikely be spawned because even with JOINs we won't work with a high
> number of tuples that require them. This could be interesting with
> analytics; however, complex query sequences mean that we'd still need
> to look at all the plans involving the relations where there is an
> imbalance of planned/spawned workers, because these can usually
> involve quite a few gather nodes. At the end of the day, it seems to
> me that we would still need data that involves statements to track
> down specific plans that are starving. If your application does not
> have that many statements, looking at individual plans is OK, but if
> you have hundreds of them to dig into, this is time-consuming and
> stats at table/index level don't offer data in terms of stuff that
> stands out and needs adjustments.
> And this is without the argument of bloating the stats entries more
> for each table, even if it matters less now that these stats are in
> shmem.

We need granularity because we have granularity in the config. There is pg_stat_database because it gives the whole picture and is easier to monitor. Then there is pg_stat_statements to analyze problematic statements. And finally there is pg_stat_all_* because you can set parallel_workers on a specific table. Anyway, offering various ways of getting the same information is not unheard of - pretty much like temp_files/temp_bytes in pg_stat_database, temp_blks_read/temp_blks_written in pg_stat_statements, and log_temp_files in the log files, if you ask me :)
--
Guillaume.
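To make the three granularities concrete, here is a sketch of how the same kind of pressure shows up at each level (the table name and setting value in the last statement are illustrative):

```sql
-- Whole-database picture:
SELECT datname, temp_files, temp_bytes
FROM pg_stat_database
WHERE datname = current_database();

-- Per-statement picture (requires the pg_stat_statements extension):
SELECT query, temp_blks_read, temp_blks_written
FROM pg_stat_statements
ORDER BY temp_blks_written DESC
LIMIT 5;

-- And the per-table knob that motivates per-table parallel stats:
ALTER TABLE measurements SET (parallel_workers = 4);
```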
Re: Should rolpassword be toastable?
On Fri, Oct 4, 2024 at 4:48 PM Nathan Bossart wrote:
> ..
> Since BLCKSZ can be as low as 1024, I think 512 would be a good choice.

Where did you get the minimal value of 1024 from? I vaguely remember someone testing with 256 at some point in the past.

---
Hannu
System views for versions reporting
Hi,

Based on the feedback in [1], here is my attempt at implementing system views for versions reporting. It adds pg_system_versions for showing things like core version, compiler, LLVM, etc., and pg_system_libraries for showing linked shared objects. I think everyone has agreed that the first was a good idea, whereas the second was only suggested -- after some thinking I find shared objects useful enough to include here as well.

The main idea is to facilitate bug reporting. In particular, in many JIT-related bug reports one of the first questions is often "which LLVM version is used?". Gathering such information is a manual process, and mistakes can be made when verifying llvm-config output or anywhere else. Having a system view for such purposes makes the process a bit more robust. The first three patches are essential for this purpose; the fourth one is somewhat independent and could be considered in isolation. The output looks like this:

=# select * from pg_system_versions;
   name   |   version    |     type
----------+--------------+--------------
 Arch     | x86_64-linux | Compile Time
 ICU      | 15.1         | Compile Time
 Core     | 18devel      | Compile Time
 Compiler | gcc-14.0.1   | Compile Time
 LLVM     | 18.1.6       | Run Time

=# select * from pg_system_libraries;
        name
---------------------
 /lib64/libkrb5.so.3
 /lib64/libz.so.1
 linux-vdso.so.1
 /lib64/libxml2.so.2
 [...]

Any feedback is appreciated.

0001-Add-infrastructure-for-pg_system_versions-view

Prepares the infrastructure, adding the view without populating it with data just yet. The view is backed by a hash table, which contains callbacks returning a version string for a particular component. The idea is to allow some flexibility in reporting, making components responsible for how and when the information is exposed. E.g. it allows saying that the version is not available.

0002-Add-core-versions-to-pg_system_versions.patch

Actually populates the view with some predefined compile-time versions.
The versions are registered in PostgresMain, right after BeginReportingGUCOptions -- there is no strong reasoning for that except that it looks fine to me, so feel free to suggest a better place.

0003-Add-JIT-provider-version-to-pg_system_versions.patch

Finally adds the LLVM version to the view by extending the set of JIT provider callbacks. The registration happens together with the core versions from the previous patch, because the JIT provider itself is initialized only when a first expression gets compiled.

0004-Add-pg_system_libraries-view.patch

Strictly speaking independent from the above patches. Adds the view to show linked shared objects by iterating over dl_phdr_info with dl_iterate_phdr. It belongs to the standard C library, so I assume it's portable enough.

[1]: https://www.postgresql.org/message-id/flat/znc72ymyoelvk5rjk5ub254v3qvcczfrk6autygjdobfvx2e7p%40s3dssvf34twa

>From d903142d178dd8a9c03d07ee4809ac582b9b7818 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthali...@gmail.com>
Date: Sat, 5 Oct 2024 18:27:57 +0200
Subject: [PATCH v1 1/4] Add infrastructure for pg_system_versions view

Introduce a unified way of reporting versions (PostgreSQL itself, the
compiler, the host system, compile and runtime dependencies, etc.) via a
new system view pg_system_versions. This is going to be useful for
troubleshooting and should enhance bug reports, replacing manual,
bug-prone collecting of the same information.

The view is backed by a hash table that contains callbacks returning a
version string for a particular component. The idea is to allow some
flexibility in reporting, making components responsible for how and when
the information is exposed.
---
 doc/src/sgml/system-views.sgml          |  56
 src/backend/catalog/system_views.sql    |   8 ++
 src/backend/utils/misc/Makefile         |   3 +-
 src/backend/utils/misc/meson.build      |   1 +
 src/backend/utils/misc/system_version.c | 108
 src/include/catalog/pg_proc.dat         |   6 ++
 src/include/utils/system_version.h      |  40 +
 src/test/regress/expected/rules.out     |   8 ++
 8 files changed, 229 insertions(+), 1 deletion(-)
 create mode 100644 src/backend/utils/misc/system_version.c
 create mode 100644 src/include/utils/system_version.h

diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml
index 634a4c0..df0b9d3 100644
--- a/doc/src/sgml/system-views.sgml
+++ b/doc/src/sgml/system-views.sgml
@@ -226,6 +226,11 @@
       wait events

+      pg_system_versions
+      system versions
+

@@ -5064,4 +5069,55 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx

+  pg_system_versions
+
+   pg_system_versions
+
+  The view pg_system_versions provides description
+  about versions of various
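For readers unfamiliar with dl_iterate_phdr, here is a minimal self-contained sketch (my own illustration, not code from the 0004 patch) of how the shared objects mapped into the current process can be enumerated on Linux/glibc:

```c
/* Sketch only: enumerate the shared objects mapped into the current
 * process via dl_iterate_phdr(), the glibc/Linux interface the 0004
 * patch proposes to use for pg_system_libraries. */
#define _GNU_SOURCE
#include <link.h>
#include <stdio.h>

/* Called once per loaded object; prints its path and counts it. */
static int
print_object(struct dl_phdr_info *info, size_t size, void *data)
{
	int		   *count = (int *) data;

	/* The main executable is reported with an empty name. */
	printf("%s\n", info->dlpi_name[0] != '\0' ?
		   info->dlpi_name : "(main executable)");
	(*count)++;
	return 0;					/* 0 means: keep iterating */
}

int
list_loaded_objects(void)
{
	int			count = 0;

	dl_iterate_phdr(print_object, &count);
	return count;
}
```

The callback-per-object shape maps naturally onto a set-returning function producing one row per library.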
Re: System views for versions reporting
On 10/6/24 11:36, Dmitry Dolgov wrote:
> Hi,
>
> Based on the feedback in [1], here is my attempt at implementing system
> views for versions reporting. It adds pg_system_versions for showing
> things like core version, compiler, LLVM, etc., and pg_system_libraries
> for showing linked shared objects.
> [...]
> The output looks like this:
>
> =# select * from pg_system_versions;
>    name   |   version    |     type
> ----------+--------------+--------------
>  Arch     | x86_64-linux | Compile Time
>  ICU      | 15.1         | Compile Time
>  Core     | 18devel      | Compile Time
>  Compiler | gcc-14.0.1   | Compile Time
>  LLVM     | 18.1.6       | Run Time

I'm not sure why ICU is "Compile Time" rather than "Run Time" when it is not statically linked. Also, if we are going to include ICU here, shouldn't we also include the libc version?

> =# select * from pg_system_libraries;
>         name
> ---------------------
>  /lib64/libkrb5.so.3
>  /lib64/libz.so.1
>  linux-vdso.so.1
>  /lib64/libxml2.so.2
>  [...]
>
> Any feedback is appreciated.

I think it would be nice to include a sha256 hash (or something similar) of the libraries as well, so that they can be checked against known good values.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
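A rough approximation of the hashing idea is already possible today with existing builtins, assuming a role that is allowed to read arbitrary server files (superuser or pg_read_server_files); the library path is illustrative:

```sql
-- Hash a linked library for comparison against a known-good value.
-- sha256() and pg_read_binary_file() have been in core for a long time.
SELECT encode(sha256(pg_read_binary_file('/lib64/libz.so.1')), 'hex');
```

Baking this into the view itself would avoid the file-access privilege requirement and guarantee the hash matches the object actually mapped into the backend.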
Re: Add minimal C example and SQL registration example for custom table access methods.
On Fri, May 24, 2024 at 03:59:08PM -0300, Fabrízio de Royes Mello wrote:
> Nice... LGTM!

I have noticed that this was still in the CF. After fixing a couple of inconsistencies in the markups and the names, trimming down the list of headers to avoid rot, and adding a static in front of the const, the result looked pretty much OK, so applied.
--
Michael
RE: Make default subscription streaming option as Parallel
Dear Vignesh,

> One key point to consider is that the lock on transaction objects will
> be held for a longer duration when using streaming in parallel. This
> occurs because the parallel apply worker initiates the transaction as
> soon as streaming begins, maintaining the lock until the transaction
> is fully completed. As a result, for long-running transactions, this
> extended lock can hinder concurrent access that requires a lock.

So, the current default is the most conservative one and can run anywhere; your proposal is a bit more optimistic, right? Since long transactions should be avoided in any case, and everyone knows it, I agree with your point.

One comment on your patch: shouldn't we add a streaming=off case to the pg_dump test? You added lines, but nothing exercises them.

Best regards,
Hayato Kuroda
FUJITSU LIMITED
Re: [PROPOSAL] : Disallow use of empty column name in (column_name '') in ALTER or CREATE of foreign table.
On Thu, Aug 22, 2024 at 04:00:13PM +0530, Nishant Sharma wrote:
> I may be wrong, but just wanted to share my thoughts on the differences.
> So, it can be considered a different issue/mistake and can be handled
> separately in another email thread.

+else if (strcmp(def->defname, "column_name") == 0)
+{
+    char *col_name_opt = defGetString(def);
+
+    /*
+     * PostgresSQL follows SQL syntax, so we do not allow empty
+     * column_name option.
+     */
+    if (col_name_opt && col_name_opt[0] == '\0')
+        ereport(ERROR,
+                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                 errmsg("colum_name option cannot be empty for postgres_fdw")));
+}

If we begin to care about empty names in column_name in the FDW command, shouldn't we also care about empty values in schema_name and table_name?

Typos: PostgresSQL -> PostgreSQL and colum_name -> column_name.

Once you associate table_name and schema_name, you can save in translation by rewording the errmsg like this (no need to mention postgres_fdw; note the quotes around the option name):

errmsg("cannot use empty value for option \"%s\"", "column_name");
--
Michael
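For context, the kind of statement the patch wants to reject looks like this (server and table names are made up; the error text follows the rewording suggested above, not an existing message):

```sql
CREATE FOREIGN TABLE ft_example (
    a int OPTIONS (column_name '')   -- empty remote column name
) SERVER some_pg_server;
-- With the proposed check, this would fail with something like:
-- ERROR:  cannot use empty value for option "column_name"
```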
Re: Set query_id for query contained in utility statement
On Fri, Oct 04, 2024 at 08:16:00PM +0800, jian he wrote:
> about v5 0001
> select_with_parens:
>     '(' select_no_parens ')'        { $$ = $2; }
>     | '(' select_with_parens ')'    { $$ = $2; }
>     ;
>
>  toplevel | calls | query
> ----------+-------+-----------------------------------------------------
>  t        |     1 | SELECT pg_stat_statements_reset() IS NOT NULL AS t
>  t        |     0 | SELECT toplevel, calls, query FROM pg_stat_statements+
>           |       |   ORDER BY query COLLATE "C", toplevel
>  t        |     1 | explain (select $1)
>  f        |     1 | select $1);
>
> query "select $1);" looks weird. not sure how to improve it,
> or if this should be the expected behavior?

Good point, this is confusing. The problem is that having only stmt_location is not enough to detect where the element you want to track is located in the query, because it only points at its start location in the full query string. In an ideal world, what we should have is its start and end, pass them down to pgss_store(), and store only the subquery between the start and end positions in the stats entry. Making that right through the parser may be challenging, though.

This concept is perhaps larger than this thread. I think that we want the same kind of thing for values in IN() and ANY() clauses, where we want to track an area for a single normalization parameter, perhaps with a separate node_attr. I am not sure if using the same trackers would make sense, so I am just waving hands a bit here, but the concepts required are quite close. Saying that, a patch set implemented this way would ensure a strict 1:1 mapping between a query ID and the internal query in these EXPLAIN and CREATE commands, which would be good.

The first step should be IMO to expand the tests of pgss and track all the behaviors we have historically in the tree about all that. Then it becomes much easier to track how much we want to tweak them depending on whether pgss.track is set to "top" or "all", and easier to see how a behavior changes when manipulating the parse node structures with location data.
--
Michael
Make default subscription streaming option as Parallel
Hi,

Currently, streaming of in-progress transactions is disabled by default for subscriptions: all transactions are fully decoded on the publisher before being sent to the subscriber. This approach can lead to increased latency and reduced performance, particularly under heavy load.

Instead, we could enable the parallel streaming option for subscriptions by default. With it, incoming changes are applied directly by one of the available parallel apply workers. This method significantly improves the performance of commit operations.

I conducted a series of tests using logical replication, comparing SQL execution times with streaming set to both parallel and off. The tests varied the logical_decoding_work_mem setting and included the following scenarios: a) insert, b) delete, c) update, d) rollback 5% of records, e) rollback 10% of records, f) rollback 20% of records, g) rollback 50% of records.

I have written TAP tests for the same; the attached files can be copied to src/test/subscription/t, and the logical_decoding_work_mem configuration and the streaming option in the CREATE SUBSCRIPTION command should be changed accordingly before running the tests. The tests were executed 5 times and the average was taken. Execution time is in seconds.
Insert 5kk records
Logical decoding mem | Parallel |   off   | % Improvement
---------------------|----------|---------|--------------
64 KB                | 37.304   | 69.465  | 46.298
256 KB               | 36.327   | 70.671  | 48.597
64 MB                | 41.173   | 69.228  | 40.526

Delete 5kk records
Logical decoding mem | Parallel |   off   | % Improvement
---------------------|----------|---------|--------------
64 KB                | 42.322   | 69.404  | 39.021
256 KB               | 43.250   | 66.973  | 35.422
64 MB                | 44.183   | 67.873  | 34.903

Update 5kk records
Logical decoding mem | Parallel |   off   | % Improvement
---------------------|----------|---------|--------------
64 KB                | 93.953   | 127.691 | 26.422
256 KB               | 94.166   | 128.541 | 26.743
64 MB                | 93.367   | 134.275 | 30.465

Rollback 5% records
Logical decoding mem | Parallel |   off   | % Improvement
---------------------|----------|---------|--------------
64 KB                | 36.968   | 67.161  | 44.957
256 KB               | 38.059   | 68.021  | 44.047
64 MB                | 39.431   | 66.878  | 41.041

Rollback 10% records
Logical decoding mem | Parallel |   off   | % Improvement
---------------------|----------|---------|--------------
64 KB                | 35.966   | 63.968  | 43.775
256 KB               | 36.597   | 64.836  | 43.554
64 MB                | 39.069   | 64.357  | 39.292

Rollback 20% records
Logical decoding mem | Parallel |   off   | % Improvement
---------------------|----------|---------|--------------
64 KB                | 37.616   | 58.903  | 36.139
256 KB               | 37.330   | 58.606  | 36.303
64 MB                | 38.720   | 60.236  | 35.720

Rollback 50% records
Logical decoding mem | Parallel |   off   | % Improvement
---------------------|----------|---------|--------------
64 KB                | 38.999   | 44.776  | 12.902
256 KB               | 36.567   | 44.530  | 17.882
64 MB                | 38.592   | 45.346  | 14.893

The machine configuration that was used is also attached. The tests demonstrate a significant performance improvement when using the parallel streaming option: insert shows a 40-48% improvement, delete 34-39%, and update 26-30%. For rollback, the improvement is between 12% and 44%; it shrinks as the amount of data rolled back grows. If a significant amount of data has to be rolled back, the performance of parallel streaming may be comparable to, or in some instances slightly lower than, streaming off. However, this is acceptable since commit operations are generally more frequent than rollback operations.
One key point to consider is that the lock on transaction objects will be held for a longer duration when using streaming in parallel. This occurs because the parallel apply worker initiates the transaction as soon as streaming begins, maintaining the lock until the transaction is fully completed. As a result, for long-running transactions, this extended lock can hinder concurrent access that requires a lock.
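For reference, the behavior being proposed as the new default is what you get today by asking for it explicitly (connection string and object names below are illustrative):

```sql
CREATE SUBSCRIPTION sub_orders
    CONNECTION 'host=publisher.example.com dbname=shop'
    PUBLICATION pub_orders
    WITH (streaming = parallel);

-- The number of parallel apply workers per subscription is capped by
-- max_parallel_apply_workers_per_subscription (default 2), which is
-- drawn from the max_logical_replication_workers pool.
```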
Re: GUC names in messages
On Tue, Sep 10, 2024 at 05:11:13PM +1000, Peter Smith wrote:
> I have rebased the two remaining patches. See v12 attached.

I've looked over the patch set again, and applied 0002. 0001 could be more ambitious and more consistent, like:

- The GUC name coming from the table's record is only used for PGC_ENUM; let's use conf->gen.name across the board so that this counts for custom GUCs.
- Once we do the first bullet point, parse_and_validate_value() can be simplified as it does not need its "name" argument anymore.
- Once the second bullet point is done, going one level up reveals more in set_config_with_handle(). I was wondering about pg_parameter_aclcheck() for a bit until I noticed convert_GUC_name_for_parameter_acl(), which applies a conversion with the contents of the catalogs when checking for a parameter ACL, similar to guc_name_compare(). One limitation is in AlterSystemSetConfigFile(), where the parameter name comes from the command and not a GUC record, as the ACL check happens before the GUC table lookup, but we could live with that.

At the end, I get the attached revised 0001. WDYT?
--
Michael

From 3b3f4a241a2d127b421a968f5fb19d555e55b91c Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Mon, 7 Oct 2024 13:17:01 +0900
Subject: [PATCH v13] Apply GUC name from central table in more places of guc.c

---
 src/backend/utils/misc/guc.c      | 65 +++++++++++++++++---------------
 src/test/regress/expected/guc.out |  4 ++
 src/test/regress/sql/guc.sql      |  3 ++
 3 files changed, 39 insertions(+), 33 deletions(-)

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 13527fc258..507a5d329a 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -3115,7 +3115,6 @@ config_enum_get_options(struct config_enum *record, const char *prefix,
  * and also calls any check hook the parameter may have.
  *
  * record: GUC variable's info record
- * name: variable name (should match the record of course)
  * value: proposed value, as a string
  * source: identifies source of value (check hooks may need this)
  * elevel: level to log any error reports at
@@ -3127,7 +3126,7 @@ config_enum_get_options(struct config_enum *record, const char *prefix,
  */
 static bool
 parse_and_validate_value(struct config_generic *record,
-						 const char *name, const char *value,
+						 const char *value,
 						 GucSource source, int elevel,
 						 union config_var_val *newval, void **newextra)
 {
@@ -3142,7 +3141,7 @@ parse_and_validate_value(struct config_generic *record,
 				ereport(elevel,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 						 errmsg("parameter \"%s\" requires a Boolean value",
-								name)));
+								conf->gen.name)));
 				return false;
 			}
@@ -3162,7 +3161,7 @@ parse_and_validate_value(struct config_generic *record,
 				ereport(elevel,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 						 errmsg("invalid value for parameter \"%s\": \"%s\"",
-								name, value),
+								conf->gen.name, value),
 						 hintmsg ? errhint("%s", _(hintmsg)) : 0));
 				return false;
 			}
@@ -3181,7 +3180,7 @@ parse_and_validate_value(struct config_generic *record,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 						 errmsg("%d%s%s is outside the valid range for parameter \"%s\" (%d%s%s .. %d%s%s)",
 								newval->intval, unitspace, unit,
-								name,
+								conf->gen.name,
 								conf->min, unitspace, unit,
 								conf->max, unitspace, unit)));
 				return false;
@@ -3203,7 +3202,7 @@ parse_and_validate_value(struct config_generic *record,
 				ereport(elevel,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 						 errmsg("invalid value for parameter \"%s\": \"%s\"",
-								name, value),
+								conf->gen.name, value),
 						 hintmsg ? errhint("%s", _(hintmsg)) : 0));
 				return false;
 			}
@@ -3222,7 +3221,7 @@ parse_and_validate_value(struct config_generic *record,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 						 errmsg("%g%s%s is outside the valid range for parameter \"%s\" (%g%s%s .. %g%s%s)",
 								newval->realval, unitspace, unit,
-								name,
+								conf->gen.name,
 								conf->min, unitspace, unit,
 								conf->max, unitspace, unit)));
 				return false;
@@ -3278,7 +3277,7 @@ parse_and_validate_value(struct config_generic *record,
 				ereport(elevel,
 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 						 errmsg("invalid value for parameter \"%s\": \"%s\"",
-								name, value),
+								conf->gen.name, value),
 						 hintmsg ? errhint("%s", _(hintmsg)) : 0));
 				if (hintmsg)
@@ -3460,7 +3459,7 @@ set_config_with_handle(const char *name, config_handle *handle,
 			ereport(elevel,
 					(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 					 errmsg("parameter \"%s\" cannot be set during a parallel operation",
-							name)));
+							record->name)));
 			return 0;
 		}
@@ -3476,7 +3475,7 @@ set_config_with_handle(const char *name, config_handle *handle,
Re: Add parallel columns for pg_stat_statements
Hi Michael,

Le jeu. 3 oct. 2024 à 09:15, Michael Paquier a écrit :
> On Thu, Aug 29, 2024 at 10:08:23PM +0200, Guillaume Lelarge wrote:
> > This patch was a bit discussed on [1], and with more details on [2]. It's
> > based on another patch sent in 2022 (see [3]). It introduces seven new
> > columns in pg_stat_statements:
> >
> > * parallelized_queries_planned, number of times the query has been
> >   planned to be parallelized,
> > * parallelized_queries_launched, number of times the query has been
> >   executed with parallelization,
>
> Comparing the numbers of workers planned and launched with the number
> of times a query has been called and planned should provide a rather
> good equivalent, no? I am not sure that these two are mandatory to
> have.

I'm not sure I follow. That would mean that every time a query is executed, it always gets the same number of workers, which is not guaranteed to be true. I would agree, though, that parallelized_queries_launched is probably not that interesting. I could get rid of it if you think it should go away.

> > * parallelized_workers_planned, number of parallel workers planned for
> >   this query,
> > * parallelized_workers_launched, number of parallel workers executed for
> >   this query,
>
> Yep. Definitely OK with these two. There is an overlap with what
> Benoit has sent here when it comes to publishing this data to the
> executor state:
> https://www.postgresql.org/message-id/783bc7f7-659a-42fa-99dd-ee0565644...@dalibo.com

Well, I don't see this as an overlap. Rather, more information.

> > * parallelized_nodes, number of parallelized nodes,
> > * parallelized_nodes_all_workers, number of parallelized nodes which had
> >   all requested workers,
> > * parallelized_nodes_no_worker, number of parallelized nodes which had
> >   no requested workers.
>
> I can see why you want to register this extra data on a node basis,
> but how does that help when it comes to tuning the parallel GUCs?
> We cannot control them at node level, and the launched/planned ratio
> offers an equivalent of that. Not exactly, but that's enough to get a
> picture if there is a drought.

On this, I would agree with you. They are not particularly useful for getting better settings for the parallel GUCs. I can drop them if you want.

> > A test script (test2.sql) is attached. You can execute it with "psql -Xef
> > test2.sql your_database" (your_database should not contain a t1 table as
> > it will be dropped and recreated).
>
> Let's add proper regression tests instead, including
> oldextversions.sql as this bumps the version of the module. See for
> example the tests of 6fd5071909a2 that can force workers to spawn
> for BRIN and btree queries, validating some of the stats published
> here.

Did this in the v2 version of the patch (attached here). Thanks for your review. If you want the parallelized_queries_launched column and the parallelized_nodes_* columns dropped, I can do that in a v3 patch.

Regards.
--
Guillaume.

From 8228f8d521eb9853cba6b5d185e4d9965f29d398 Mon Sep 17 00:00:00 2001
From: Guillaume Lelarge
Date: Wed, 28 Aug 2024 15:30:05 +0200
Subject: [PATCH v2] Add parallel columns to pg_stat_statements

There are seven new columns:

* parallelized_queries_planned (number of times the query has been
  planned to be parallelized),
* parallelized_queries_launched (number of times the query has been
  executed with parallelization),
* parallelized_workers_planned (number of parallel workers planned for
  this query),
* parallelized_workers_launched (number of parallel workers executed for
  this query),
* parallelized_nodes (number of parallelized nodes),
* parallelized_nodes_all_workers (number of parallelized nodes which had
  all requested workers),
* parallelized_nodes_no_worker (number of parallelized nodes which had
  no requested workers).

These new columns will help to monitor and better configure query
parallelization.
---
 contrib/pg_stat_statements/Makefile           |   4 +-
 .../expected/oldextversions.out               |  69 +++++++++
 .../pg_stat_statements/expected/parallel.out  |  38 ++++++
 .../pg_stat_statements--1.11--1.12.sql        |  80 ++++++++++
 .../pg_stat_statements/pg_stat_statements.c   | 108 ++++++++++--
 .../pg_stat_statements.control                |   2 +-
 .../pg_stat_statements/sql/oldextversions.sql |   5 +
 contrib/pg_stat_statements/sql/parallel.sql   |  27 +++
 doc/src/sgml/pgstatstatements.sgml            |  63 ++++++
 src/backend/executor/execUtils.c              |   7 ++
 src/backend/executor/nodeGather.c             |   9 +-
 src/backend/executor/nodeGatherMerge.c        |   8 ++
 src/include/nodes/execnodes.h                 |   6 +
 13 files changed, 415 insertions(+), 11 deletions(-)
 create mode 100644 contrib/pg_stat_statements/expected/parallel.out
 create mode 100644 contrib/pg_stat_statements/pg_stat_statements--1.11--1.12.sql
 create mode 100644 contrib/pg_stat_statements/sql/parall
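If the patch goes in, a monitoring query over the new columns could look like this (a sketch against the proposed 1.12 schema, not columns that exist in any current release):

```sql
-- Statements whose parallel plans are being starved of workers.
SELECT query,
       parallelized_workers_planned,
       parallelized_workers_launched
FROM pg_stat_statements
WHERE parallelized_workers_planned > 0
ORDER BY parallelized_workers_planned - parallelized_workers_launched DESC
LIMIT 10;
```

A persistent gap between planned and launched workers typically points at max_parallel_workers or max_worker_processes being exhausted.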
Re: Should CSV parsing be stricter about mid-field quotes?
On 2024-10-04 Fr 12:19 PM, Joel Jacobson wrote:
> On Sun, Jul 2, 2023, at 07:45, Noah Misch wrote:
>> On Sat, May 20, 2023 at 09:16:30AM +0200, Joel Jacobson wrote:
>>> On Fri, May 19, 2023, at 18:06, Daniel Verite wrote:
>>>> COPY FROM file CSV somewhat differs as your example shows, but it
>>>> still mishandles \. when unquoted. For instance, consider this file
>>>> to load with COPY t FROM '/tmp/t.csv' WITH CSV
>>>>
>>>> $ cat /tmp/t.csv
>>>> line 1
>>>> \.
>>>> line 3
>>>> line 4
>>>>
>>>> It results in having only "line 1" being imported.
>>>
>>> Hmm, this is a problem for one of the new use-cases I brought up that
>>> would be possible with DELIMITER NONE QUOTE NONE, i.e. to import
>>> unstructured log files, where each raw line should be imported "as is"
>>> into a single text column. Is there a valid reason why \. is needed
>>> for COPY FROM filename? It seems to me it would only be necessary for
>>> the COPY FROM STDIN case, since files have a natural end-of-file and a
>>> known file size.
>>
>> Right. Even for COPY FROM STDIN, it's not needed anymore since the
>> removal of protocol v2. psql would still use it to find the end of
>> inline COPY data, though. Here's another relevant thread:
>> https://postgr.es/m/flat/bfcd57e4-8f23-4c3e-a5db-2571d09208e2%40beta.fastmail.com
>
> I was very pleased to see commit 7702337: "Do not treat \. as an EOF
> marker in CSV mode for COPY IN." Great job! Thanks to this fix, maybe
> there is now interest in resuming the discussion of the ideas discussed
> in this thread? Recap of ideas:
>
> 1. Stricter parsing, reject mid-field quotes
>
> The idea is to prevent balanced mid-field quotes from being silently
> removed. Example:
>
> % cat example.csv
> id,post
> 1,Hello there!
> 2,<a href="http://example.com">Click me!</a>
>
> % psql
> # \copy posts from example.csv with csv header;
> COPY 2
> # SELECT * FROM posts;
>  id |                  post
> ----+----------------------------------------
>   1 | Hello there!
>   2 | <a href=http://example.com>Click me!</a>
> (2 rows)
>
> Note how the quotes around the URL disappeared.
>
> 2. Avoid needing hacks like using E'\x01' as the quoting char.
Introduce QUOTE NONE and DELIMITER NONE, to allow raw lines to be
imported "as is" into a single text column.

As I think I previously indicated, I'm perfectly happy about 2, because
it replaces a far from obvious hack, but I am at best dubious about 1.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
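For idea 1, "stricter parsing" amounts to enforcing the RFC 4180 quoting rule per field: a quote character may appear only when the whole field is quoted, with `""` as the escape. A minimal validator sketch (the function name is invented, and this is an illustration of the rule, not COPY's actual CSV parser):

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical strict validator: returns true only when any quoting in
 * the field follows RFC 4180 (quotes allowed only when the whole field
 * is quoted, with "" as the escape).  Mid-field quotes are rejected
 * instead of being silently stripped.
 */
static bool
csv_field_is_strict(const char *field)
{
	size_t		len = strlen(field);

	if (len == 0 || field[0] != '"')
	{
		/* unquoted field: any quote character at all is a violation */
		return strchr(field, '"') == NULL;
	}

	/* quoted field: scan the interior, honoring "" escapes */
	for (size_t i = 1; i < len; i++)
	{
		if (field[i] == '"')
		{
			if (i + 1 < len && field[i + 1] == '"')
				i++;			/* escaped quote, keep scanning */
			else
				return i == len - 1;	/* closing quote must end the field */
		}
	}
	return false;				/* opening quote was never closed */
}
```

Under such a rule, the URL field in the example above would raise an error at load time instead of silently losing its quotes.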
Re: Should rolpassword be toastable?
On Sun, Oct 06, 2024 at 11:42:53AM +0200, Hannu Krosing wrote: > On Fri, Oct 4, 2024 at 4:48 PM Nathan Bossart > wrote: >> Since BLCKSZ can be as low as 1024, I think 512 would be a good choice. > > Where did you get the minimal value of 1024 from ? https://www.postgresql.org/docs/devel/install-make.html#CONFIGURE-OPTION-WITH-BLOCKSIZE -- nathan
Re: [BUG FIX]Connection fails with whitespace after keepalives parameter value
On Thu, Oct 03, 2024 at 08:12:28PM -0400, Tom Lane wrote:
> OK, if there's no objections let's push both remaining patches
> to HEAD only.

Done as of f22e84df1dea and 430ce189fc45.
--
Michael
Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes
Hi!

Maybe I'm not aware of the whole context of the thread, and maybe my
questions will seem a bit stupid, but honestly it's not entirely clear
to me how these statistics will help to adjust the number of parallel
workers. We may have situations where, due to an overestimation of
cardinality during query optimization, several parallel workers were
unjustifiably launched, and vice versa: due to a high workload, only a
few workers were launched. How do we identify such cases, so as not to
increase or decrease the number of parallel workers when it is not
necessary?

--
Regards,
Alena Rybakina
Postgres Professional
Re: Converting tab-complete.c's else-if chain to a switch
On Mon, Oct 7, 2024 at 12:11 PM Tom Lane wrote: > Hmm, I should think that if you break anything in tab-complete.in.c, > the breakage would propagate to tab-complete.c without difficulty. > Do you have an example of something that the preprocessor would mask? Fair point, and nope. Thought withdrawn.
Re: Add parallel columns for pg_stat_statements
On Sun, Oct 06, 2024 at 03:32:02PM +0200, Guillaume Lelarge wrote:
> I'm not sure I follow. That would mean that every time a query is executed,
> it always gets the same amount of workers. Which is not guaranteed to be
> true.
>
> I would agree, though, that parallelized_queries_launched is probably not
> that interesting. I could get rid of it if you think it should go away.

My point is that these stats are useful to know which action may have
to be taken when reaching a mean, and numbers in pg_stat_statements
offer hints that something is going wrong and that a closer look at an
EXPLAIN plan may be required, particularly if the total number of
workers planned and launched aggregated in the counters is unbalanced
across queries. If the planned/launched ratio is balanced across most
queries, a GUC adjustment may be OK. If the ratio is very unbalanced
in a smaller set of queries, I'd also look at tweaking GUCs like the
per_gather one instead. These counters give information that one or
the other may be required.

> Well, I don't see this as an overlap. Rather more information.

Later versions of Benoit's patch have been accumulating this data in
the executor state. v4 posted at [1] has the following diffs:

--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -724,6 +724,9 @@ typedef struct EState
 	 */
 	List	   *es_insert_pending_result_relations;
 	List	   *es_insert_pending_modifytables;
+
+	int			es_workers_launched;
+	int			es_workers_planned;
 } EState;

Your v2 posted on this thread has that:

@@ -707,6 +707,12 @@ typedef struct EState
 	struct EPQState *es_epq_active;

 	bool		es_use_parallel_mode;	/* can we use parallel workers? */
+	bool		es_used_parallel_mode;	/* was executed in parallel */
+	int			es_parallelized_workers_launched;
+	int			es_parallelized_workers_planned;
+	int			es_parallelized_nodes;	/* # of parallelized nodes */
+	int			es_parallelized_nodes_all_workers;	/* # of nodes with all workers launched */
+	int			es_parallelized_nodes_no_worker;	/* # of nodes with no workers launched */

es_parallelized_workers_launched and es_workers_launched are the same
thing in both.

> On this, I would agree with you. They are not that particularly useful to
> get better settings for parallel GUCs. I can drop them if you want.

Yep. I would remove them for now. This leads to more bloat.

> Did this on the v2 version of the patch (attached here).
>
> Thanks for your review. If you want the parallelized_queries_launched
> column and the parallelized_nodes_* columns dropped, I can do that on a v3
> patch.

I'd recommend to split that into more independent patches:
- Introduce the two counters in EState with the incrementations done in
  nodeGatherMerge.c and nodeGather.c (mentioned that at [2], you may
  want to coordinate with Benoit to avoid duplicating the work).
- Expand pg_stat_statements to use them for DMLs, SELECTs, well, where
  they matter.
- Look at expanding that for utilities that can do parallel jobs:
  CREATE INDEX and VACUUM, but this has lower priority to me, and this
  can reuse the same counters as the ones added by patch 2.

[1]: https://www.postgresql.org/message-id/6ecad3ad-835c-486c-9ebd-da87a9a97...@dalibo.com
[2]: https://www.postgresql.org/message-id/zv46wtmjltuu2...@paquier.xyz
--
Michael
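For reference, the accumulation both patches perform can be modeled in a few lines. This is a toy stand-in for EState, not the real executor structures; the field names follow Benoit's v4 patch:

```c
/*
 * Toy model of the two counters added to EState: each Gather or
 * GatherMerge node records how many workers the planner asked for and
 * how many were actually launched.  Illustrative only.
 */
typedef struct ToyEState
{
	int			es_workers_planned;		/* summed over all gather nodes */
	int			es_workers_launched;
} ToyEState;

/* Called once per Gather/GatherMerge node, after launching workers. */
static void
record_gather_launch(ToyEState *estate, int planned, int launched)
{
	estate->es_workers_planned += planned;
	estate->es_workers_launched += launched;
}
```

At end of execution, pg_stat_statements would then add the two totals from the executor state into the query's counters, which is why a single starved gather node in a plan with several of them still shows up in the aggregated planned/launched gap.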
Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes
On Mon, Oct 07, 2024 at 12:43:18AM +0300, Alena Rybakina wrote:
> Maybe I'm not aware of the whole context of the thread and maybe my
> questions will seem a bit stupid, but honestly
> it's not entirely clear to me how this statistics will help to adjust the
> number of parallel workers.
> We may have situations when during overestimation of the cardinality during
> query optimization a several number of parallel workers were unjustifiably
> generated and vice versa -
> due to a high workload only a few number of workers were generated.
> How do we identify such cases so as not to increase or decrease the number
> of parallel workers when it is not necessary?

Well. For spiky workloads, these numbers alone are not going to help.
They would help if you can map them to the number of times a query
related to these tables has been called, something that
pg_stat_statements would be able to show more information about.

FWIW, I have doubts that these numbers attached to this portion of the
system are always useful. For OLTP workloads, parallel workers are
unlikely to be spawned because even with JOINs we won't work with a
high number of tuples that require them. This could be interesting
with analytics; however, complex query sequences mean that we'd still
need to look at all the plans involving the relations where there is
an imbalance of planned/spawned workers, because these can usually
involve quite a few gather nodes. At the end of the day, it seems to
me that we would still need data that involves statements to track
down specific plans that are starving. If your application does not
have that many statements, looking at individual plans is OK, but if
you have hundreds of them to dig into, this is time-consuming, and
stats at table/index level don't offer data in terms of stuff that
stands out and needs adjustment. And this is without the argument of
bloating the stats entries further for each table, even if it matters
less now that these stats are in shmem lately.
--
Michael
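As a footnote, the "balanced vs. unbalanced ratio" reading sketched in this exchange boils down to comparing a per-query launch ratio against the fleet-wide one. A couple of trivial helpers make the heuristic concrete (function names are invented, and the interpretation thresholds mentioned in the usage note are arbitrary):

```c
/*
 * launched/planned ratio for one query's aggregated counters; 1.0 means
 * every planned worker was actually launched.  A query that never
 * planned workers is treated as fully served.
 */
static double
worker_launch_ratio(long planned, long launched)
{
	if (planned <= 0)
		return 1.0;
	return (double) launched / (double) planned;
}

/* Same ratio computed over a set of queries' counters. */
static double
fleet_launch_ratio(const long planned[], const long launched[], int n)
{
	long		tp = 0,
				tl = 0;

	for (int i = 0; i < n; i++)
	{
		tp += planned[i];
		tl += launched[i];
	}
	return worker_launch_ratio(tp, tl);
}
```

Read hedgedly: a low ratio for one query while the fleet-wide ratio stays close to 1.0 points at that query's plan; a low ratio across the board points at the worker-pool GUCs instead.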
Re: BUG #18641: Logical decoding of two-phase commit fails with TOASTed default values
On Tue, Oct 1, 2024 at 7:07 PM Takeshi Ideriha wrote: > > >We forgot to set/unset the flag in functions > >systable_beginscan_ordered and systable_endscan_ordered. BTW, > > Thank you for the clarification. > > >shouldn't this occur even without prepare transaction? If so, we need > >to backpatch this till 14. > > Yes, it occurred also at PG14. > I did some tests using 015_stream.pl with some modification like > below, which tests the subscription about stream is on but two-phase > is off. > The same issue occurred at both current head source code and PG14. > > ``` > --- a/src/test/subscription/t/015_stream.pl > +++ b/src/test/subscription/t/015_stream.pl > @@ -134,9 +134,11 @@ my $node_subscriber = > PostgreSQL::Test::Cluster->new('subscriber'); > $node_subscriber->init; > $node_subscriber->start; > > +my $default = join('', map {chr(65 + rand 26)} (1 .. 1)); > + > # Create some preexisting content on publisher > $node_publisher->safe_psql('postgres', > - "CREATE TABLE test_tab (a int primary key, b bytea)"); > + "CREATE TABLE test_tab (a int primary key, b bytea, t text > DEFAULT '$default')"); > $node_publisher->safe_psql('postgres', > "INSERT INTO test_tab VALUES (1, 'foo'), (2, 'bar')"); > > @@ -144,7 +146,7 @@ $node_publisher->safe_psql('postgres', "CREATE > TABLE test_tab_2 (a int)"); > > # Setup structure on subscriber > $node_subscriber->safe_psql('postgres', > - "CREATE TABLE test_tab (a int primary key, b bytea, c > timestamptz DEFAULT now(), d bigint DEFAULT 999)" > + "CREATE TABLE test_tab (a int primary key, b bytea, t text > DEFAULT '$default', c timestamptz DEFAULT now(), d bigint DEFAULT > 999)" > ); > ``` > > Logs from head source: > ``` > 2024-10-01 12:34:56.694 UTC [575202] LOG: starting PostgreSQL 18devel > on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605 (Red Hat > 11.4.1-2), 64-bit > ... 
> 2024-10-01 12:34:57.506 UTC [575258] 015_stream.pl LOG: statement: BEGIN; > 2024-10-01 12:34:57.506 UTC [575258] 015_stream.pl LOG: statement: > INSERT INTO test_tab SELECT i, sha256(i::text::bytea) FROM > generate_series(3, 5000) s(i); > 2024-10-01 12:34:57.524 UTC [575242] tap_sub ERROR: unexpected > table_index_fetch_tuple call during logical decoding > 2024-10-01 12:34:57.524 UTC [575242] tap_sub STATEMENT: > START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4', > streaming 'on', origin 'any', publication_names '"tap_pub"') > 2024-10-01 12:34:57.525 UTC [575242] tap_sub LOG: released logical > replication slot "tap_sub" > 2024-10-01 12:34:57.525 UTC [575242] tap_sub LOG: could not send data > to client: Broken pipe > 2024-10-01 12:34:57.525 UTC [575242] tap_sub FATAL: connection to client lost > ... > 2024-10-01 12:34:57.829 UTC [575260] tap_sub STATEMENT: > START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4', > streaming 'on', origin 'any', publication_names '"tap_pub"') > 2024-10-01 12:34:57.829 UTC [575260] tap_sub LOG: starting logical > decoding for slot "tap_sub" > 2024-10-01 12:34:57.829 UTC [575260] tap_sub DETAIL: Streaming > transactions committing after 0/1583A68, reading WAL from 0/1583A30. > 2024-10-01 12:34:57.829 UTC [575260] tap_sub STATEMENT: > START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4', > streaming 'on', origin 'any', publication_names '"tap_pub"') > 2024-10-01 12:34:57.829 UTC [575260] tap_sub LOG: logical decoding > found consistent point at 0/1583A30 > 2024-10-01 12:34:57.829 UTC [575260] tap_sub DETAIL: There are no > running transactions. 
> 2024-10-01 12:34:57.829 UTC [575260] tap_sub STATEMENT: > START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4', > streaming 'on', origin 'any', publication_names '"tap_pub"') > 2024-10-01 12:34:57.831 UTC [575260] tap_sub ERROR: unexpected > table_index_fetch_tuple call during logical decoding > 2024-10-01 12:34:57.831 UTC [575260] tap_sub STATEMENT: > START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_version '4', > streaming 'on', origin 'any', publication_names '"tap_pub"') > 2024-10-01 12:34:57.832 UTC [575260] tap_sub LOG: released logical > replication slot "tap_sub" > 2024-10-01 12:34:57.832 UTC [575260] tap_sub LOG: could not send data > to client: Broken pipe > 2024-10-01 12:34:57.832 UTC [575260] tap_sub FATAL: connection to client lost > ``` > > Logs from PG14 source: > > ``` > 2024-10-01 13:20:29.409 UTC [593696] LOG: starting PostgreSQL 14.9 on > x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1 20230605 (Red Hat > 11.4.1-2), 64-bit > ... > 2024-10-01 13:20:31.285 UTC [593750] 015_stream.pl LOG: statement: BEGIN; > 2024-10-01 13:20:31.285 UTC [593750] 015_stream.pl LOG: statement: > INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, > 5000) s(i); > 2024-10-01 13:20:31.301 UTC [593740] tap_sub ERROR: unexpected > table_index_fetch_tuple call during logical decoding > 2024-10-01 13:20:31.301 UTC [593740] tap_sub STATEMENT: > START_REPLICATION SLOT "tap_sub" LOGICAL 0/0 (proto_ve
Re: Converting tab-complete.c's else-if chain to a switch
On Sat, Jul 27, 2024 at 8:03 AM Tom Lane wrote: > I wrote: > > I modified the preprocessor to work like that, and I like the results > > better than what I had. > > I thought this version of the patch would be less subject to merge > conflicts than v1, but it didn't take long for somebody to break it. > Rebased v3 attached -- no nontrivial changes from v2. > > I'd like to get this merged soon ... +1 This works nicely for me and I couldn't find anything that doesn't work as expected, the mechanism seems reasonable, etc. I like how it's moving slowly towards a grammar DSL. Should we try to make sure that tab-complete.in.c remains compilable directly with a build mode that does that with 'cp' instead of the perl script? One very trivial thing, where you say: +# where N is the replacement case label, "HeadMatch" is the original +# function name shorn of "es", and the rest are the function arguments. ... maybe simpler language: 'with "es" removed'? (Thinking of non-native speakers; I understood that perfectly but I'm anglophone in a country overrun with sheep.)
Re: On disable_cost
On 06.10.2024 02:22, David Rowley wrote:

To be honest, I don't understand at all why we don't count disabled
nodes for append here? As I understand it, this is due to the fact
that the partitioned table can also be scanned by an index. Besides
mergeappend, in general it's difficult for me to generalize for which
nodes this rule applies, can you explain here?

There are no special rules here of what to display based on the node
type. Maybe you think there are some special rules because of the
special cases for Append and MergeAppend in the patch? Those are
handled specially as they don't use the Plan's lefttree and righttree
fields.

To be honest, I didn't quite understand initially why we don't display
information about disabled nodes for Append and MergeAppend, therefore
I had a question about other cases. Thank you for your explanation, it
was helpful! I also checked the code to see what parameters there are
for these nodes (Append and MergeAppend) and how they are processed.
To sum up, they provide for collecting information from child nodes. I
agree that they do not need an additional display about disabled nodes.

Are you saying that the "Disabled: true" should propagate to the root
of the plan tree? The fact that master does that is what Laurenz and I
are complaining about. I'm not sure if I follow what you're asking.

I agree that it's better to display such information for a specific
disabled node. It's clearer what's going on and what it means.

--
Regards,
Alena Rybakina
Postgres Professional
Re: Converting tab-complete.c's else-if chain to a switch
Thomas Munro writes: > On Sat, Jul 27, 2024 at 8:03 AM Tom Lane wrote: >> I thought this version of the patch would be less subject to merge >> conflicts than v1, but it didn't take long for somebody to break it. >> Rebased v3 attached -- no nontrivial changes from v2. > +1 Thanks for looking at it! > Should we try to make sure that tab-complete.in.c remains compilable > directly with a build mode that does that with 'cp' instead of the > perl script? Hmm, I should think that if you break anything in tab-complete.in.c, the breakage would propagate to tab-complete.c without difficulty. Do you have an example of something that the preprocessor would mask? > One very trivial thing, where you say: > +# where N is the replacement case label, "HeadMatch" is the original > +# function name shorn of "es", and the rest are the function arguments. > ... maybe simpler language: 'with "es" removed'? (Thinking of > non-native speakers; I understood that perfectly but I'm anglophone in > a country overrun with sheep.) Sure, that's changeable. regards, tom lane
Re: Use heap scan routines directly in vac_update_datfrozenxid()
Soumyadeep Chakraborty writes: > Attached is a simple patch to directly use heap scan routines in > vac_update_datfrozenxid(), avoiding the multilayer overhead from the > sysscan infrastructure. I would think the overhead of that is minuscule. If it isn't, we should try to make it so, not randomly hack up some of the callers to avoid it. The intention certainly was that it wouldn't cost anything compared to what happens within the actual table access. regards, tom lane
Re: Function for listing pg_wal/summaries directory
On Fri, Oct 04, 2024 at 10:02:11AM -0500, Nathan Bossart wrote:
> Could you explain why you feel the existing support functions are
> insufficient?

Because it is not possible to outsource the scan of pg_wal/summaries/
to a different role, no? On HEAD, one would require full access to the
data folder, as pg_ls_waldir() does not recurse.

Perhaps, rather than introducing a new function, we should just extend
pg_ls_dir_files() so that it can optionally recurse, then use this
option in the existing pg_ls_waldir() to show stats for pg_wal/ and
pg_wal/summaries/? You'd optionally need to print the relative path of
the file in the output of the function, meaning that any summary file
should be shown as summaries/blah.summary. Just an idea for the bucket
of ideas.

This would be inconsistent with the existing
pg_ls_archive_statusdir(), though, so using a new function may be just
better and simpler.
--
Michael
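For what it's worth, the optional-recursion behavior discussed here can be illustrated at the filesystem level with plain POSIX calls. This is only a sketch of the traversal, not pg_ls_dir_files() itself; the function name, buffer sizing, and error handling are all simplified assumptions:

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Count regular files under base/, descending into subdirectories the
 * way a recursing pg_ls_waldir() would visit pg_wal/ and then
 * pg_wal/summaries/.  "rel" is the path relative to base ("" at the
 * top); a real implementation would emit one row per file carrying
 * that relative name (e.g. "summaries/blah.summary") plus size and
 * mtime.  Symlinks are not followed (lstat), and unreadable entries
 * are silently skipped.
 */
static long
ls_dir_recurse(const char *base, const char *rel, long count)
{
	char		path[4096];
	DIR		   *dir;
	struct dirent *de;

	snprintf(path, sizeof(path), "%s/%s", base, rel[0] ? rel : ".");
	if ((dir = opendir(path)) == NULL)
		return count;			/* unreadable or missing: nothing to add */

	while ((de = readdir(dir)) != NULL)
	{
		char		sub[4096];
		struct stat st;

		if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
			continue;
		snprintf(sub, sizeof(sub), "%s%s%s",
				 rel, rel[0] ? "/" : "", de->d_name);
		snprintf(path, sizeof(path), "%s/%s", base, sub);
		if (lstat(path, &st) != 0)
			continue;
		if (S_ISDIR(st.st_mode))
			count = ls_dir_recurse(base, sub, count);
		else if (S_ISREG(st.st_mode))
			count++;
	}
	closedir(dir);
	return count;
}
```

Building the relative name as the recursion descends is what makes the "summaries/blah.summary" output format fall out naturally, instead of requiring a second dedicated function per subdirectory.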
Re: System username in pg_stat_activity
On Tue, Feb 20, 2024 at 10:32:53PM +0100, Magnus Hagander wrote:
> In a way, that's yet another different type of values though -- it
> contains accumulated stats. So we really have 3 types -- "info" that's
> not really stats (username, etc), "current state" (query, wait events,
> state) and "accumulated stats" (counters since start). If we don't want
> to combine them all, we should perhaps not combine any and actually
> have 3 views?

This is the last message sent on this thread, and an entry was still
registered in the CF app, so I am marking that as RwF for now.
--
Michael
Re: POC, WIP: OR-clause support for indexes
On Fri, Oct 4, 2024 at 4:34 PM Robert Haas wrote: > On Mon, Sep 23, 2024 at 7:11 AM Alexander Korotkov > wrote: > > Makes sense. Please, check the attached patch freeing the consts list > > while returning NULL from match_orclause_to_indexcol(). > > Some review comments: > > I agree with the comments already given to the effect that the patch > looks much better now. I was initially surprised to see this happening > in match_clause_to_indexcol() but after studying it I think it looks > like the right place. I think it makes sense to think about moving > forward with this, although it would be nice to get Tom's take if we > can. > > I see that the patch makes no update to the header comment for > match_clause_to_indexcol() nor to the comment just above the cascade > of if-statements. I think both need to be updated. > > More generally, many of the comments in this patch seem to just > explain what the code does, and I'd like to reiterate my usual > complaint: as far as possible, comments should explain WHY the code > does what it does. Certainly, in some cases there's nothing to be said > about that e.g. /* Lookup for operator to fetch necessary information > for the SAOP node */ isn't really saying anything non-obvious but it's > reasonable to have the comment here anyway. However, when there is > something more interesting to be said, then we should do that rather > than just reiterate what the reader who knows C can anyway see. For > instance, the lengthy comment beginning with "Iterate over OR > entries." could either be shorter and recapitulate less of the code > that follows, or it could say something more interesting about why > we're doing it like that. > > + /* We allow constant to be Const or Param */ > + if (!IsA(constExpr, Const) && !IsA(constExpr, Param)) > + break; > > This restriction is a lot tighter than the one mentioned in the header > comment of match_clause_to_indexcol ("Our definition of const is > exceedingly liberal"). 
If there's a reason for that, the comments > should talk about it. If there isn't, it's better to be consistent. > > + /* > + * Check operator is present in the opfamily, expression collation > + * matches index collation. Also, there must be an array type in > + * order to construct an array later. > + */ > + if (!IndexCollMatchesExprColl(index->indexcollations[indexcol], > inputcollid) || > + !op_in_opfamily(matchOpno, index->opfamily[indexcol]) || > + !OidIsValid(arraytype)) > + break; > > I spent some time wondering whether this was safe. The > IndexCollMatchesExprColl() guarantees that either the input collation > is equal to the index collation, or the index collation is 0. If the > index collation is 0 then I *think* that guarantees that the > indexed type is non-collatable, but this could be a cross-type > comparison, and it's possible that the other type is collatable. In > that case, I don't think anything would prevent us from merging a > bunch of OR clauses with different collations into a single SAOP. I > don't really see how that could be a problem, because if the index is > of a non-collatable type, then presumably the operator doesn't care > about what the collation is, so it should all be fine, I guess? But > I'm not very confident about that conclusion. > > I'm unclear what the current thinking is about the performance of this > patch, both as to planning and as to execution. Do we believe that > this transformation is a categorical win at execution-time? In theory, > OR format allows for short-circuit execution, but because of the > Const-or-Param restriction above, I think that's mostly a > non-issue. But maybe not completely, because I can see from the > regression test changes that it's possible for us to apply this > transformation when the Param is set by an InitPlan or SubPlan. 
If we > have something like WHERE tenthous = 1 OR tenthous = > (very_expensive_computation() + 1), maybe the patch could lose, > because we'll have to do the very expensive calculation to evaluate > the SAOP, and the OR could stop as soon as we establish that tenthous > != 1. If we only did the transformation when the Param is an external > parameter, then we wouldn't have this issue. Maybe this isn't worth > worrying about; I'm not sure. Are there any other cases where the > transformation can produce something that executes more slowly? > > As far as planning time is concerned, I don't think this is going to > be too bad, because most of the work only needs to be done if there > are OR-clauses, and my intuition is that the optimization will often > apply in such cases, so it seems alright. But I wonder how much > testing has been done of adversarial cases, e.g. lots of non-indexable > clause in the query; or lots of OR clauses in the query but all of > them turn out on inspection to be non-indexable. My expectation would > be that there's no real problem here, but it would be good to verify > that experimentally. Thank you so much for the review. I'
Re: System views for versions reporting
Joe Conway writes: > I think it would be nice to include a sha256 hash (or something similar) > of the libraries as well, so that they can be checked against known good > values. That seems well outside the charter of this patch. Also, how would we even get that information? A typical application doesn't know exactly what libraries it's linked with or where they came from on the filesystem. Maybe one could find that out with sufficient platform-specific hackery, but I don't believe we could do it portably. regards, tom lane
Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes
Hello,

Le jeu. 5 sept. 2024 à 08:19, Guillaume Lelarge a écrit :
> Le jeu. 5 sept. 2024 à 07:36, Bertrand Drouvot <bertranddrouvot...@gmail.com> a écrit :
>> Hi,
>>
>> On Wed, Sep 04, 2024 at 04:37:19PM +0200, Guillaume Lelarge wrote:
>> > Hi,
>> >
>> > Le mer. 4 sept. 2024 à 16:18, Bertrand Drouvot <bertranddrouvot...@gmail.com> a écrit :
>> > > What about adding a comment instead of this extra check?
>> > >
>> > Done too in v3.
>>
>> Thanks!
>>
>> 1 ===
>>
>> + /*
>> +  * Don't check counts.parallelnumscans because counts.numscans includes
>> +  * counts.parallelnumscans
>> +  */
>>
>> "." is missing at the end of the comment.
>>
> Fixed in v4.
>
>> 2 ===
>>
>> - if (t > tabentry->lastscan)
>> + if (t > tabentry->lastscan && lstats->counts.numscans)
>>
>> The extra check on lstats->counts.numscans is not needed as it's already done
>> a few lines before.
>>
> Fixed in v4.
>
>> 3 ===
>>
>> + if (t > tabentry->parallellastscan && lstats->counts.parallelnumscans)
>>
>> This one makes sense.
>>
>> And now I'm wondering if the extra comment added in v3 is really worth it (and
>> does not sound confusing)? I mean, the parallel check is done once we pass
>> the initial test on counts.numscans. I think the code is clear enough without
>> this extra comment, thoughts?
>>
> I'm not sure I understand you here. I kinda like the extra comment though.
>
>> 4 ===
>>
>> What about adding a few tests? or do you want to wait a bit more to see if "
>> there's an agreement on this patch" (as you stated at the start of this
>> thread).
>>
> Guess I can start working on that now. It will take some time as I've
> never done it before. Good thing I added the patch on the November commit
> fest :)

Finally found some time to work on this. Tests added on v5 patch
(attached).

Regards.

--
Guillaume.
From 92474720b3178f74517958fededcf6797de58552 Mon Sep 17 00:00:00 2001
From: Guillaume Lelarge
Date: Sun, 6 Oct 2024 21:50:17 +0200
Subject: [PATCH v5] Add parallel columns for pg_stat_all_tables,indexes

pg_stat_all_tables gets 4 new columns: parallel_seq_scan,
last_parallel_seq_scan, parallel_idx_scan, last_parallel_idx_scan.
pg_stat_all_indexes gets 2 new columns: parallel_idx_scan,
last_parallel_idx_scan.
---
 doc/src/sgml/monitoring.sgml                 |  69 ++-
 src/backend/access/brin/brin.c               |   2 +-
 src/backend/access/gin/ginscan.c             |   2 +-
 src/backend/access/gist/gistget.c            |   4 +-
 src/backend/access/hash/hashsearch.c         |   2 +-
 src/backend/access/heap/heapam.c             |   2 +-
 src/backend/access/nbtree/nbtsearch.c        |  12 +-
 src/backend/access/spgist/spgscan.c          |   2 +-
 src/backend/catalog/system_views.sql         |   6 +
 src/backend/utils/activity/pgstat_relation.c |   8 +
 src/backend/utils/adt/pgstatfuncs.c          |   6 +
 src/include/catalog/pg_proc.dat              |   8 +
 src/include/pgstat.h                         |  17 +-
 src/test/regress/expected/rules.out          |  18 ++
 src/test/regress/expected/stats.out          | 194 +++
 src/test/regress/sql/stats.sql               |  92 +
 16 files changed, 421 insertions(+), 23 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 331315f8d3..aeaabb0ffe 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3803,7 +3803,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage
 seq_scan bigint
-Number of sequential scans initiated on this table
+Number of sequential scans (including parallel ones) initiated on this table
@@ -3812,7 +3812,26 @@ description | Waiting for a newly initialized WAL file to reach durable storage
 last_seq_scan timestamp with time zone
-The time of the last sequential scan on this table, based on the
+The time of the last sequential scan (including parallel ones) on this table, based on the
 most recent transaction stop time
+parallel_seq_scan bigint
+Number of parallel sequential scans initiated on this table
+last_parallel_seq_scan timestamp with time zone
+The time of the last parallel sequential scan on this table, based on the
+most recent transaction stop time
@@ -3831,7 +3850,7 @@ description | Waiting for a newly initialized WAL file to reach durable storage
 idx_scan bigint
-Number of index scans initiated on this table
+Number of index scans (including parallel ones) initiated on this table
@@ -3840,7 +3859,26 @@ description | Waiting for a newly initialized WAL file to reach durable storage
 last_idx_scan times
Use heap scan routines directly in vac_update_datfrozenxid()
Hi hackers,

Attached is a simple patch to directly use heap scan routines in
vac_update_datfrozenxid(), avoiding the multilayer overhead from the
sysscan infrastructure. The speedup can be noticeable in databases
containing a large number of relations (perhaps due to heavy partition
table usage). This was proposed in [1].

Experiment setup:

* Use -O3 optimized build without asserts, with fsync and autovacuum
  off, on my laptop. Other gucs are all at defaults.

* Create tables using pgbench to inflate pg_class to a decent size.

$ cat << EOF > bench.sql
> select txid_current() AS txid \gset
> CREATE TABLE t:txid(a int);
> EOF
$ pgbench -f ./bench.sql -t 20 -c 100 -n bench

select pg_size_pretty(pg_relation_size('pg_class'));
 pg_size_pretty
----------------
 3508 MB
(1 row)

* Use instr_time to record the scan time. See attached instr_vac.diff.

* Run vacuum on any of the created empty tables in the database bench.

Results:

* main as of 68dfecbef2:

bench=# vacuum t1624;
NOTICE:  scan took 796.862142 ms
bench=# vacuum t1624;
NOTICE:  scan took 793.730688 ms
bench=# vacuum t1624;
NOTICE:  scan took 793.963655 ms

* patch:

bench=# vacuum t1624;
NOTICE:  scan took 682.283366 ms
bench=# vacuum t1624;
NOTICE:  scan took 670.816975 ms
bench=# vacuum t1624;
NOTICE:  scan took 683.821717 ms

Regards,
Soumyadeep (Broadcom)

[1] https://www.postgresql.org/message-id/20221229030329.fbpiitatmowzza6c%40awork3.anarazel.de

From 320e54894a1ad45e2c25a4ee88a6409a9dc1a527 Mon Sep 17 00:00:00 2001
From: Soumyadeep Chakraborty
Date: Sat, 5 Oct 2024 16:22:56 -0700
Subject: [PATCH v1 1/1] Use heap_getnext() in vac_update_datfrozenxid()

Since we are going to do a full sequential scan without a filter, we
can avoid overhead from the extra layers of sysscan.
---
 src/backend/commands/vacuum.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index ac8f5d9c25..717e310054 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1588,7 +1588,7 @@ vac_update_datfrozenxid(void)
 	HeapTuple	tuple;
 	Form_pg_database dbform;
 	Relation	relation;
-	SysScanDesc scan;
+	TableScanDesc scan;
 	HeapTuple	classTup;
 	TransactionId newFrozenXid;
 	MultiXactId newMinMulti;
@@ -1638,10 +1638,9 @@ vac_update_datfrozenxid(void)
 	 */
 	relation = table_open(RelationRelationId, AccessShareLock);

-	scan = systable_beginscan(relation, InvalidOid, false,
-							  NULL, 0, NULL);
+	scan = table_beginscan_catalog(relation, 0, NULL);

-	while ((classTup = systable_getnext(scan)) != NULL)
+	while ((classTup = heap_getnext(scan, ForwardScanDirection)) != NULL)
 	{
 		volatile FormData_pg_class *classForm = (Form_pg_class) GETSTRUCT(classTup);
 		TransactionId relfrozenxid = classForm->relfrozenxid;
@@ -1707,7 +1706,7 @@ vac_update_datfrozenxid(void)
 	}

 	/* we're done with pg_class */
-	systable_endscan(scan);
+	table_endscan(scan);
 	table_close(relation, AccessShareLock);

 	/* chicken out if bogus data found */
--
2.43.0

diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 717e310054..db3e8a4baf 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1599,6 +1599,9 @@ vac_update_datfrozenxid(void)
 	ScanKeyData key[1];
 	void	   *inplace_state;

+	instr_time	before;
+	instr_time	after;
+
 	/*
 	 * Restrict this task to one backend per database.  This avoids race
 	 * conditions that would move datfrozenxid or datminmxid backward.  It
@@ -1636,6 +1639,7 @@ vac_update_datfrozenxid(void)
 	 *
 	 * See vac_truncate_clog() for the race condition to prevent.
 	 */
+	INSTR_TIME_SET_CURRENT(before);
 	relation = table_open(RelationRelationId, AccessShareLock);

 	scan = table_beginscan_catalog(relation, 0, NULL);
@@ -1708,7 +1712,9 @@ vac_update_datfrozenxid(void)
 	/* we're done with pg_class */
 	table_endscan(scan);
 	table_close(relation, AccessShareLock);
-
+	INSTR_TIME_SET_CURRENT(after);
+	INSTR_TIME_SUBTRACT(after, before);
+	elog(NOTICE, "scan took %lf", INSTR_TIME_GET_MILLISEC(after));
 	/* chicken out if bogus data found */
 	if (bogus)
 		return;