Re: missing documentation for streaming in-progress transactions
On Fri, Apr 9, 2021 at 9:58 AM Justin Pryzby wrote: > > On Wed, Apr 07, 2021 at 05:45:16PM +0530, Amit Kapila wrote: > > On Wed, Apr 7, 2021 at 1:11 PM Ajin Cherian wrote: > > > > > > Found that some documentation hasn't been updated for the changes made as > > > part of > > > streaming large in-progress transactions. Attached a patch that adds the > > > missing changes. Let me know if anything more needs to be added or any > > > comments on this change. > > > > > > > Thanks, this mostly looks good to me. I have slightly modified it. > > See, what do you think of the attached? > > + Protocol version. Currently versions 1 and > + 2 are supported. The version 2 > + is supported only for server versions 14 and above, and is used to > allow > + streaming of large in-progress transactions. > > The diff briefly confused me, since this is in protocol.sgml, and since the > libpq protocol version is 1/2/3, with 2 being removed in v14 (3174d69fb). > I suggest to say "replication protocol version 2". > > I realize that the headings make this more clear when reading the .html, so > maybe there's no issue. > Yeah, this was the reason to not include replication. If one is reading the document or even *.sgml, there shouldn't be any confusion but if you or others feel so, we can use 'replication' as well. -- With Regards, Amit Kapila.
Re: doc review for v14
On Thu, Apr 08, 2021 at 11:40:08AM -0500, Justin Pryzby wrote: > Another round of doc review, not yet including all of yesterday's commits. Thanks for compiling all that. I got through the whole set and applied the most relevant parts on HEAD. Some of them applied down to 9.6, so I have back-patched those where needed, for the parts that did not conflict too heavily. -- Michael
Re: Simplify backend terminate and wait logic in postgres_fdw test
On Fri, Apr 9, 2021 at 7:29 AM Michael Paquier wrote: > > On Fri, Apr 09, 2021 at 06:53:21AM +0530, Bharath Rupireddy wrote: > > I didn't think of the warning cases, my bad. How about using SET > > client_min_messages = 'ERROR'; before we call > > pg_wait_for_backend_termination? We can only depend on the return > > value of pg_wait_for_backend_termination, when true we can exit. This > > way the buildfarm will not see warnings. Thoughts? > > You could do that, but I would also bet that this is going to get > forgotten in the future if this gets extended in more SQL tests that > are output-sensitive, in or out of core. Honestly, I can get behind a > warning in pg_wait_for_backend_termination() to inform that the > process poked at is not a PostgreSQL one, because it offers new and > useful information to the user. But, and my apologies for sounding a > bit noisy, I really don't get why pg_wait_until_termination() has any > need to do that. From what I can see, it provides the following > information: > - A PID, that we already know from the caller or just from > pg_stat_activity. > - A timeout, already known as well. > - The fact that the process did not terminate, information given by > the "false" status, only used in this case. > > So there is no new information here to the user, only a duplicate of > what's already known to the caller of this function. I see more > advantages in removing this WARNING rather than keeping it. IMO it does make sense to provide a warning for a bool returning function, if there are multiple situations in which the function returns false. This will give clear information as to why the false is returned. 
pg_terminate_backend: false is returned 1) when the process with the given pid is not a backend (warning "PID %d is not a PostgreSQL server process") 2) if the kill() fails (warning "could not send signal to process %d: %m") 3) if a timeout is specified and the backend is not terminated within it (warning "backend with PID %d did not terminate within %lld milliseconds"). pg_cancel_backend: false is returned 1) when the process with the given pid is not a backend 2) if the kill() fails. pg_wait_for_backend_termination: false is returned 1) when the process with the given pid is not a backend 2) if the backend is not terminated within the timeout. If we ensure that all the above functions return false in only one situation and error out in all other situations, then removing the warnings makes sense. Having said that, there seems to be a reason for issuing a warning and returning false instead of an error: the callers can just call these functions in a loop until they return true. See the comments below: /* * This is just a warning so a loop-through-resultset will not abort * if one backend terminated on its own during the run. */ /* Again, just a warning to allow loops */ I would like to keep the behaviour of these functions as-is. > You could do that, but I would also bet that this is going to get > forgotten in the future if this gets extended in more SQL tests that > are output-sensitive, in or out of core On the above point of hackers (wanting to use these functions in more SQL tests) forgetting that pg_terminate_backend, pg_cancel_backend and pg_wait_for_backend_termination issue a warning in some cases, which might pollute the tests if used without suppressing those warnings, I feel it is best left to the patch implementers and reviewers. On our part, we documented that pg_terminate_backend and pg_wait_for_backend_termination emit a warning on timeout: "On timeout a warning is emitted and false is returned". With Regards, Bharath Rupireddy. 
EnterpriseDB: http://www.enterprisedb.com
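The warning-plus-false contract discussed above can be sketched in plain C. This is a standalone illustration with hypothetical names, not PostgreSQL's actual implementation: a helper reports failure as a WARNING and returns false, so a caller looping over a result set of PIDs is not aborted when one process has already exited on its own.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical sketch: report failure with a WARNING and return false,
 * rather than raising an error, so that a loop over many PIDs keeps going.
 */
static bool
terminate_process(pid_t pid)
{
    if (kill(pid, SIGTERM) != 0)
    {
        fprintf(stderr, "WARNING: could not send signal to process %d: %s\n",
                (int) pid, strerror(errno));
        return false;           /* just a warning, to allow loops */
    }
    return true;
}

/* Try a whole list of PIDs; a stale PID does not stop the loop. */
static int
terminate_all(const pid_t *pids, int npids)
{
    int         nterminated = 0;

    for (int i = 0; i < npids; i++)
        if (terminate_process(pids[i]))
            nterminated++;
    return nterminated;
}
```

The design point is the second loop: with an error instead of a warning, the first stale PID would abort the whole run.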
Re: Lots of incorrect comments in nodeFuncs.c
On Fri, 9 Apr 2021 at 13:52, Michael Paquier wrote: > > On Thu, Apr 08, 2021 at 09:21:30PM -0400, Alvaro Herrera wrote: > > On 2021-Apr-08, Tom Lane wrote: > >> Maybe like > >> > >> case T_ScalarArrayOpExpr: > >> /* ScalarArrayOpExpr's result is boolean ... */ > >> coll = InvalidOid; /* ... so it has no collation > >> */ > >> break; > > > > This is much clearer, yeah. > > +1. Yeah, that's much better. For the exprSetCollation case, I ended up with: case T_ScalarArrayOpExpr: /* ScalarArrayOpExpr's result is boolean ... */ Assert(!OidIsValid(collation)); /* ... so never set a collation */ I wanted something more like /* ... so we must never set a collation */ but that made the line longer than 80 columns. I thought wrapping to a 2nd line was excessive, so I shortened it to that. David improve_possibly_misleading_comments_in_nodefuncs.patch Description: Binary data
Re: missing documentation for streaming in-progress transactions
On Wed, Apr 07, 2021 at 05:45:16PM +0530, Amit Kapila wrote: > On Wed, Apr 7, 2021 at 1:11 PM Ajin Cherian wrote: > > > > Found that some documentation hasn't been updated for the changes made as > > part of > > streaming large in-progress transactions. Attached a patch that adds the > > missing changes. Let me know if anything more needs to be added or any > > comments on this change. > > > > Thanks, this mostly looks good to me. I have slightly modified it. > See, what do you think of the attached? + Protocol version. Currently versions 1 and + 2 are supported. The version 2 + is supported only for server versions 14 and above, and is used to allow + streaming of large in-progress transactions. The diff briefly confused me, since this is in protocol.sgml, and since the libpq protocol version is 1/2/3, with 2 being removed in v14 (3174d69fb). I suggest to say "replication protocol version 2". I realize that the headings make this more clear when reading the .html, so maybe there's no issue. -- Justin
Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)
Breaking with tradition, the previous patch included one too *few* changes, and failed to resolve the OID collisions. >From d3d33b25e8571f5a2a3124e85164321f19ca Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Mon, 16 Mar 2020 14:12:55 -0500 Subject: [PATCH v28 01/11] Document historic behavior of links to directories.. Backpatch to 9.5: pg_stat_file --- doc/src/sgml/func.sgml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 0606b6a9aa..aa0dcde886 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -27018,7 +27018,7 @@ SELECT convert_from(pg_read_binary_file('file_in_utf8.txt'), 'UTF8'); Returns a record containing the file's size, last access time stamp, last modification time stamp, last file status change time stamp (Unix platforms only), file creation time stamp (Windows only), and a flag -indicating if it is a directory. +indicating if it is a directory (or a symbolic link to a directory). This function is restricted to superusers by default, but other users -- 2.17.0 >From c6300e4cc9fd6826530ac5a5c49eaac7f02c49c0 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Tue, 17 Mar 2020 13:16:24 -0500 Subject: [PATCH v28 02/11] Add tests on pg_ls_dir before changing it --- src/test/regress/expected/misc_functions.out | 24 src/test/regress/sql/misc_functions.sql | 8 +++ 2 files changed, 32 insertions(+) diff --git a/src/test/regress/expected/misc_functions.out b/src/test/regress/expected/misc_functions.out index e845042d38..ea0fc48dbd 100644 --- a/src/test/regress/expected/misc_functions.out +++ b/src/test/regress/expected/misc_functions.out @@ -214,6 +214,30 @@ select count(*) > 0 from t (1 row) +select * from (select pg_ls_dir('.', false, true) as name) as ls where ls.name='.'; -- include_dot_dirs=true + name +-- + . 
+(1 row) + +select * from (select pg_ls_dir('.', false, false) as name) as ls where ls.name='.'; -- include_dot_dirs=false + name +-- +(0 rows) + +select pg_ls_dir('does not exist', true, false); -- ok with missingok=true + pg_ls_dir +--- +(0 rows) + +select pg_ls_dir('does not exist'); -- fails with missingok=false +ERROR: could not open directory "does not exist": No such file or directory +-- Check that expected columns are present +select * from pg_stat_file('.') limit 0; + size | access | modification | change | creation | isdir +--++--++--+--- +(0 rows) + -- -- Test adding a support function to a subject function -- diff --git a/src/test/regress/sql/misc_functions.sql b/src/test/regress/sql/misc_functions.sql index a398349afc..eb6ac12ab4 100644 --- a/src/test/regress/sql/misc_functions.sql +++ b/src/test/regress/sql/misc_functions.sql @@ -69,6 +69,14 @@ select count(*) > 0 from where spcname = 'pg_default') pts join pg_database db on pts.pts = db.oid; +select * from (select pg_ls_dir('.', false, true) as name) as ls where ls.name='.'; -- include_dot_dirs=true +select * from (select pg_ls_dir('.', false, false) as name) as ls where ls.name='.'; -- include_dot_dirs=false +select pg_ls_dir('does not exist', true, false); -- ok with missingok=true +select pg_ls_dir('does not exist'); -- fails with missingok=false + +-- Check that expected columns are present +select * from pg_stat_file('.') limit 0; + -- -- Test adding a support function to a subject function -- -- 2.17.0 >From 5d352f9fee18f44a03f5c373b4aec71f88933402 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Mon, 9 Mar 2020 22:40:24 -0500 Subject: [PATCH v28 03/11] Add pg_ls_dir_metadata to list a dir with file metadata.. Generalize pg_ls_dir_files and retire pg_ls_dir Need catversion bumped? 
--- doc/src/sgml/func.sgml | 21 ++ src/backend/catalog/system_views.sql | 1 + src/backend/utils/adt/genfile.c | 233 +++ src/include/catalog/pg_proc.dat | 12 + src/test/regress/expected/misc_functions.out | 24 ++ src/test/regress/input/tablespace.source | 5 + src/test/regress/output/tablespace.source| 8 + src/test/regress/sql/misc_functions.sql | 11 + 8 files changed, 222 insertions(+), 93 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index aa0dcde886..507a6d73f8 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -25821,6 +25821,27 @@ postgres=# SELECT * FROM pg_walfile_name_offset(pg_stop_backup()); + + + + pg_ls_dir_metadata + +pg_ls_dir_metadata ( dirname text +, missing_ok boolean, +include_dot_dirs boolean ) +setof record +( filename text, +size bigint, +modification timestamp with time zone ) + + +For each file in the specified directory, list the file and its +
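For illustration, the per-file metadata that pg_ls_dir_metadata exposes at the SQL level boils down to a readdir()/stat() loop like the following sketch. The helper name and shape are hypothetical; the patch's real function builds a tuplestore and also handles missing_ok, which this sketch only approximates by returning -1 for a missing directory.

```c
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical sketch: print name, size and mtime for each entry in
 * 'dirname', optionally skipping "." and "..".  Returns the number of
 * entries listed, or -1 if the directory cannot be opened.
 */
static int
ls_dir_metadata(const char *dirname, bool include_dot_dirs)
{
    DIR        *dir = opendir(dirname);
    struct dirent *de;
    int         count = 0;

    if (dir == NULL)
        return -1;              /* stand-in for the missing_ok handling */

    while ((de = readdir(dir)) != NULL)
    {
        char        path[4096];
        struct stat st;

        if (!include_dot_dirs &&
            (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0))
            continue;

        snprintf(path, sizeof(path), "%s/%s", dirname, de->d_name);
        if (stat(path, &st) == 0)
            printf("%s\t%lld\t%lld\n", de->d_name,
                   (long long) st.st_size, (long long) st.st_mtime);
        count++;
    }
    closedir(dir);
    return count;
}
```

This mirrors the regression tests above: include_dot_dirs toggles whether "." and ".." appear, and a nonexistent directory is distinguishable from an empty one.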
Re: missing documentation for streaming in-progress transactions
On Fri, Apr 9, 2021 at 8:29 AM Ajin Cherian wrote: > > On Fri, Apr 9, 2021 at 10:23 AM Euler Taveira wrote: >> >> >> I didn't like this style because it is not descriptive enough. It is also >> not a >> style adopted by Postgres. I suggest to add something like "This field was >> introduced in version 2" or "This field is available since version 2" after >> the >> field description. > > > I have updated this to "Since protocol version 2" >> >> >> +Xid of the sub-transaction (will be same as xid of the >> transaction for top-level >> +transactions). >> + >> >> Although, sub-transaction is also used in the documentation, I suggest to use >> subtransaction. Maybe change the other sub-transaction occurrences too. > > > Updated. > I don't like repeating the same thing for all new messages, so I added a separate paragraph covering it, plus a few other changes. See what you think of the attached. -- With Regards, Amit Kapila. v5-0001-doc-Update-information-of-new-messages-for-logica.patch Description: Binary data
Re: PostmasterIsAlive() in recovery (non-USE_POST_MASTER_DEATH_SIGNAL builds)
On Wed, Mar 31, 2021 at 7:02 PM Thomas Munro wrote: > On Fri, Mar 12, 2021 at 7:55 PM Thomas Munro wrote: > > On Thu, Mar 11, 2021 at 7:34 PM Michael Paquier wrote: > > > Wow. This probably means that we would be able to get rid of > > > USE_POSTMASTER_DEATH_SIGNAL? > > > > We'll still need it, because there'd still be systems with no signal: > > NetBSD, OpenBSD, AIX, HPUX, illumos. > > Erm, and of course macOS. > > There is actually something we could do here: ioctl(I_SETSIG) for > SysV-derived systems and fcntl(O_ASYNC) for BSD-derived systems seems > like a promising way to get a SIGIO signal when the postmaster goes > away and the pipe becomes readable. Previously I'd discounted this, > because it's not in POSIX and I doubted it would work well on other > systems. But I was flicking through Stevens' UNIX book while trying > to figure out that POLLHUP stuff from a nearby thread (though it's > useless for that purpose) and I learned from section 12.6 that SIGIO > is fairly ancient, originating in 4.2BSD, adopted by SVR4, so it's > likely present in quite a few systems, maybe even all of our support > platforms if we're prepared to do it two different ways. Just a > thought. Alright, here's a proof-of-concept patch that does that. Adding to the next CF. This seems to work on Linux, macOS, FreeBSD and OpenBSD (and I assume any other BSD). Can anyone tell me if it works on illumos, AIX or HPUX, and if not, how to fix it or disable the feature gracefully? For now the patch assumes that if you have SIGIO then you can do this; perhaps it should also test for O_ASYNC? Perhaps HPUX has the signal but requires a different incantation with I_SETSIG? Full disclosure: The reason for my interest in this subject is that I have a work-in-progress patch set to make latches, locks and condition variables more efficient using futexes on several OSes, but it needs a signal to wake on postmaster death. 
From acff1b99d464eca453cdde3f5ca8079c925e47ac Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Fri, 9 Apr 2021 16:00:07 +1200 Subject: [PATCH] Use SIGIO to detect postmaster death. Previously we used non-POSIX calls prctl/procctl to requests a signal on death of our parent process on Linux and FreeBSD. Instead, use O_ASYNC to request SIGIO when the pipe is closed by the postmaster. The effect is about the same, but it should work on many other Unix systems too. It's not in POSIX either, so it remains optional. Discussion: https://postgr.es/m/CA%2BhUKGJiejg%3DGPTkf3S53N0-Vr4fOQ4wev7DmAVVLHbh%3DO9%2Bdg%40mail.gmail.com --- configure | 2 +- configure.ac | 2 -- src/backend/storage/ipc/pmsignal.c | 41 -- src/include/pg_config.h.in | 6 - src/include/storage/pmsignal.h | 11 +--- src/tools/msvc/Solution.pm | 2 -- 6 files changed, 12 insertions(+), 52 deletions(-) diff --git a/configure b/configure index 70f4555264..9b69fb203a 100755 --- a/configure +++ b/configure @@ -13345,7 +13345,7 @@ $as_echo "#define HAVE_STDBOOL_H 1" >>confdefs.h fi -for ac_header in atomic.h copyfile.h execinfo.h getopt.h ifaddrs.h langinfo.h mbarrier.h poll.h sys/epoll.h sys/event.h sys/ipc.h sys/prctl.h sys/procctl.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/sockio.h sys/tas.h sys/uio.h sys/un.h termios.h ucred.h wctype.h +for ac_header in atomic.h copyfile.h execinfo.h getopt.h ifaddrs.h langinfo.h mbarrier.h poll.h sys/epoll.h sys/event.h sys/ipc.h sys/pstat.h sys/resource.h sys/select.h sys/sem.h sys/shm.h sys/sockio.h sys/tas.h sys/uio.h sys/un.h termios.h ucred.h wctype.h do : as_ac_Header=`$as_echo "ac_cv_header_$ac_header" | $as_tr_sh` ac_fn_c_check_header_mongrel "$LINENO" "$ac_header" "$as_ac_Header" "$ac_includes_default" diff --git a/configure.ac b/configure.ac index ba67c95bcc..f08c60acd6 100644 --- a/configure.ac +++ b/configure.ac @@ -1360,8 +1360,6 @@ AC_CHECK_HEADERS(m4_normalize([ sys/epoll.h sys/event.h sys/ipc.h - sys/prctl.h - sys/procctl.h sys/pstat.h 
sys/resource.h sys/select.h diff --git a/src/backend/storage/ipc/pmsignal.c b/src/backend/storage/ipc/pmsignal.c index 280c2395c9..e9f5b4fc41 100644 --- a/src/backend/storage/ipc/pmsignal.c +++ b/src/backend/storage/ipc/pmsignal.c @@ -14,6 +14,7 @@ */ #include "postgres.h" +#include #include #include @@ -92,23 +93,6 @@ postmaster_death_handler(int signo) { postmaster_possibly_dead = true; } - -/* - * The available signals depend on the OS. SIGUSR1 and SIGUSR2 are already - * used for other things, so choose another one. - * - * Currently, we assume that we can always find a signal to use. That - * seems like a reasonable assumption for all platforms that are modern - * enough to have a parent-death signaling mechanism. - */ -#if defined(SIGINFO) -#define POSTMASTER_DEATH_SIGNAL SIGINFO -#elif defined(SIGPWR) -#define POSTMASTER_DEATH_SIGNAL SIGPWR -#else -#error "cannot find a signal to use for
Re: [POC] Fast COPY FROM command for the table with foreign partitions
On Fri, Apr 9, 2021 at 9:49 AM Justin Pryzby wrote: > I rebased this patch to resolve a trivial 1 line conflict from c5b7ba4e6. Thanks for rebasing! Actually, I've started reviewing this, but I couldn't finish my review. My apologies for not having much time on this. I'll continue to work on it for PG15. Sorry for the empty email. Best regards, Etsuro Fujita
Re: [POC] Fast COPY FROM command for the table with foreign partitions
On Fri, Apr 9, 2021 at 9:49 AM Justin Pryzby wrote: > > I rebased this patch to resolve a trivial 1 line conflict from c5b7ba4e6. > > -- > Justin
Re: WIP: WAL prefetch (another approach)
Here's some little language fixes. BTW, before beginning "recovery", PG syncs all the data dirs. This can be slow, and it seems like the slowness is frequently due to file metadata. For example, that's an obvious consequence of an OS crash, after which the page cache is empty. I've made a habit of running find /zfs -ls |wc to pre-warm it, which can take a little bit, but then the recovery process starts moments later. I don't have any timing measurements, but I expect that starting to stat() all data files as soon as possible would be a win. commit cc9707de333fe8242607cde9f777beadc68dbf04 Author: Justin Pryzby Date: Thu Apr 8 10:43:14 2021 -0500 WIP: doc review: Optionally prefetch referenced data in recovery. 1d257577e08d3e598011d6850fd1025858de8c8c diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index bc4a8b2279..139dee7aa2 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -3621,7 +3621,7 @@ include_dir 'conf.d' pool after that. However, on file systems with a block size larger than PostgreSQL's, prefetching can avoid a -costly read-before-write when a blocks are later written. +costly read-before-write when blocks are later written. The default is off. diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml index 24cf567ee2..36e00c92c2 100644 --- a/doc/src/sgml/wal.sgml +++ b/doc/src/sgml/wal.sgml @@ -816,9 +816,7 @@ prefetching mechanism is most likely to be effective on systems with full_page_writes set to off (where that is safe), and where the working - set is larger than RAM. By default, prefetching in recovery is enabled - on operating systems that have posix_fadvise - support. + set is larger than RAM. By default, prefetching in recovery is disabled. 
diff --git a/src/backend/access/transam/xlogprefetch.c b/src/backend/access/transam/xlogprefetch.c index 28764326bc..363c079964 100644 --- a/src/backend/access/transam/xlogprefetch.c +++ b/src/backend/access/transam/xlogprefetch.c @@ -31,7 +31,7 @@ * stall; this is counted with "skip_fpw". * * The only way we currently have to know that an I/O initiated with - * PrefetchSharedBuffer() has that recovery will eventually call ReadBuffer(), + * PrefetchSharedBuffer() has that recovery will eventually call ReadBuffer(), XXX: what ?? * and perform a synchronous read. Therefore, we track the number of * potentially in-flight I/Os by using a circular buffer of LSNs. When it's * full, we have to wait for recovery to replay records so that the queue @@ -660,7 +660,7 @@ XLogPrefetcherScanBlocks(XLogPrefetcher *prefetcher) /* * I/O has possibly been initiated (though we don't know if it was * already cached by the kernel, so we just have to assume that it -* has due to lack of better information). Record this as an I/O +* was due to lack of better information). Record this as an I/O * in progress until eventually we replay this LSN. */ XLogPrefetchIncrement(>prefetch); diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 090abdad8b..8c72ba1f1a 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -2774,7 +2774,7 @@ static struct config_int ConfigureNamesInt[] = { {"wal_decode_buffer_size", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY, gettext_noop("Maximum buffer size for reading ahead in the WAL during recovery."), - gettext_noop("This controls the maximum distance we can read ahead n the WAL to prefetch referenced blocks."), + gettext_noop("This controls the maximum distance we can read ahead in the WAL to prefetch referenced blocks."), GUC_UNIT_BYTE }, _decode_buffer_size,
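The circular-buffer bookkeeping that the xlogprefetch.c comment describes can be sketched as follows. Names and structure are hypothetical simplifications of the real code: LSNs of initiated prefetches go in at the head, and as recovery replays past an LSN, the corresponding I/Os are retired from the tail; when the buffer is full, the prefetcher must wait for replay to catch up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;        /* an LSN, as in PostgreSQL */

#define QUEUE_SIZE 8                /* illustrative capacity */

/* Hypothetical simplified tracker of potentially in-flight prefetch I/Os. */
typedef struct PrefetchQueue
{
    XLogRecPtr  lsns[QUEUE_SIZE];
    int         head;               /* next slot to fill */
    int         tail;               /* oldest in-flight LSN */
    int         count;
} PrefetchQueue;

static bool
prefetch_queue_full(const PrefetchQueue *q)
{
    return q->count == QUEUE_SIZE;
}

/* Record an I/O initiated for a block referenced by the record at 'lsn'. */
static void
prefetch_queue_push(PrefetchQueue *q, XLogRecPtr lsn)
{
    assert(!prefetch_queue_full(q));
    q->lsns[q->head] = lsn;
    q->head = (q->head + 1) % QUEUE_SIZE;
    q->count++;
}

/* Recovery has replayed up to 'replayed': retire completed I/Os. */
static void
prefetch_queue_retire(PrefetchQueue *q, XLogRecPtr replayed)
{
    while (q->count > 0 && q->lsns[q->tail] <= replayed)
    {
        q->tail = (q->tail + 1) % QUEUE_SIZE;
        q->count--;
    }
}
```

Because the kernel gives no completion notification for posix_fadvise-style prefetches, "replay has passed this LSN" is the only signal that the I/O must have finished, which is exactly why the queue is keyed by LSN rather than by any I/O handle.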
Re: TRUNCATE on foreign table
2021年4月8日(木) 22:14 Fujii Masao : > > On 2021/04/08 22:02, Kohei KaiGai wrote: > >> Anyway, attached is the updated version of the patch. This is still based > >> on the latest Kazutaka-san's patch. That is, extra list for ONLY is still > >> passed to FDW. What about committing this version at first? Then we can > >> continue the discussion and change the behavior later if necessary. > > Pushed! Thank all involved in this development!! > For record, I attached the final patch I committed. > > > > Ok, it's fair enought for me. > > > > I'll try to sort out my thought, then raise a follow-up discussion if > > necessary. > > Thanks! > > The followings are the open items and discussion points that I'm thinking of. > > 1. Currently the extra information (TRUNCATE_REL_CONTEXT_NORMAL, > TRUNCATE_REL_CONTEXT_ONLY or TRUNCATE_REL_CONTEXT_CASCADING) about how a > foreign table was specified as the target to truncate in TRUNCATE command is > collected and passed to FDW. Does this really need to be passed to FDW? Seems > Stephen, Michael and I think that's necessary. But Kaigai-san does not. I > also think that TRUNCATE_REL_CONTEXT_CASCADING can be removed because there > seems no use case for that maybe. > > 2. Currently when the same foreign table is specified multiple times in the > command, the extra information only for the foreign table found first is > collected. For example, when "TRUNCATE ft, ONLY ft" is executed, > TRUNCATE_REL_CONTEXT_NORMAL is collected and _ONLY is ignored because "ft" is > found first. Is this OK? Or we should collect all, e.g., both _NORMAL and > _ONLY should be collected in that example? I think that the current approach > (i.e., collect the extra info about table found first if the same table is > specified multiple times) is good because even local tables are also treated > the same way. But Kaigai-san does not. > > 3. Currently postgres_fdw specifies ONLY clause in TRUNCATE command that it > constructs. 
That is, if the foreign table is specified with ONLY, > postgres_fdw also issues the TRUNCATE command for the corresponding remote > table with ONLY to the remote server. Then only root table is truncated in > remote server side, and the tables inheriting that are not truncated. Is this > behavior desirable? Seems Michael and I think this behavior is OK. But > Kaigai-san does not. > Prior to the discussion of 1-3, I'd like to clarify the role of foreign tables. (Likely, it will lead to a natural conclusion for the above open items.) As the name SQL/MED (Management of External Data) says, a foreign table is a representation of external data in PostgreSQL. It allows us to read and (optionally) write the external data wrapped by FDW drivers, as if we were reading / writing heap tables. Thanks to the FDW APIs, core PostgreSQL does not care about the structure, location, volume and other characteristics of the external data itself. It expects FDW API invocations to perform as if we were accessing a regular heap table. On the other hand, we can say local tables are representations of "internal" data in PostgreSQL. A heap table consists of one or more files (per BLCKSZ * RELSEG_SIZE), and the table AM intermediates the on-disk data to/from the on-memory structure (TupleTableSlot). There are no big differences in the concept. Ok? As you know, the ONLY clause controls whether the TRUNCATE command shall run on child tables also, not only the parent. If "ONLY parent_table" is given, its child tables are not picked up by ExecuteTruncate(), unless the child tables are listed individually. Then, once ExecuteTruncate() has picked up the relations, it makes the relations empty using the table AM (relation_set_new_filenode), and the callee (heapam_relation_set_new_filenode) does not care about whether the table was specified with ONLY or not. It just makes the data represented by the table empty (in a transactional way). So, how should foreign tables perform? 
Once ExecuteTruncate() has picked up a foreign table according to the ONLY clause, should the FDW driver consider the context in which the foreign table was specified? And what behavior is consistent? I think the FDW driver should make the external data represented by the foreign table empty, regardless of its structure, location, volume and so on. Therefore, if we follow the above assumption, we don't need to pass down the context in which foreign tables were picked up (TRUNCATE_REL_CONTEXT_*), so postgres_fdw should not adjust the remote TRUNCATE query according to those flags. It always truncates the entire tables (if there are several) on behalf of the foreign tables. As an aside, if postgres_fdw maps a remote table with an "ONLY" clause, that is exactly the situation where we would add an "ONLY" clause to the TRUNCATE command, because the foreign table is a representation of the remote "ONLY parent_table" in this case. What are your thoughts? -- HeteroDB, Inc / The PG-Strom Project KaiGai Kohei
Re: psql - add SHOW_ALL_RESULTS option
On Fri, Apr 9, 2021 at 6:36 AM Michael Paquier wrote: > > On Fri, Apr 09, 2021 at 01:11:35AM +0800, Julien Rouhaud wrote: > > On Thu, Apr 08, 2021 at 07:04:01PM +0200, Fabien COELHO wrote: > >> It is definitely a open item. I'm not sure where you want to add it… > >> possibly the "Pg 14 Open Items" wiki page? > > > > Correct. > > I was running a long query this morning and wondered why the > cancellation was suddenly broken. So I am not alone, and here you are > with already a solution :) > > So, studying through 3a51306, this stuff has changed the query > execution from a sync PQexec() to an async PQsendQuery(). And the > proposed fix changes back to the behavior where the cancellation > reset happens after getting a result, as there is no need to cancel > anything. > > No strong objections from here if the consensus is to make > SendQueryAndProcessResults() handle the cancel reset properly though I > am not sure if this is the cleanest way to do things, but let's make > at least the whole business consistent in the code for all those code > paths. For example, PSQLexecWatch() does an extra ResetCancelConn() > that would be useless once we are done with > SendQueryAndProcessResults(). Also, I can see that > SendQueryAndProcessResults() would not issue a cancel reset if the > query fails, for \watch when cancel is pressed, and for \watch with > COPY. So, my opinion here would be to keep ResetCancelConn() within > PSQLexecWatch(), just add an extra one in SendQuery() to make all the > three code paths printing results consistent, and leave > SendQueryAndProcessResults() out of the cancellation logic. Hi, I'm also facing the query cancellation issue; I have to kill the backend every time to cancel a query, which is becoming difficult. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: [PATCH] force_parallel_mode and GUC categories
On Fri, Apr 09, 2021 at 10:50:53AM +0900, Michael Paquier wrote: > On Sat, Apr 03, 2021 at 08:25:46PM -0500, Justin Pryzby wrote: > > Forking this thread > > https://www.postgresql.org/message-id/20210403154336.GG29125%40momjian.us > > Didn't see this one, thanks for forking. > > - {"force_parallel_mode", PGC_USERSET, QUERY_TUNING_OTHER, > + {"force_parallel_mode", PGC_USERSET, DEVELOPER_OPTIONS, > And not this one either, as it is mainly a planner thing, like the > other parameters in the same area. This is the main motive behind the patch. Developer options aren't shown in postgresql.conf.sample, which some people seem to read through quickly, setting a whole bunch of options that sound good, sometimes including this one. In the best case they then ask on -performance why their queries are slow and we tell them to turn it back off to fix their issues. This patch stops putting it in the .sample file, and reclassifying it as a "dev" option is the mechanism by which to do that. -- Justin ps, Maybe you saw that I'd already resent without including the accidental junk hunks.
Re: Why is Query NOT getting cancelled with SIGINT in PG14?
On Fri, Apr 9, 2021 at 8:38 AM Justin Pryzby wrote: > > On Fri, Apr 09, 2021 at 08:24:51AM +0530, Bharath Rupireddy wrote: > > Looks like the running query is not getting cancelled even though I > > issue CTRL+C from psql or kill the backend with SIGINT. This only > > happens with PG14 not in PG13. Am I missing something here? Is it a > > bug? > > Yes, see here: > https://www.postgresql.org/message-id/flat/OSZPR01MB631017521EE6887ADC9492E8FD759%40OSZPR01MB6310.jpnprd01.prod.outlook.com#e9228ef1ae32315f8b0df3fa67a32e06 Thanks. I had missed that thread. I will respond there. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
Re: Why is Query NOT getting cancelled with SIGINT in PG14?
On Fri, Apr 09, 2021 at 08:24:51AM +0530, Bharath Rupireddy wrote: > Looks like the running query is not getting cancelled even though I > issue CTRL+C from psql or kill the backend with SIGINT. This only > happens with PG14 not in PG13. Am I missing something here? Is it a > bug? Yes, see here: https://www.postgresql.org/message-id/flat/OSZPR01MB631017521EE6887ADC9492E8FD759%40OSZPR01MB6310.jpnprd01.prod.outlook.com#e9228ef1ae32315f8b0df3fa67a32e06 -- Justin
Re: missing documentation for streaming in-progress transactions
On Fri, Apr 9, 2021 at 10:23 AM Euler Taveira wrote: > > I didn't like this style because it is not descriptive enough. It is also > not a > style adopted by Postgres. I suggest to add something like "This field was > introduced in version 2" or "This field is available since version 2" > after the > field description. > I have updated this to "Since protocol version 2" > > +Xid of the sub-transaction (will be same as xid of the > transaction for top-level > +transactions). > + > > Although, sub-transaction is also used in the documentation, I suggest to > use > subtransaction. Maybe change the other sub-transaction occurrences too. > Updated. regards, Ajin Cherian Fujitsu Australia v4-0001-doc-Update-information-of-new-messages-for-logica.patch Description: Binary data
Why is Query NOT getting cancelled with SIGINT in PG14?
Hi, Looks like the running query is not getting cancelled even though I issue CTRL+C from psql or kill the backend with SIGINT. This only happens with PG14 not in PG13. Am I missing something here? Is it a bug? create table t1(a1 int); insert into t1 select * from generate_series(1,100); --> I chose an intentionally long running query, now either issue CTRL+C or kill the backend with SIGINT, the query doesn't get cancelled. Note that I don't even see "Cancel request sent" message on psql when I issue CTRL+C. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
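To reproduce this, the INSERT needs to run long enough to be interrupted; a sketch with an illustrative row count (the count in the original report is truncated above, so this number is made up, not the one actually used):

```sql
CREATE TABLE t1 (a1 int);
-- an intentionally long-running statement; pick a series size large
-- enough that the INSERT takes several seconds on your machine
INSERT INTO t1 SELECT * FROM generate_series(1, 100000000);
-- while it runs, press CTRL+C in psql, or from another session:
--   SELECT pg_cancel_backend(<pid>);   -- sends SIGINT, like kill -INT <pid>
```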
Re: test runner (was Re: SQL-standard function body)
Hi, On 2021-04-09 08:39:46 +0900, Michael Paquier wrote: > On Thu, Apr 08, 2021 at 10:50:39AM -0700, Andres Freund wrote: > > Obviously all very far from being ready, but this seemed like a good > > enough excuse to mention it ;) > > This is nice. Are there any parallelism capabilities? Yes. It defaults to number-of-cores processes, but obviously can also be specified explicitly. One very nice part about it is that it'd work largely the same on windows (which has practically unusable testing right now). It probably doesn't yet, because I just tried to get it to build and run tests at all, but it shouldn't be a lot of additional work. Greetings, Andres Freund
Re: Simplify backend terminate and wait logic in postgres_fdw test
At Fri, 9 Apr 2021 10:59:44 +0900, Michael Paquier wrote in > On Fri, Apr 09, 2021 at 06:53:21AM +0530, Bharath Rupireddy wrote: > > I didn't think of the warning cases, my bad. How about using SET > > client_min_messages = 'ERROR'; before we call > > pg_wait_for_backend_termination? We can only depend on the return > > value of pg_wait_for_backend_termination, when true we can exit. This > > way the buildfarm will not see warnings. Thoughts? > > You could do that, but I would also bet that this is going to get > forgotten in the future if this gets extended in more SQL tests that > are output-sensitive, in or out of core. Honestly, I can get behind a > warning in pg_wait_for_backend_termination() to inform that the > process poked at is not a PostgreSQL one, because it offers new and > useful information to the user. But, and my apologies for sounding a > bit noisy, I really don't get why pg_wait_until_termination() has any > need to do that. From what I can see, it provides the following > information: > - A PID, that we already know from the caller or just from > pg_stat_activity. > - A timeout, already known as well. > - The fact that the process did not terminate, information given by > the "false" status, only used in this case. > > So there is no new information here to the user, only a duplicate of > what's already known to the caller of this function. I see more > advantages in removing this WARNING rather than keeping it. FWIW I agree with Michael. I faintly remember thinking the same while reviewing, but it seems I forgot to write a comment to that effect. Reporting that is the job of the caller, concretely the existing callers and any possible script that calls the function. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: TRUNCATE on foreign table
On Thu, Apr 8, 2021 at 6:47 PM Bharath Rupireddy < bharath.rupireddyforpostg...@gmail.com> wrote: > On Thu, Apr 8, 2021 at 6:44 PM Fujii Masao > wrote: > > The followings are the open items and discussion points that I'm > thinking of. > > > > 1. Currently the extra information (TRUNCATE_REL_CONTEXT_NORMAL, > TRUNCATE_REL_CONTEXT_ONLY or TRUNCATE_REL_CONTEXT_CASCADING) about how a > foreign table was specified as the target to truncate in TRUNCATE command > is collected and passed to FDW. Does this really need to be passed to FDW? > Seems Stephen, Michael and I think that's necessary. But Kaigai-san does > not. I also think that TRUNCATE_REL_CONTEXT_CASCADING can be removed > because there seems no use case for that maybe. > > I think we should remove the unused enums/macros, instead we could > mention a note of the extensibility of those enums/macros in the > comments section around the enum/macro definitions. > > > 2. Currently when the same foreign table is specified multiple times in > the command, the extra information only for the foreign table found first > is collected. For example, when "TRUNCATE ft, ONLY ft" is executed, > TRUNCATE_REL_CONTEXT_NORMAL is collected and _ONLY is ignored because "ft" > is found first. Is this OK? Or we should collect all, e.g., both _NORMAL > and _ONLY should be collected in that example? I think that the current > approach (i.e., collect the extra info about table found first if the same > table is specified multiple times) is good because even local tables are > also treated the same way. But Kaigai-san does not. > > IMO, the foreign truncate command should be constructed by collecting > all the information i.e. "TRUNCATE ft, ONLY ft" and let the remote > server execute how it wants to execute. That will be consistent and no > extra logic is required to track the already seen foreign tables while > foreign table collection/foreign truncate command is being prepared on > the local server. 
> > I was thinking that the postgres throws error or warning for commands > such as truncate, vacuum, analyze when the same tables are specified, > but seems like that's not what it does. > > > 3. Currently postgres_fdw specifies ONLY clause in TRUNCATE command that > it constructs. That is, if the foreign table is specified with ONLY, > postgres_fdw also issues the TRUNCATE command for the corresponding remote > table with ONLY to the remote server. Then only root table is truncated in > remote server side, and the tables inheriting that are not truncated. Is > this behavior desirable? Seems Michael and I think this behavior is OK. But > Kaigai-san does not. > > I'm okay with the behaviour as it is consistent with what ONLY does to > local tables. Documenting this behaviour (if not done already) is a > better way I think. > > > 4. Tab-completion for TRUNCATE should be updated so that also foreign > tables are displayed. > > It will be good to have. > > With Regards, > Bharath Rupireddy. > EnterpriseDB: http://www.enterprisedb.com w.r.t. point #1: bq. I think we should remove the unused enums/macros, I agree. When there is a more concrete use case which requires a new enum, we can add an enum whose meaning would be clearer. Cheers
Re: Simplify backend terminate and wait logic in postgres_fdw test
On Fri, Apr 09, 2021 at 06:53:21AM +0530, Bharath Rupireddy wrote: > I didn't think of the warning cases, my bad. How about using SET > client_min_messages = 'ERROR'; before we call > pg_wait_for_backend_termination? We can only depend on the return > value of pg_wait_for_backend_termination, when true we can exit. This > way the buildfarm will not see warnings. Thoughts? You could do that, but I would also bet that this is going to get forgotten in the future if this gets extended in more SQL tests that are output-sensitive, in or out of core. Honestly, I can get behind a warning in pg_wait_for_backend_termination() to inform that the process poked at is not a PostgreSQL one, because it offers new and useful information to the user. But, and my apologies for sounding a bit noisy, I really don't get why pg_wait_until_termination() has any need to do that. From what I can see, it provides the following information: - A PID, that we already know from the caller or just from pg_stat_activity. - A timeout, already known as well. - The fact that the process did not terminate, information given by the "false" status, only used in this case. So there is no new information here to the user, only a duplicate of what's already known to the caller of this function. I see more advantages in removing this WARNING rather than keeping it. -- Michael
Re: Lots of incorrect comments in nodeFuncs.c
On Thu, Apr 08, 2021 at 09:21:30PM -0400, Alvaro Herrera wrote: > On 2021-Apr-08, Tom Lane wrote: >> Maybe like >> >> case T_ScalarArrayOpExpr: >> /* ScalarArrayOpExpr's result is boolean ... */ >> coll = InvalidOid; /* ... so it has no collation */ >> break; > > This is much clearer, yeah. +1. -- Michael
Re: [PATCH] force_parallel_mode and GUC categories
On Sat, Apr 03, 2021 at 08:25:46PM -0500, Justin Pryzby wrote: > Forking this thread > https://www.postgresql.org/message-id/20210403154336.GG29125%40momjian.us Didn't see this one, thanks for forking. > I understood "developer" to mean someone who's debugging postgres itself, not > (say) a function written using pl/pgsql. Like backtrace_functions, > post_auth_delay, jit_profiling_support. > > But I see that some "dev" options are more user-facing (for a sufficiently > advanced user): > ignore_checksum_failure, ignore_invalid_pages, zero_damaged_pages. > > Also, I understood this to mean the "category" in pg_settings, but I guess > what's important here is the absense of the GUC in the sample/template config > file. pg_settings.category and the sample headings it appears are intended to > be synchronized, but a few of them are out of sync. See attached. > > +1 to move this to "developer" options and remove it from the sample config: > > # - Other Planner Options - > #force_parallel_mode = off 0001 has some changes to pg_config_manual.h related to valgrind and memory randomization. You may want to remove that before posting a patch. - {"track_commit_timestamp", PGC_POSTMASTER, REPLICATION, + {"track_commit_timestamp", PGC_POSTMASTER, REPLICATION_SENDING, I can get behind this change for clarity where it gets actively used. - {"track_activity_query_size", PGC_POSTMASTER, RESOURCES_MEM, + {"track_activity_query_size", PGC_POSTMASTER, STATS_COLLECTOR, But not this one, because it is a memory setting. - {"force_parallel_mode", PGC_USERSET, QUERY_TUNING_OTHER, + {"force_parallel_mode", PGC_USERSET, DEVELOPER_OPTIONS, And not this one either, as it is mainly a planner thing, like the other parameters in the same area. The last change is related to log_autovacuum_min_duration, and I can get behind the argument you are making to group all log activity parameters together. 
Now, about this part: +#log_autovacuum_min_duration = -1 # -1 disables, 0 logs all actions and + # their durations, > 0 logs only + # actions running at least this number + # of milliseconds. I think that we should clarify in the description that this is an autovacuum-only thing, say by appending a small sentence about the fact that it logs autovacuum activities, in a similar fashion to log_temp_files. Moving the parameter out of the autovacuum section makes it lose a bit of context. @@ -6903,6 +6903,7 @@ fetch_more_data_begin(AsyncRequest *areq) char sql[64]; Assert(!fsstate->conn_state->pendingAreq); + Assert(fsstate->conn); What's this diff doing here? -- Michael
Re: TRUNCATE on foreign table
On Thu, Apr 8, 2021 at 6:44 PM Fujii Masao wrote: > The followings are the open items and discussion points that I'm thinking of. > > 1. Currently the extra information (TRUNCATE_REL_CONTEXT_NORMAL, > TRUNCATE_REL_CONTEXT_ONLY or TRUNCATE_REL_CONTEXT_CASCADING) about how a > foreign table was specified as the target to truncate in TRUNCATE command is > collected and passed to FDW. Does this really need to be passed to FDW? Seems > Stephen, Michael and I think that's necessary. But Kaigai-san does not. I > also think that TRUNCATE_REL_CONTEXT_CASCADING can be removed because there > seems no use case for that maybe. I think we should remove the unused enums/macros, instead we could mention a note of the extensibility of those enums/macros in the comments section around the enum/macro definitions. > 2. Currently when the same foreign table is specified multiple times in the > command, the extra information only for the foreign table found first is > collected. For example, when "TRUNCATE ft, ONLY ft" is executed, > TRUNCATE_REL_CONTEXT_NORMAL is collected and _ONLY is ignored because "ft" is > found first. Is this OK? Or we should collect all, e.g., both _NORMAL and > _ONLY should be collected in that example? I think that the current approach > (i.e., collect the extra info about table found first if the same table is > specified multiple times) is good because even local tables are also treated > the same way. But Kaigai-san does not. IMO, the foreign truncate command should be constructed by collecting all the information i.e. "TRUNCATE ft, ONLY ft" and let the remote server execute how it wants to execute. That will be consistent and no extra logic is required to track the already seen foreign tables while foreign table collection/foreign truncate command is being prepared on the local server. 
I was thinking that postgres throws an error or warning for commands such as truncate, vacuum, analyze when the same tables are specified, but it seems that's not what it does. > 3. Currently postgres_fdw specifies ONLY clause in TRUNCATE command that it constructs. That is, if the foreign table is specified with ONLY, postgres_fdw also issues the TRUNCATE command for the corresponding remote table with ONLY to the remote server. Then only root table is truncated in remote server side, and the tables inheriting that are not truncated. Is this behavior desirable? Seems Michael and I think this behavior is OK. But Kaigai-san does not. I'm okay with the behaviour as it is consistent with what ONLY does to local tables. Documenting this behaviour (if not done already) is a better way I think. > 4. Tab-completion for TRUNCATE should be updated so that also foreign tables are displayed. It will be good to have. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
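For reference, the local-table semantics that point 3 is being kept consistent with can be seen with plain inheritance (a minimal sketch; the table names here are made up):

```sql
CREATE TABLE parent (a int);
CREATE TABLE child () INHERITS (parent);
INSERT INTO parent VALUES (1);
INSERT INTO child VALUES (2);

TRUNCATE ONLY parent;  -- removes only rows stored directly in parent;
                       -- child keeps its row
TRUNCATE parent;       -- removes rows from parent and every inheriting table
```

Forwarding ONLY to the remote server, as described above, makes a remote inheritance tree behave the same way as a local one.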
Re: [POC] Fast COPY FROM command for the table with foreign partitions
On Thu, Apr 8, 2021 at 5:49 PM Justin Pryzby wrote: > I rebased this patch to resolve a trivial 1 line conflict from c5b7ba4e6. > > -- > Justin > Hi, In src/backend/commands/copyfrom.c : + if (resultRelInfo->ri_RelationDesc->rd_rel->relkind == RELKIND_FOREIGN_TABLE) There are a few steps of indirection here. Adding an assertion on resultRelInfo->ri_RelationDesc, etc. before the if statement would help catch a potential invalid pointer. +CopyToStart(CopyToState cstate) ... +CopyToFinish(CopyToState cstate) Since 'copy to' is the action, it would be easier to read the method names if they're called StartCopyTo, FinishCopyTo, respectively. That way, the method names would be consistent with existing ones, such as: extern uint64 DoCopyTo(CopyToState cstate); +* If a partition's root parent isn't allowed to use it, neither is the In the above sentence, 'it' refers to multi insert. It would be more readable to explicitly mention 'multi insert' instead of 'it'. Cheers
RE: 2019-03 CF now in progress
From: David Steele > Overall, 118 of 262 entries were closed during this commitfest (45%). > That includes 91 patches committed since March 1, which is pretty > fantastic. Thank you to everyone, especially the committers, for your > hard work! The number of committed patches in the last CF is record-breaking in recent years: PG 14: 124 PG 13: 90 PG 12: 100 PG 11: 117 PG 10: 116 PG 9.6: 74 PG 9.5: 102 I take off my hat to the committers and the CFM for their hard work. OTOH, there seem to be many useful-looking features pending as Ready for Committer. I look forward to seeing them committed early in PG 15. Regards Takayuki Tsunakawa
Re: Simplify backend terminate and wait logic in postgres_fdw test
On Fri, Apr 9, 2021 at 5:51 AM Michael Paquier wrote: > > On Thu, Apr 08, 2021 at 06:27:56PM +0530, Bharath Rupireddy wrote: > > Agree. Please see the attached patch, I removed a fixed waiting time. > > Instead of relying on pg_stat_activity, pg_sleep and > > pg_stat_clear_snapshot, now it depends on pg_terminate_backend and > > pg_wait_for_backend_termination. This way we could reduce the > > functions that the procedure terminate_backend_and_wait uses and also > > the new functions pg_terminate_backend and > > pg_wait_for_backend_termination get test coverage. > > + EXIT WHEN is_terminated; > + SELECT * INTO is_terminated FROM > pg_wait_for_backend_termination(pid_v, 1000); > This is still a regression if the termination takes more than 1s, > no? In such a case terminate_backend_and_wait() would issue a WARNING > and pollute the regression test output. I can see the point of what > you are achieving here, and that's a good idea, but from the point of > view of the buildfarm the WARNING introduced by aaf0432 is a no-go. I didn't think of the warning cases, my bad. How about using SET client_min_messages = 'ERROR'; before we call pg_wait_for_backend_termination? We can then depend only on the return value of pg_wait_for_backend_termination; when it is true we can exit. This way the buildfarm will not see warnings. Thoughts? With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
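Putting the snippets quoted in this subthread together, the proposal amounts to something like the following (a sketch only; the names terminate_backend_and_wait, pid_v and is_terminated come from the quoted fragments, and the functions are those added by commit aaf0432, which may differ from what finally ships):

```sql
CREATE PROCEDURE terminate_backend_and_wait(pid_v integer) AS $$
DECLARE
    is_terminated boolean := false;
BEGIN
    -- suppress the timeout WARNING so the regression output stays stable
    SET LOCAL client_min_messages = 'ERROR';
    PERFORM pg_terminate_backend(pid_v);
    LOOP
        -- rely only on the return value: true once the backend is gone
        SELECT * INTO is_terminated
            FROM pg_wait_for_backend_termination(pid_v, 1000);
        EXIT WHEN is_terminated;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```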
Re: Lots of incorrect comments in nodeFuncs.c
On 2021-Apr-08, Tom Lane wrote: > Maybe like > > case T_ScalarArrayOpExpr: > /* ScalarArrayOpExpr's result is boolean ... */ > coll = InvalidOid; /* ... so it has no collation */ > break; This is much clearer, yeah. -- Álvaro Herrera Valdivia, Chile
Re: 2019-03 CF now in progress
On Thu, Apr 08, 2021 at 09:49:20AM -0400, David Steele wrote: > Overall, 118 of 262 entries were closed during this commitfest (45%). That > includes 91 patches committed since March 1, which is pretty fantastic. > Thank you to everyone, especially the committers, for your hard work! I think that the biggest thanks here goes to you, for cleaning the CF entries completely. So, thanks! -- Michael
Re: psql - add SHOW_ALL_RESULTS option
On Fri, Apr 09, 2021 at 01:11:35AM +0800, Julien Rouhaud wrote: > On Thu, Apr 08, 2021 at 07:04:01PM +0200, Fabien COELHO wrote: >> It is definitely an open item. I'm not sure where you want to add it… >> possibly the "Pg 14 Open Items" wiki page? > > Correct. I was running a long query this morning and wondered why the cancellation was suddenly broken. So I am not alone, and here you are already with a solution :) So, studying through 3a51306, this stuff has changed the query execution from a sync PQexec() to an async PQsendQuery(). And the proposed fix changes back to the behavior where the cancellation reset happens after getting a result, as there is no need to cancel anything. No strong objections from here if the consensus is to make SendQueryAndProcessResults() handle the cancel reset properly, though I am not sure if this is the cleanest way to do things, but let's make at least the whole business consistent in the code for all those code paths. For example, PSQLexecWatch() does an extra ResetCancelConn() that would be useless once we are done with SendQueryAndProcessResults(). Also, I can see that SendQueryAndProcessResults() would not issue a cancel reset if the query fails, for \watch when cancel is pressed, and for \watch with COPY. So, my opinion here would be to keep ResetCancelConn() within PSQLexecWatch(), just add an extra one in SendQuery() to make all the three code paths printing results consistent, and leave SendQueryAndProcessResults() out of the cancellation logic. >> I tried but I do not have enough >> privileges, if you can do it please proceed. I added an entry in the next CF >> in the bugfix section. > > That's strange, I don't think you need special permission there. It's > working for me so I added an item with a link to the patch! As long as you have a community account, you should have the possibility to edit the page. So if you feel that any change is required, please feel free to do so, of course. 
-- Michael
Re: [POC] Fast COPY FROM command for the table with foreign partitions
I rebased this patch to resolve a trivial 1 line conflict from c5b7ba4e6. -- Justin >From 0987ca4f62fb8c9b43a3fe142d955d8a9cb6f36f Mon Sep 17 00:00:00 2001 From: Takayuki Tsunakawa Date: Tue, 9 Feb 2021 12:50:00 +0900 Subject: [PATCH] Fast COPY FROM into the foreign or sharded table. This feature enables bulk COPY into foreign table when multi-insert is possible and foreign table has non-zero number of columns. The following routines are added to the FDW interface: * BeginForeignCopy * ExecForeignCopy * EndForeignCopy BeginForeignCopy and EndForeignCopy initialize and free the CopyState of bulk COPY. The ExecForeignCopy routine runs 'COPY ... FROM STDIN' command to the foreign server, in an iterative manner to send tuples using the CopyTo() machinery. Code that constructs a list of columns for a given foreign relation in the deparseAnalyzeSql() routine is split into deparseRelColumnList(). It is reused in deparseCopyFromSql(). Added TAP-tests on the specific corner cases of COPY FROM STDIN operation. By the analogy of CopyFrom() the CopyState structure was extended with data_dest_cb callback. It is used to send the text representation of a tuple to a custom destination. The PgFdwModifyState structure is extended with the cstate field. It is needed for avoid repeated initialization of CopyState. Also for this reason CopyTo() routine is split into the set of routines CopyToStart()/ CopyTo()/CopyToFinish(). When 0d5f05cde introduced support for using multi-insert mode when copying into partitioned tables, it introduced single variable of enum type CopyInsertMethod shared across all potential target relations (partitions) that, along with some target relation properties, dictated whether to engage multi-insert mode for a given target relation. Change that decision logic to the combination of ExecMultiInsertAllowed() and its caller. The former encapsulates the common criteria to allow multi-insert. 
The latter uses additional criteria and sets the new boolean field ri_usesMultiInsert of ResultRelInfo. That prevents repeated computation of the same information in some cases, especially for partitions, and the new arrangement results in slightly more readability. Enum CopyInsertMethod is removed. Authors: Andrey Lepikhov, Ashutosh Bapat, Amit Langote, Takayuki Tsunakawa Reviewed-by: Ashutosh Bapat, Amit Langote, Takayuki Tsunakawa Discussion: https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru --- contrib/postgres_fdw/deparse.c| 63 +++- .../postgres_fdw/expected/postgres_fdw.out| 46 ++- contrib/postgres_fdw/postgres_fdw.c | 141 + contrib/postgres_fdw/postgres_fdw.h | 1 + contrib/postgres_fdw/sql/postgres_fdw.sql | 45 +++ doc/src/sgml/fdwhandler.sgml | 71 - src/backend/commands/copy.c | 2 +- src/backend/commands/copyfrom.c | 271 -- src/backend/commands/copyto.c | 88 -- src/backend/executor/execMain.c | 44 +++ src/backend/executor/execPartition.c | 37 ++- src/include/commands/copy.h | 5 + src/include/commands/copyfrom_internal.h | 10 - src/include/executor/executor.h | 1 + src/include/foreign/fdwapi.h | 15 + src/include/nodes/execnodes.h | 8 +- 16 files changed, 637 insertions(+), 211 deletions(-) diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index bdc4c3620d..bf93c1d091 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -185,6 +185,8 @@ static void appendAggOrderBy(List *orderList, List *targetList, static void appendFunctionName(Oid funcid, deparse_expr_cxt *context); static Node *deparseSortGroupClause(Index ref, List *tlist, bool force_colno, deparse_expr_cxt *context); +static List *deparseRelColumnList(StringInfo buf, Relation rel, + bool enclose_in_parens); /* * Helper functions @@ -1859,6 +1861,23 @@ deparseUpdateSql(StringInfo buf, RangeTblEntry *rte, withCheckOptionList, returningList, retrieved_attrs); } +/* + * Deparse remote COPY FROM statement 
+ * + * Note that this explicitly specifies the list of COPY's target columns + * to account for the fact that the remote table's columns may not match + * exactly with the columns declared in the local definition. + */ +void +deparseCopyFromSql(StringInfo buf, Relation rel) +{ + appendStringInfoString(buf, "COPY "); + deparseRelation(buf, rel); + (void) deparseRelColumnList(buf, rel, true); + + appendStringInfoString(buf, " FROM STDIN "); +} + /* * deparse remote UPDATE statement * @@ -2120,6 +2139,30 @@ deparseAnalyzeSizeSql(StringInfo buf, Relation rel) */ void deparseAnalyzeSql(StringInfo buf, Relation rel, List **retrieved_attrs) +{ + appendStringInfoString(buf, "SELECT "); + *retrieved_attrs = deparseRelColumnList(buf, rel,
Re: Remove page-read callback from XLogReaderState.
At Thu, 8 Apr 2021 23:51:34 +1200, Thomas Munro wrote in > On Thu, Apr 8, 2021 at 9:46 PM Thomas Munro wrote: > > I squashed the patch set into one because half of them were fixups, > > and the two main patches were really parts of the same change and > > should go in together. > > > > I fixed a few compiler warnings (GCC 10.2 reported several > > uninitialised variables, comparisons that are always true, etc) and > > some comments. You can see these in the fixup patch. > > Pushed. Luckily there are plenty more improvements possible for > XLogReader/XLogDecoder in the next cycle. I'm surprised to see this pushed this soon. Thanks for pushing this! And thanks for fixing the remaining mistakes including some stupid ones.. At Thu, 8 Apr 2021 10:04:26 +1200, Thomas Munro wrote in > There is a stray elog(HOGE) :-) U! This looks like it slipped in while investigating another issue.. Thanks for preventing the repository from being contaminated by such a thing.. At Thu, 8 Apr 2021 21:46:06 +1200, Thomas Munro wrote in > I think maybe it should really be XLogReaderSetInputData(state, > tli, data, size) in a later release. In the meantime, I changed it to > XLogReaderSetInputData(state, size), hope that name makes sense... Sounds better. I didn't like that page-readers are allowed to touch XLogReaderState.seg directly. Anyway it would be a small change. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: missing documentation for streaming in-progress transactions
On Thu, Apr 8, 2021, at 4:25 AM, Ajin Cherian wrote: > Updated. - Protocol version. Currently only version 1 is - supported. - + Protocol version. Currently versions 1 and + 2 are supported. The version 2 + is supported only for server versions 14 and above, and is used to allow + streaming of large in-progress transactions. + s/server versions/server version/ I suggest that the last part of the sentence might be "and it allows streaming of large in-progress transactions" + Since: 2 + + I didn't like this style because it is not descriptive enough. It is also not a style adopted by Postgres. I suggest to add something like "This field was introduced in version 2" or "This field is available since version 2" after the field description. +Xid of the sub-transaction (will be same as xid of the transaction for top-level +transactions). + Although, sub-transaction is also used in the documentation, I suggest to use subtransaction. Maybe change the other sub-transaction occurrences too. -- Euler Taveira EDB https://www.enterprisedb.com/
RE: [PATCH] Tracking statements entry timestamp in pg_stat_statements
Dear Andrei, > I think, yes, the version of pg_stat_statements needs to be updated. Will > it be 1.10 in PG15? I think you are right. According to [1] we can bump the version once per PG major version, and no features have been committed yet for 15. [1]: https://www.postgresql.org/message-id/20201202040516.GA43757@nol Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: Simplify backend terminate and wait logic in postgres_fdw test
On Thu, Apr 08, 2021 at 06:27:56PM +0530, Bharath Rupireddy wrote: > Agree. Please see the attached patch, I removed a fixed waiting time. > Instead of relying on pg_stat_activity, pg_sleep and > pg_stat_clear_snapshot, now it depends on pg_terminate_backend and > pg_wait_for_backend_termination. This way we could reduce the > functions that the procedure terminate_backend_and_wait uses and also > the new functions pg_terminate_backend and > pg_wait_for_backend_termination get test coverage. + EXIT WHEN is_terminated; + SELECT * INTO is_terminated FROM pg_wait_for_backend_termination(pid_v, 1000); This is still a regression if the termination takes more than 1s, no? In such a case terminate_backend_and_wait() would issue a WARNING and pollute the regression test output. I can see the point of what you are achieving here, and that's a good idea, but from the point of view of the buildfarm the WARNING introduced by aaf0432 is a no-go. I honestly don't quite get the benefit in issuing a WARNING in this case anyway, as the code already returns false on timeout, so the caller would know the status of the operation. -- Michael
Re: Lots of incorrect comments in nodeFuncs.c
David Rowley writes: > hmm ok. I imagine there must be a better way to say that then since > it confused at least 1 reader so far. My problem is that I assumed > "result" meant the result of the function that the comment is written > in, not the result of evaluating the given expression during > execution. If that was more clear then I'd not have been misled. Maybe like case T_ScalarArrayOpExpr: /* ScalarArrayOpExpr's result is boolean ... */ coll = InvalidOid; /* ... so it has no collation */ break; ? regards, tom lane
Re: SQL:2011 PERIODS vs Postgres Ranges?
On Thu, Apr 8, 2021 at 7:22 AM David Steele wrote: > > Paul, you can submit to the next CF when you are ready with a new patch. Thanks David! I've made a lot of progress but still need to finish support for CASCADE on temporal foreign keys. I've been swamped with other things, but hopefully I can get something during this current CF. Paul
Re: test runner (was Re: SQL-standard function body)
On Thu, Apr 08, 2021 at 10:50:39AM -0700, Andres Freund wrote: > Obviously all very far from being ready, but this seemed like a good > enough excuse to mention it ;) This is nice. Are there any parallelism capabilities? -- Michael
Re: Lots of incorrect comments in nodeFuncs.c
On Fri, 9 Apr 2021 at 10:11, Tom Lane wrote: > > David Rowley writes: > > I noticed that nodeFuncs.c appears to have some pretty sloppy work > > done in many of the comments. Many look like they've just not been > > updated from a copy/paste/edit from another node function. > > The attached aims to clean these up. > > I believe every one of these changes is wrong. > For instance: > > case T_ScalarArrayOpExpr: > - coll = InvalidOid; /* result is always boolean */ > + coll = InvalidOid; /* result is always > InvalidOid */ > break; > > The point here is that the result type of ScalarArrayOpExpr is boolean, > which has no collation, therefore reporting its collation as InvalidOid > is correct. Maybe there's a clearer way to say that, but your text is > more confusing not less so. hmm ok. I imagine there must be a better way to say that then since it confused at least 1 reader so far. My problem is that I assumed "result" meant the result of the function that the comment is written in, not the result of evaluating the given expression during execution. If that was more clear then I'd not have been misled. David
Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
On Fri, 9 Apr 2021 at 09:32, Tomas Vondra wrote: > > I ran the same set of benchmarks on the v6, which I think should be > mostly identical to what was committed. I extended the test a bit to > test table with 0, 1, 5 and 1000 rows, and also with int and text > values, to see how it works with more expensive comparators. > > I built binaries with gcc 9.2.0 and clang 11.0.0, the full results are > attached. There's a bit of difference between gcc and clang, but the > general behavior is about the same, so I'll only present gcc results to > keep this simple. I'll only show the throughput comparison to master, so >1.0 > means good, <1.0 means bad. If you're interested in actual tps, see the > full results. > > For the v5 patch (actually v4-0001 + v5-0002) and v6, the results are: > > integer column / v5 > === > > rows 10/p 100/p 16/p 10/s 100/s 16/s > --- >0 97% 97% 97% 107% 126% 108% >1 95% 82% 94% 108% 132% 110% >5 95% 83% 95% 108% 132% 110% > 1000 129% 481% 171% 131% 382% 165% I think we should likely ignore the v5 patch now. The reason is that it was pretty much unfinished and there were many places where I'd not yet added support for the HashedScalarArrayOpExpr node type. This could cause these nodes to be skipped during node mutations or node walking, which would certainly make planning faster, just not in a way that's correct. > integer column / v6 > === > > rows 10/p 100/p 16/p 10/s 100/s 16/s > --- >0 97% 97% 97% 98% 98% 98% >1 96% 84% 95% 97% 97% 98% >5 97% 85% 96% 98% 97% 97% > 1000 129% 489% 172% 128% 330% 162% This is a really informative set of results. I can only guess that the slowdown of the 100/prepared query is down to building the hash table. I think that's because the 0 rows test does not show the slowdown, and we only build the table when evaluating for the first time. There's a slightly larger hit on 1 row vs 5 rows, which makes sense since the rewards of the hash lookup start paying off more with more rows. 
Looking at your tps numbers, I think I can see why we get the drop in performance with prepared statements but not simple statements. This seems to just be down to the fact that the planning time dominates in the simple statement case. For example, the "1 row" test for 100/s for v6 is 10023.3 tps, whereas the 100/p result is 44093.8 tps. With master, prepared gets 52400.0 tps. So we could say the hash table build costs us 8306.2 tps, or 3.59 microseconds per execution, per: postgres=# select 100 / 52400.0 - 100 / 44093.8; ?column? - -3.5949559161508538 (1 row) If we look at the tps for the simple query version of the same test. Master did 10309.6 tps, v6 did 10023.3 tps. If we apply that 3.59 microsecond slowdown to master's tps, then we get pretty close to within 1% of the v6 tps: postgres=# select 100 / (100 / 10309.6 + 3.59); ?column? --- 9941.6451581291294165 (1 row) > text column / v6 > > > rows 10/p100/p 16/p 10/s100/s 16/s > --- >0 101% 101% 101% 98% 99% 99% >1 98% 82% 96% 98% 96% 97% >5 100% 84% 98% 98% 96% 98% > 1000 297%1645% 408% 255%1000% 336% > > > Overall, the behavior for integer and text columns is the same, for both > patches. There's a couple interesting observations: > > 1) For the "simple" query mode, v5 helped quite a bit (20-30% speedup), > but v6 does not seem to help at all - it's either same or slower than > unpatched master. I think that's related to the fact that I didn't finish adding HashedScalarArrayOpExpr processing to all places that needed it. > I wonder why is that, and if we could get some of the speedup with v6? > At first I thought that maybe v5 is not building the hash table in cases > where v6 does, but that shouldn't be faster than master. I don't think v5 and v6 really do anything much differently in the executor. The only difference is really during ExecInitExprRec() when we initialize the expression. 
With v5 we had a case T_HashedScalarArrayOpExpr: to handle the new node type, but in v6 we have if (OidIsValid(opexpr->hashfuncid)). Oh, wait. I did add the missing permissions check on the hash function, so that will account for something. As far as I can see, that's required. > 2) For the "prepared" mode, there's a clear performance hit the
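[Editor's note: for readers following the thread, the mechanism being benchmarked can be sketched outside the executor. This is not the committed code — which uses the planner-assigned hashfuncid and PostgreSQL's simplehash — but a minimal toy: build a hash set over the IN-list constants once, then probe it per row, versus scanning the array linearly. The table size and multiplier constant are made up.]

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy open-addressing set over the IN-list constants.  Built once when
 * the ScalarArrayOpExpr is first evaluated; probed once per input row. */
#define TABSIZE 256             /* power of two, comfortably > array length */

typedef struct
{
    bool used[TABSIZE];
    int  key[TABSIZE];
} IntSet;

static void
intset_build(IntSet *s, const int *arr, int n)
{
    memset(s, 0, sizeof(*s));
    for (int i = 0; i < n; i++)
    {
        unsigned h = (unsigned) arr[i] * 2654435761u % TABSIZE;

        while (s->used[h] && s->key[h] != arr[i])
            h = (h + 1) % TABSIZE;      /* linear probing on collision */
        s->used[h] = true;
        s->key[h] = arr[i];
    }
}

static bool
intset_probe(const IntSet *s, int v)
{
    unsigned h = (unsigned) v * 2654435761u % TABSIZE;

    while (s->used[h])
    {
        if (s->key[h] == v)
            return true;
        h = (h + 1) % TABSIZE;
    }
    return false;
}

/* What the non-hashed path effectively does: O(array length) per row. */
static bool
linear_probe(const int *arr, int n, int v)
{
    for (int i = 0; i < n; i++)
        if (arr[i] == v)
            return true;
    return false;
}
```

The trade-off the benchmarks expose falls straight out of this shape: `intset_build` is a one-time cost paid on the first row, so with zero or few rows it is pure overhead, while with many rows the O(1) probe beats the O(n) scan.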
Re: Typo in jsonfuncs.c
Hi Julien and Amit Kapila, On 2021/04/08 17:33, Julien Rouhaud wrote: On Thu, Apr 08, 2021 at 10:06:56AM +0900, Tatsuro Yamada wrote: Hi, I found a typo in jsonfuncs.c, probably. s/an an/an/ Please find attached patch. For the archives' sake, this has been pushed as of 8ffb003591. Julien, thanks for the info! :-D Also, thanks for taking your time to push this, Amit. Regards, Tatsuro Yamada
Re: weird interaction between asynchronous queries and pg_sleep
On Thu, Apr 8, 2021 at 1:05 PM Merlin Moncure wrote: > This effect is only noticeable when the remote query is returning > volumes of data. My question is, is there any way to sleep loop > client side without giving up 3x performance penalty? Why is that > that when more local sleep queries are executed, performance improves? Looking at this more, it looks like that when sleeping with pg_sleep, libpq does not receive the data. I think for this type of pattern to work correctly, dblink would need a custom sleep function wrapping poll (or epoll) that consumes input on the socket when signalled read ready. merlin
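[Editor's note: a rough shape of the helper Merlin describes — sleep, but drain the connection when the socket goes read-ready — could look like the following. This is a hypothetical sketch, not dblink code: in a real version the descriptor would come from PQsocket() and the drain step would be PQconsumeInput(); here a plain read() on an arbitrary fd stands in for both.]

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Sleep up to timeout_ms, but if fd becomes read-ready, consume the
 * pending data instead of letting it pile up in the socket buffer.
 * Returns the number of bytes drained.  Approximation: a real version
 * would re-poll with the remaining time; this sketch treats any wakeup
 * as ending the sleep early. */
static int
sleep_consuming(int fd, int timeout_ms)
{
    int drained = 0;
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    if (poll(&pfd, 1, timeout_ms) > 0 && (pfd.revents & POLLIN))
    {
        char    buf[4096];
        ssize_t n;

        /* drain whatever is currently available; PQconsumeInput() here */
        while ((n = read(fd, buf, sizeof(buf))) > 0)
        {
            drained += (int) n;
            if (poll(&pfd, 1, 0) <= 0)  /* anything left right now? */
                break;
        }
    }
    return drained;
}
```

With nothing readable the call degrades to an ordinary bounded sleep, which is why it could replace a pg_sleep-based polling loop without the buffering penalty described above.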
Re: SQL-standard function body
On Thu, Apr 08, 2021 at 12:21:05PM -0400, Tom Lane wrote: > I see that contrib/test_decoding also sets NO_INSTALLCHECK = 1, > and the reason it gets tested is that the buildfarm script has > a special module for that. I guess we need to clone that module, > or maybe better, find a way to generalize it. > > There are also some src/test/modules modules that set NO_INSTALLCHECK, > but apparently those do have coverage (modules-check is the step that > runs their SQL tests, and then the TAP tests if any get broken out > as separate buildfarm steps). FWIW, on Windows any module with NO_INSTALLCHECK does not get tested as we rely mostly on an installed server to do all the tests and avoid the performance impact of setting up a new server for each module's test. -- Michael
Re: pg_amcheck contrib application
On Thu, Apr 8, 2021 at 6:51 PM Mark Dilger wrote: > #2 is if chunk_seq goes up but skips numbers. #3 is if chunk_seq ever goes > down, meaning the index scan did something unexpected. Yeah, sure. But I think we could probably treat those the same way. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Apr 8, 2021, at 3:11 PM, Robert Haas wrote: > > On Thu, Apr 8, 2021 at 5:21 PM Mark Dilger > wrote: >> All this leads me to believe that we should report the following: >> >> 1) If the total number of chunks retrieved differs from the expected number, >> report how many we expected vs. how many we got >> 2) If the chunk_seq numbers are discontiguous, report each discontiguity. >> 3) If the index scan returned chunks out of chunk_seq order, report that >> 4) If any chunk is not the expected size, report that >> >> So, for your example of chunk 1 missing from chunks [0..17], we'd report >> that we got one fewer chunks than we expected, that the second chunk seen >> was discontiguous from the first chunk seen, that the final chunk seen was >> smaller than expected by M bytes, and that the total size was smaller than >> we expected by N bytes. The third of those is somewhat misleading, since >> the final chunk was presumably the right size; we just weren't expecting to >> hit a partial chunk quite yet. But I don't see how to make that better in >> the general case. > > Hmm, that might be OK. It seems like it's going to be a bit verbose in > simple cases like 1 missing chunk, but on the plus side, it avoids a > mountain of output if the raw size has been overwritten with a > gigantic bogus value. But, how is #2 different from #3? Those sound > like the same thing to me. #2 is if chunk_seq goes up but skips numbers. #3 is if chunk_seq ever goes down, meaning the index scan did something unexpected. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Lots of incorrect comments in nodeFuncs.c
David Rowley writes: > I noticed that nodeFuncs.c appears to have some pretty sloppy work > done in many of the comments. Many look like they've just not been > updated from a copy/paste/edit from another node function. > The attached aims to clean these up. I believe every one of these changes is wrong. For instance: case T_ScalarArrayOpExpr: - coll = InvalidOid; /* result is always boolean */ + coll = InvalidOid; /* result is always InvalidOid */ break; The point here is that the result type of ScalarArrayOpExpr is boolean, which has no collation, therefore reporting its collation as InvalidOid is correct. Maybe there's a clearer way to say that, but your text is more confusing not less so. Likewise, the point of the annotations in exprSetCollation is to not let a collation be applied to a node that must have a noncollatable result type. regards, tom lane
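[Editor's note: Tom's point — that a node whose result type is noncollatable reports InvalidOid as its collation — can be illustrated with a toy model of the exprCollation convention. The types and tags below are stand-ins, not the real Node machinery.]

```c
#include <assert.h>

/* Toy stand-ins for PostgreSQL's Oid and node tags. */
typedef unsigned int ToyOid;
#define TOY_INVALID_OID ((ToyOid) 0)

enum ToyNodeTag { TOY_T_Var, TOY_T_ScalarArrayOpExpr };

typedef struct
{
    enum ToyNodeTag tag;
    ToyOid          collid;     /* stored collation, if any */
} ToyExpr;

static ToyOid
toy_exprCollation(const ToyExpr *e)
{
    switch (e->tag)
    {
        case TOY_T_ScalarArrayOpExpr:
            /* result is always boolean, a noncollatable type, so the
             * node's collation is correctly reported as InvalidOid */
            return TOY_INVALID_OID;
        default:
            return e->collid;
    }
}
```

This is the distinction the original comments encode: "result is always boolean" is the *reason*, and InvalidOid is the *consequence*, which is why rewording the comment to restate the consequence loses information.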
Re: pg_amcheck contrib application
On Thu, Apr 8, 2021 at 5:21 PM Mark Dilger wrote: > All this leads me to believe that we should report the following: > > 1) If the total number of chunks retrieved differs from the expected number, > report how many we expected vs. how many we got > 2) If the chunk_seq numbers are discontiguous, report each discontiguity. > 3) If the index scan returned chunks out of chunk_seq order, report that > 4) If any chunk is not the expected size, report that > > So, for your example of chunk 1 missing from chunks [0..17], we'd report that > we got one fewer chunks than we expected, that the second chunk seen was > discontiguous from the first chunk seen, that the final chunk seen was > smaller than expected by M bytes, and that the total size was smaller than we > expected by N bytes. The third of those is somewhat misleading, since the > final chunk was presumably the right size; we just weren't expecting to hit a > partial chunk quite yet. But I don't see how to make that better in the > general case. Hmm, that might be OK. It seems like it's going to be a bit verbose in simple cases like 1 missing chunk, but on the plus side, it avoids a mountain of output if the raw size has been overwritten with a gigantic bogus value. But, how is #2 different from #3? Those sound like the same thing to me. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Autovacuum on partitioned table (autoanalyze)
Justin Pryzby writes: > On Thu, Apr 08, 2021 at 05:56:25PM -0400, Alvaro Herrera wrote: >> This new bit reads weird: >> >> +Most parameters are not supported on partitioned tables, with exceptions >> +noted below; you may specify them for individual leaf partitions. > "Except where noted, these parameters are not supported on partitioned > tables." I think what it's trying to get at is "Except where noted, these parameters are not supported on partitioned tables. However, you can specify them on individual leaf partitions." regards, tom lane
Lots of incorrect comments in nodeFuncs.c
I noticed that nodeFuncs.c appears to have some pretty sloppy work done in many of the comments. Many look like they've just not been updated from a copy/paste/edit from another node function. The attached aims to clean these up. I plan to push this later today unless anyone has anything they'd like to say about it first. David fix_numerous_copy_paste_errors_in_nodefuncs_comments.patch
Re: Autovacuum on partitioned table (autoanalyze)
On Thu, Apr 08, 2021 at 05:56:25PM -0400, Alvaro Herrera wrote: > On 2021-Apr-08, Justin Pryzby wrote: > > > commit 0827e8af70f4653ba17ed773f123a60eadd9f9c9 > > |This also introduces necessary reloptions support for partitioned > > tables > > |(autovacuum_enabled, autovacuum_analyze_scale_factor, > > |autovacuum_analyze_threshold). It's unclear how best to document this > > |aspect. > > > > At least this part needs to be updated - see also ed62d3737. > > > > doc/src/sgml/ref/create_table.sgml-The storage parameters currently > > doc/src/sgml/ref/create_table.sgml-available for tables are listed > > below. > > ... > > doc/src/sgml/ref/create_table.sgml:Specifying these parameters for > > partitioned tables is not supported, > > doc/src/sgml/ref/create_table.sgml-but you may specify them for > > individual leaf partitions. > > Ah, thanks for pointing it out. How about the attached? > > This new bit reads weird: > > +Most parameters are not supported on partitioned tables, with exceptions > +noted below; you may specify them for individual leaf partitions. "Except where noted, these parameters are not supported on partitioned tables." -- Justin
Re: Autovacuum on partitioned table (autoanalyze)
On 2021-Apr-08, Justin Pryzby wrote: > commit 0827e8af70f4653ba17ed773f123a60eadd9f9c9 > |This also introduces necessary reloptions support for partitioned tables > |(autovacuum_enabled, autovacuum_analyze_scale_factor, > |autovacuum_analyze_threshold). It's unclear how best to document this > |aspect. > > At least this part needs to be updated - see also ed62d3737. > > doc/src/sgml/ref/create_table.sgml-The storage parameters currently > doc/src/sgml/ref/create_table.sgml-available for tables are listed below. > ... > doc/src/sgml/ref/create_table.sgml:Specifying these parameters for > partitioned tables is not supported, > doc/src/sgml/ref/create_table.sgml-but you may specify them for > individual leaf partitions. Ah, thanks for pointing it out. How about the attached? This new bit reads weird: +Most parameters are not supported on partitioned tables, with exceptions +noted below; you may specify them for individual leaf partitions. Maybe "Most parameters are not supported on partitioned tables, with exceptions noted below; you may specify others for individual leaf partitions." -- Álvaro Herrera39°49'30"S 73°17'W >From 37a829ec7b9c46acbbdb02f231288e39d22fcd04 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Thu, 8 Apr 2021 17:53:22 -0400 Subject: [PATCH] document reloptions for partitioned tables --- doc/src/sgml/ref/create_table.sgml | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml index 44e50620fd..3cf355cc8d 100644 --- a/doc/src/sgml/ref/create_table.sgml +++ b/doc/src/sgml/ref/create_table.sgml @@ -1369,8 +1369,8 @@ WITH ( MODULUS numeric_literal, REM If a table parameter value is set and the equivalent toast. parameter is not, the TOAST table will use the table's parameter value. -Specifying these parameters for partitioned tables is not supported, -but you may specify them for individual leaf partitions. 
+Most parameters are not supported on partitioned tables, with exceptions +noted below; you may specify them for individual leaf partitions. @@ -1452,6 +1452,7 @@ WITH ( MODULUS numeric_literal, REM If true, the autovacuum daemon will perform automatic VACUUM and/or ANALYZE operations on this table following the rules discussed in . + This parameter is supported on partitioned tables. If false, this table will not be autovacuumed, except to prevent transaction ID wraparound. See for more about wraparound prevention. @@ -1576,6 +1577,7 @@ WITH ( MODULUS numeric_literal, REM Per-table value for parameter. + This parameter is supported on partitioned tables. @@ -1591,6 +1593,7 @@ WITH ( MODULUS numeric_literal, REM Per-table value for parameter. + This parameter is supported on partitioned tables. -- 2.20.1
Re: [PATCH] force_parallel_mode and GUC categories
The previous patches accidentally included some unrelated changes. -- Justin >From fd67dd04d1b824a25e113796c235fd9fc9db23e0 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sat, 3 Apr 2021 19:06:37 -0500 Subject: [PATCH v2 1/4] track_activity_query_size is STATS_COLLECTOR category Not Resource Usage / Memory, as since 995fb7420 --- src/backend/utils/misc/guc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 090abdad8b..e54209995d 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -3506,7 +3506,7 @@ static struct config_int ConfigureNamesInt[] = }, { - {"track_activity_query_size", PGC_POSTMASTER, RESOURCES_MEM, + {"track_activity_query_size", PGC_POSTMASTER, STATS_COLLECTOR, gettext_noop("Sets the size reserved for pg_stat_activity.query, in bytes."), NULL, GUC_UNIT_BYTE -- 2.17.0 >From bbed9ed2c3c55b0ccd51358a5c62baa07d1a5ee1 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Sat, 3 Apr 2021 19:10:01 -0500 Subject: [PATCH v2 2/4] log_autovacuum_min_duration is LOGGING_WHAT Not AUTOVACUUM, since 48f7e6439 and ef23a7744 --- doc/src/sgml/config.sgml | 56 +-- src/backend/utils/misc/postgresql.conf.sample | 8 +-- 2 files changed, 32 insertions(+), 32 deletions(-) diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml index 26628f3e6d..eb154cd669 100644 --- a/doc/src/sgml/config.sgml +++ b/doc/src/sgml/config.sgml @@ -6812,6 +6812,34 @@ local0.*/var/log/postgresql + + log_autovacuum_min_duration (integer) + + log_autovacuum_min_duration + configuration parameter + + + + +Causes each action executed by autovacuum to be logged if it ran for at +least the specified amount of time. Setting this to zero logs +all autovacuum actions. -1 (the default) disables +logging autovacuum actions. +If this value is specified without units, it is taken as milliseconds. 
+For example, if you set this to +250ms then all automatic vacuums and analyzes that run +250ms or longer will be logged. In addition, when this parameter is +set to any value other than -1, a message will be +logged if an autovacuum action is skipped due to a conflicting lock or a +concurrently dropped relation. Enabling this parameter can be helpful +in tracking autovacuum activity. This parameter can only be set in +the postgresql.conf file or on the server command line; +but the setting can be overridden for individual tables by +changing table storage parameters. + + + + log_checkpoints (boolean) @@ -7827,34 +7855,6 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; - - log_autovacuum_min_duration (integer) - - log_autovacuum_min_duration - configuration parameter - - - - -Causes each action executed by autovacuum to be logged if it ran for at -least the specified amount of time. Setting this to zero logs -all autovacuum actions. -1 (the default) disables -logging autovacuum actions. -If this value is specified without units, it is taken as milliseconds. -For example, if you set this to -250ms then all automatic vacuums and analyzes that run -250ms or longer will be logged. In addition, when this parameter is -set to any value other than -1, a message will be -logged if an autovacuum action is skipped due to a conflicting lock or a -concurrently dropped relation. Enabling this parameter can be helpful -in tracking autovacuum activity. This parameter can only be set in -the postgresql.conf file or on the server command line; -but the setting can be overridden for individual tables by -changing table storage parameters. 
- - - - autovacuum_max_workers (integer) diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index 9830cfe382..cc9edc410f 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -575,6 +575,10 @@ #log_temp_files = -1 # log temporary files equal or larger # than the specified size in kilobytes; # -1 disables, 0 logs all temp files +#log_autovacuum_min_duration = -1 # -1 disables, 0 logs all actions and + # their durations, > 0 logs only + # actions running at least this number + # of milliseconds. #log_timezone = 'GMT' #-- @@ -616,10 +620,6 @@ #autovacuum = on # Enable autovacuum subprocess? 'on' # requires track_counts to also be on.
Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
On 4/8/21 2:00 PM, David Rowley wrote: > On Thu, 8 Apr 2021 at 22:54, David Rowley wrote: >> >> I think the changes in the patch are fairly isolated and the test >> coverage is now pretty good. I'm planning on looking at the patch >> again now and will consider pushing it for PG14. > > I push this with some minor cleanup from the v6 patch I posted earlier. > I ran the same set of benchmarks on the v6, which I think should be mostly identical to what was committed. I extended the test a bit to test table with 0, 1, 5 and 1000 rows, and also with int and text values, to see how it works with more expensive comparators. I built binaries with gcc 9.2.0 and clang 11.0.0, the full results are attached. There's a bit of difference between gcc and clang, but the general behavior is about the same, so I'll only present gcc results to keep this simple. I'll only throughput comparison to master, so >1.0 means good, <1.0 means bad. If you're interested in actual tps, see the full results. For the v5 patch (actually v4-0001 + v5-0002) and v6, the results are: integer column / v5 === rows 10/p100/p 16/p 10/s100/s 16/s --- 0 97% 97% 97% 107% 126% 108% 1 95% 82% 94% 108% 132% 110% 5 95% 83% 95% 108% 132% 110% 1000 129% 481% 171% 131% 382% 165% integer column / v6 === rows 10/p100/p 16/p 10/s100/s 16/s --- 0 97% 97% 97% 98% 98% 98% 1 96% 84% 95% 97% 97% 98% 5 97% 85% 96% 98% 97% 97% 1000 129% 489% 172% 128% 330% 162% text column / v5 rows 10/p100/p 16/p 10/s100/s 16/s --- 0 100% 100% 100% 106% 119% 108% 1 96% 81% 95% 107% 120% 109% 5 97% 82% 96% 107% 121% 109% 1000 291%1622% 402% 255%1092% 337% text column / v6 rows 10/p100/p 16/p 10/s100/s 16/s --- 0 101% 101% 101% 98% 99% 99% 1 98% 82% 96% 98% 96% 97% 5 100% 84% 98% 98% 96% 98% 1000 297%1645% 408% 255%1000% 336% Overall, the behavior for integer and text columns is the same, for both patches. 
There are a couple of interesting observations: 1) For the "simple" query mode, v5 helped quite a bit (20-30% speedup), but v6 does not seem to help at all - it's either the same as or slower than unpatched master. I wonder why that is, and if we could get some of the speedup with v6? At first I thought that maybe v5 is not building the hash table in cases where v6 does, but that shouldn't be faster than master. 2) For the "prepared" mode, there's a clear performance hit the longer the array is (for both v5 and v6). For 100 elements it's about 15%, which is not great. I think the reason is fairly simple - building the hash table is not free, and with few rows it's not worth it - it'd be faster to just search the array directly. Unfortunately, the logic that makes the decision to switch to hashing looks at the array length only, and ignores the number of rows entirely. So I think if we want to address this, convert_saop_to_hashed_saop needs to compare hash_build_cost + nrows * hash_lookup_cost and nrows * linear_lookup_cost to make a reasonable decision. I was thinking that maybe we can ignore this, because people probably have much larger tables in practice. But I'm not sure that's really true, because there may be other quals and it's possible the preceding ones are quite selective, filtering most of the rows. I'm not sure how much of the necessary information we have available in convert_saop_to_hashed_saop_walker, though :-( I suppose we know the number of input rows for that plan node, but not the selectivity of the other quals. It's also a bit strange that we get a speedup for the "simple" protocol, while for "prepared" it gets slower. That seems counter-intuitive, because why should we see opposite outcomes in those cases? I'd assume that we'll see either a speedup or a slowdown in both cases, with the relative change being more significant in the "prepared" mode.
regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
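[Editor's note: the cost comparison suggested above for convert_saop_to_hashed_saop could be sketched as below. Every cost constant here is a made-up placeholder, not a planner value; the point is only the shape of the break-even test, which depends on nrows as well as the array length.]

```c
#include <assert.h>
#include <stdbool.h>

/* Hash only when the one-time build cost is recouped over the expected
 * number of input rows.  Placeholder cost model: linear search touches
 * about half the array per row on average. */
static bool
saop_should_hash(double nrows, int array_len)
{
    const double hash_build_cost    = 1.0 * array_len;  /* one-time */
    const double hash_lookup_cost   = 1.0;              /* per row */
    const double linear_lookup_cost = 0.5 * array_len;  /* per row */

    return hash_build_cost + nrows * hash_lookup_cost
         < nrows * linear_lookup_cost;
}
```

With these placeholder costs, a 100-element array pays for its hash table after only a couple of rows, while a 2-element array never does — matching the intuition in the thread that array length alone is not enough to decide.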
Re: Autovacuum on partitioned table (autoanalyze)
On Thu, Apr 08, 2021 at 01:20:14AM -0400, Alvaro Herrera wrote: > On 2021-Apr-07, Alvaro Herrera wrote: > > > OK, I bit the bullet and re-did the logic in the way I had proposed > > earlier in the thread: do the propagation on the collector's side, by > > sending only the list of ancestors: the collector can read the tuple > > change count by itself, to add it to each ancestor. This seems less > > wasteful. Attached is v16 which does it that way and seems to work > > nicely under my testing. > > Pushed with this approach. Thanks for persisting with this. commit 0827e8af70f4653ba17ed773f123a60eadd9f9c9 |This also introduces necessary reloptions support for partitioned tables |(autovacuum_enabled, autovacuum_analyze_scale_factor, |autovacuum_analyze_threshold). It's unclear how best to document this |aspect. At least this part needs to be updated - see also ed62d3737. doc/src/sgml/ref/create_table.sgml-The storage parameters currently doc/src/sgml/ref/create_table.sgml-available for tables are listed below. ... doc/src/sgml/ref/create_table.sgml:Specifying these parameters for partitioned tables is not supported, doc/src/sgml/ref/create_table.sgml-but you may specify them for individual leaf partitions. -- Justin
Re: pgsql: autovacuum: handle analyze for partitioned tables
On 2021-Apr-08, Zhihong Yu wrote: > Hi, > Within truncate_update_partedrel_stats(), dirty is declared within the loop. > + if (rd_rel->reltuples != 0) > + { > ... > + if (dirty) > > The two if blocks can be merged. The variable dirty can be dropped. Hi, thanks for reviewing. Yes, evidently I copied vac_update_relstats too closely -- that boolean is not necessary. -- Álvaro Herrera Valdivia, Chile
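[Editor's note: the simplification being agreed to here — merge the two if blocks and drop the dirty flag — would look roughly like this, shown on a toy stand-in for the pg_class tuple rather than the real syscache/heap machinery:]

```c
#include <assert.h>

/* Toy stand-in for the pg_class row being updated in place. */
typedef struct
{
    float reltuples;
    int   updates;      /* counts in-place updates, for illustration */
} ToyRelTuple;

static void
toy_inplace_update(ToyRelTuple *t)   /* heap_inplace_update() in the patch */
{
    t->updates++;
}

static void
toy_reset_reltuples(ToyRelTuple *t)
{
    /* merged: test and update in a single block, no "dirty" variable */
    if (t->reltuples != 0)
    {
        t->reltuples = 0;
        toy_inplace_update(t);
    }
}
```

The behavior is unchanged — the update still happens only when reltuples was nonzero — but the control flow no longer needs a flag to carry that fact between two blocks.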
Re: pg_amcheck contrib application
> On Apr 8, 2021, at 1:05 PM, Robert Haas wrote: > > On Thu, Apr 8, 2021 at 3:02 PM Mark Dilger > wrote: >> Imagine a toasted attribute with 18 chunks numbered [0..17]. Then we update >> the toast to have only 6 chunks numbered [0..5] except we corruptly keep >> chunks numbered [12..17] in the toast table. We'd rather see a report like >> this: > [ toast value NNN chunk NNN has sequence number NNN, but expected > sequence number NNN ] >> than one like this: > [ toast value NNN contains chunk NNN where chunk NNN was expected ] >> >> because saying the toast value ended at "chunk 12" after saying that it >> contains "chunk 17" is contradictory. You need the distinction between the >> chunk number and the chunk sequence number, since in corrupt circumstances >> they may not be the same. > > Hmm, I see your point, and that's a good example to illustrate it. > But, with that example in front of me, I am rather doubtful that > either of these is what users actually want. Consider the case where I > should have chunks 0..17 and chunk 1 is just plain gone. This, by the > way, seems like a pretty likely case to arise in practice, since all > we need is for a block to get truncated away or zeroed erroneously, or > for a tuple to get pruned that shouldn't. With either of the above > schemes, I guess we're going to get a message about every chunk from 2 > to 17, complaining that they're all misnumbered. We might also get a > complaint that the last chunk is the wrong size, and that the total > number of chunks isn't right. What we really want is a single > complaint saying chunk 1 is missing. > > Likewise, in your example, I sort of feel like what I really want, > rather than either of the above outputs, is to get some messages like > this: > > toast value NNN contains unexpected extra chunk [12-17] > > Both your phrasing for those messages and what I suggested make it > sound like the problem is that the chunk number is wrong. 
But that > doesn't seem like it's taking the right view of the situation. Chunks > 12-17 shouldn't exist at all, and if they do, we should say that, e.g. > by complaining about something like "toast value 16444 chunk 12 > follows last expected chunk 5" > > In other words, I don't buy the idea that the user will accept the > idea that there's a chunk number and a chunk sequence number, and that > they should know the difference between those things and what each of > them are. They're entitled to imagine that there's just one thing, and > that we're going to tell them about value that are extra or missing. > The fact that we're not doing that seems like it's just a matter of > missing code. Somehow, we have to get enough information about chunk_seq discontinuity into the output that if the user forwards it to -hackers, or if the output comes from a buildfarm critter, that we have all the information to help diagnose what went wrong. As a specific example, if the va_rawsize suggests 2 chunks, and we find 150 chunks all with contiguous chunk_seq values, that is different from a debugging point of view than if we find 150 chunks with chunk_seq values spread all over the [0..MAXINT] range. We can't just tell the user that there were 148 extra chunks. We also shouldn't phrase the error in terms of "extra chunks", since it might be the va_rawsize that is corrupt. I agree that the current message output might be overly verbose in how it reports this information. Conceptually, we want to store up information about the chunk issues and report them all at once, but that's hard to do in general, as the number of chunk_seq discontinuities might be quite large, much too large to fit reasonably into any one message. Maybe we could report just the first N discontinuities rather than all of them, but if somebody wants to open a hex editor and walk through the toast table, they won't appreciate having the corruption information truncated like that. 
All this leads me to believe that we should report the following: 1) If the total number of chunks retrieved differs from the expected number, report how many we expected vs. how many we got 2) If the chunk_seq numbers are discontiguous, report each discontiguity. 3) If the index scan returned chunks out of chunk_seq order, report that 4) If any chunk is not the expected size, report that So, for your example of chunk 1 missing from chunks [0..17], we'd report that we got one fewer chunks than we expected, that the second chunk seen was discontiguous from the first chunk seen, that the final chunk seen was smaller than expected by M bytes, and that the total size was smaller than we expected by N bytes. The third of those is somewhat misleading, since the final chunk was presumably the right size; we just weren't expecting to hit a partial chunk quite yet. But I don't see how to make that better in the general case. > If we start the index scan and get chunk 4, we can > easily emit messages for chunks 0..3
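[Editor's note: points 1-3 of the proposed reporting can be sketched as a single pass over the chunk_seq values in index-scan order. The struct and tallies below are illustrative only; pg_amcheck would emit a message per issue rather than counting them.]

```c
#include <assert.h>
#include <stdbool.h>

typedef struct
{
    int  missing_ranges;  /* chunk_seq jumped forward, skipping values */
    int  out_of_order;    /* chunk_seq went down: unexpected scan order */
    bool wrong_count;     /* total differs from the expected count */
} ChunkReport;

static ChunkReport
check_chunks(const int *seq, int nchunks, int expected)
{
    ChunkReport r = {0, 0, false};
    int prev = -1;

    for (int i = 0; i < nchunks; i++)
    {
        if (seq[i] < prev + 1)
            r.out_of_order++;       /* point 3: sequence went down */
        else if (seq[i] > prev + 1)
            r.missing_ranges++;     /* point 2: discontiguity */
        prev = seq[i];
    }
    r.wrong_count = (nchunks != expected);  /* point 1 */
    return r;
}
```

For Robert's example — chunks [0..17] with chunk 1 gone — this reports one discontiguity and a wrong total, rather than sixteen "misnumbered chunk" complaints, which is the verbosity concern raised upthread.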
Re: Dubious coding in nbtinsert.c
Peter Geoghegan writes: > You had a near-identical complaint about a compiler warning that led > to my commit d64f1cdf2f4 -- that one involved _bt_check_unique()'s > curitup, while this one is about curitemid. While I have no problem > silencing this compiler warning now, I don't see any reason to not > just do the same thing again. Which is to initialize the pointer to > NULL. Works for me; if there is any bug in the logic, we'll get a core dump and can investigate. regards, tom lane
Re: pgsql: autovacuum: handle analyze for partitioned tables
On Thu, Apr 8, 2021 at 1:12 PM Alvaro Herrera wrote: > On 2021-Apr-08, Tom Lane wrote: > > > > So I tend to think that my initial instinct was the better direction: > we > > > should not be doing any find_all_inheritors() here at all, but instead > > > rely on pg_class.reltuples to be set for the partitioned table. > > > > +1 > > This patch does that. > > -- > Álvaro Herrera39°49'30"S 73°17'W > "I dream about dreams about dreams", sang the nightingale > under the pale moon (Sandman) > Hi, Within truncate_update_partedrel_stats(), dirty is declared within the loop. + if (rd_rel->reltuples != 0) + { ... + if (dirty) The two if blocks can be merged. The variable dirty can be dropped. Cheers
Re: pgsql: autovacuum: handle analyze for partitioned tables
On 2021-Apr-08, Tom Lane wrote: > > So I tend to think that my initial instinct was the better direction: we > > should not be doing any find_all_inheritors() here at all, but instead > > rely on pg_class.reltuples to be set for the partitioned table. > > +1 This patch does that. -- Álvaro Herrera39°49'30"S 73°17'W "I dream about dreams about dreams", sang the nightingale under the pale moon (Sandman) >From a54701552f2ba9295aae4fe0fc22c7bac912bf45 Mon Sep 17 00:00:00 2001 From: Alvaro Herrera Date: Thu, 8 Apr 2021 11:10:44 -0400 Subject: [PATCH] Set pg_class.reltuples for partitioned tables --- src/backend/commands/analyze.c | 12 +++ src/backend/commands/tablecmds.c| 51 - src/backend/postmaster/autovacuum.c | 39 +- 3 files changed, 63 insertions(+), 39 deletions(-) diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index 5bdaceefd5..2de699d838 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -656,6 +656,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params, in_outer_xact); } } + else if (onerel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + { + /* + * Partitioned tables don't have storage, so we don't set any fields in + * their pg_class entries except for relpages, which is necessary for + * auto-analyze to work properly. + */ + vac_update_relstats(onerel, -1, totalrows, + 0, false, InvalidTransactionId, + InvalidMultiXactId, + in_outer_xact); + } /* * Now report ANALYZE to the stats collector. 
For regular tables, we do diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c index 1f19629a94..deca860c80 100644 --- a/src/backend/commands/tablecmds.c +++ b/src/backend/commands/tablecmds.c @@ -337,6 +337,7 @@ typedef struct ForeignTruncateInfo static void truncate_check_rel(Oid relid, Form_pg_class reltuple); static void truncate_check_perms(Oid relid, Form_pg_class reltuple); static void truncate_check_activity(Relation rel); +static void truncate_update_partedrel_stats(List *parted_rels); static void RangeVarCallbackForTruncate(const RangeVar *relation, Oid relId, Oid oldRelId, void *arg); static List *MergeAttributes(List *schema, List *supers, char relpersistence, @@ -1755,6 +1756,7 @@ ExecuteTruncateGuts(List *explicit_rels, { List *rels; List *seq_relids = NIL; + List *parted_rels = NIL; HTAB *ft_htab = NULL; EState *estate; ResultRelInfo *resultRelInfos; @@ -1908,9 +1910,15 @@ ExecuteTruncateGuts(List *explicit_rels, Relation rel = (Relation) lfirst(lc1); int extra = lfirst_int(lc2); - /* Skip partitioned tables as there is nothing to do */ + /* + * Save OID of partitioned tables for later; nothing else to do for + * them here. + */ if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE) + { + parted_rels = lappend_oid(parted_rels, RelationGetRelid(rel)); continue; + } /* * Build the lists of foreign tables belonging to each foreign server @@ -2061,6 +2069,9 @@ ExecuteTruncateGuts(List *explicit_rels, ResetSequence(seq_relid); } + /* Reset partitioned tables' pg_class.reltuples */ + truncate_update_partedrel_stats(parted_rels); + /* * Write a WAL record to allow this set of actions to be logically * decoded. @@ -2207,6 +2218,44 @@ truncate_check_activity(Relation rel) CheckTableNotInUse(rel, "TRUNCATE"); } +/* + * Update pg_class.reltuples for all the given partitioned tables to 0. 
+ */ +static void +truncate_update_partedrel_stats(List *parted_rels) +{ + Relation pg_class; + HeapTuple tuple; + Form_pg_class rd_rel; + ListCell *lc; + + pg_class = table_open(RelationRelationId, RowExclusiveLock); + + foreach(lc, parted_rels) + { + Oid relid = lfirst_oid(lc); + bool dirty = false; + + tuple = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(relid)); + if (!HeapTupleIsValid(tuple)) + elog(ERROR, "could not find tuple for relation %u", relid); + rd_rel = (Form_pg_class) GETSTRUCT(tuple); + + if (rd_rel->reltuples != 0) + { + rd_rel->reltuples = (float4) 0; + dirty = true; + } + + if (dirty) + heap_inplace_update(pg_class, tuple); + + heap_freetuple(tuple); + } + + table_close(pg_class, RowExclusiveLock); +} + /* * storage_name * returns the name corresponding to a typstorage/attstorage enum value diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index aef9ac4dd2..a799544738 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -3209,44 +3209,7 @@ relation_needs_vacanalyze(Oid relid, */ if (PointerIsValid(tabentry) && AutoVacuumingActive()) { - if (classForm->relkind != RELKIND_PARTITIONED_TABLE) - { - reltuples = classForm->reltuples; - } - else - { - /* - * If the relation is a partitioned table, we must add up - * children's reltuples. - */ -
Re: [HACKERS] Custom compression methods
On Thu, Apr 8, 2021 at 3:38 PM Justin Pryzby wrote:
> It looks like this should not remove the word "data" ?

Oh, yes, right.

> The compression technique used for either in-line or out-of-line compressed
> -data is a fairly simple and very fast member
> -of the LZ family of compression techniques. See
> -src/common/pg_lzcompress.c for the details.
> +can be selected using the COMPRESSION option on a per-column
> +basis when creating a table. The default for columns with no explicit setting
> +is taken from the value of .
>
> I thought this patch would need to update parts about borrowing 2 spare bits,
> but maybe that's the wrong header..

We're not borrowing any more bits from the places where we were borrowing 2 bits before. We are borrowing 2 bits from places that don't seem to be discussed in detail here, where no bits were borrowed before.

-- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
On Thu, Apr 8, 2021 at 3:02 PM Mark Dilger wrote: > Imagine a toasted attribute with 18 chunks numbered [0..17]. Then we update > the toast to have only 6 chunks numbered [0..5] except we corruptly keep > chunks numbered [12..17] in the toast table. We'd rather see a report like > this: [ toast value NNN chunk NNN has sequence number NNN, but expected sequence number NNN ] > than one like this: [ toast value NNN contains chunk NNN where chunk NNN was expected ] > > because saying the toast value ended at "chunk 12" after saying that it > contains "chunk 17" is contradictory. You need the distinction between the > chunk number and the chunk sequence number, since in corrupt circumstances > they may not be the same. Hmm, I see your point, and that's a good example to illustrate it. But, with that example in front of me, I am rather doubtful that either of these is what users actually want. Consider the case where I should have chunks 0..17 and chunk 1 is just plain gone. This, by the way, seems like a pretty likely case to arise in practice, since all we need is for a block to get truncated away or zeroed erroneously, or for a tuple to get pruned that shouldn't. With either of the above schemes, I guess we're going to get a message about every chunk from 2 to 17, complaining that they're all misnumbered. We might also get a complaint that the last chunk is the wrong size, and that the total number of chunks isn't right. What we really want is a single complaint saying chunk 1 is missing. Likewise, in your example, I sort of feel like what I really want, rather than either of the above outputs, is to get some messages like this: toast value NNN contains unexpected extra chunk [12-17] Both your phrasing for those messages and what I suggested make it sound like the problem is that the chunk number is wrong. But that doesn't seem like it's taking the right view of the situation. Chunks 12-17 shouldn't exist at all, and if they do, we should say that, e.g. 
by complaining about something like "toast value 16444 chunk 12 follows last expected chunk 5" In other words, I don't buy the idea that the user will accept the idea that there's a chunk number and a chunk sequence number, and that they should know the difference between those things and what each of them are. They're entitled to imagine that there's just one thing, and that we're going to tell them about value that are extra or missing. The fact that we're not doing that seems like it's just a matter of missing code. If we start the index scan and get chunk 4, we can easily emit messages for chunks 0..3 right on the spot, declaring them missing. Things do get a bit hairy if the index scan returns values out of order: what if it gives us chunk_seq = 2 and then chunk_seq = 1? But I think we could handle that by just issuing a complaint in any such case that "toast index returns chunks out of order for toast value NNN" and stopping further checking of that toast value. > Purely as manual testing, and not part of the patch, I hacked the backend a > bit to allow direct modification of the toast table. 
After corrupting the
> toast with the following bit of SQL:
>
> WITH chunk_limit AS (
>     SELECT chunk_id, MAX(chunk_seq) AS maxseq
>     FROM $toastname
>     GROUP BY chunk_id)
> INSERT INTO $toastname (chunk_id, chunk_seq, chunk_data)
>     (SELECT t.chunk_id,
>             t.chunk_seq + cl.maxseq + CASE WHEN
>                 t.chunk_seq < 3 THEN 1 ELSE 7 END,
>             t.chunk_data
>      FROM $toastname t
>      INNER JOIN chunk_limit cl
>          ON t.chunk_id = cl.chunk_id)
>
> pg_amcheck reports the following corruption messages:
>
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> #     toast value 16444 chunk 6 follows last expected chunk 5
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> #     toast value 16444 chunk 7 follows last expected chunk 5
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> #     toast value 16444 chunk 8 follows last expected chunk 5
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> #     toast value 16444 chunk 9 has sequence number 15, but expected sequence number 9
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> #     toast value 16444 chunk 10 has sequence number 16, but expected sequence number 10
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> #     toast value 16444 chunk 11 has sequence number 17, but expected sequence number 11
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> #     toast value 16444 was expected to end at chunk 6, but ended at chunk 12
>
> I think if we'd left out the
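The reporting scheme Robert sketches above — one complaint per missing chunk, emitted as soon as the ordered index scan skips past it; a complaint per chunk beyond the expected end; and a single bail-out message if the scan ever goes backwards — can be illustrated with a small sketch. This is illustrative Python, not pg_amcheck's actual C code, and the message wording is only approximate:

```python
def check_toast_chunks(seqs_seen, expected_chunks):
    """Given the chunk_seq values returned by an ordered toast-index scan
    and the number of chunks the stored value size implies, emit one
    complaint per missing or extra chunk."""
    msgs = []
    expected = 0                      # next chunk_seq we expect to see
    prev = None
    for seq in seqs_seen:
        if prev is not None and seq < prev:
            # Out-of-order index scan: complain once, stop checking this value.
            msgs.append("toast index returns chunks out of order")
            return msgs
        prev = seq
        while expected < seq:         # gap: these chunks are simply gone
            if expected < expected_chunks:
                msgs.append(f"chunk {expected} is missing")
            expected += 1
        if seq >= expected_chunks:    # chunk past the expected end
            msgs.append(f"chunk {seq} follows last expected chunk {expected_chunks - 1}")
        expected = seq + 1
    while expected < expected_chunks: # value ended early
        msgs.append(f"chunk {expected} is missing")
        expected += 1
    return msgs

# Chunk 1 pruned away from a value that should have chunks 0..17
# yields a single "chunk 1 is missing" complaint:
print(check_toast_chunks([0] + list(range(2, 18)), 18))
```

With the 18-chunk example from earlier in the thread (chunks 0..5 valid plus leftover 12..17), this produces six "follows last expected chunk 5" complaints and nothing else, instead of complaining that every surviving chunk is misnumbered.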
Re: Dubious coding in nbtinsert.c
On Thu, Apr 8, 2021 at 12:19 PM Tom Lane wrote: > Buildfarm member curculio, which doesn't usually produce > uninitialized-variable warnings, is showing one here: > > nbtinsert.c: In function '_bt_doinsert': > nbtinsert.c:411: warning: 'curitemid' may be used uninitialized in this > function > nbtinsert.c:411: note: 'curitemid' was declared here > > I can see its point: curitemid is set only if !inposting. > While the first two uses of the value are clearly reached > only if !inposting, it's FAR from clear that it's impossible > to reach "ItemIdMarkDead(curitemid);" without a valid value. > Could you clean that up? I'll take care of it shortly. You had a near-identical complaint about a compiler warning that led to my commit d64f1cdf2f4 -- that one involved _bt_check_unique()'s curitup, while this one is about curitemid. While I have no problem silencing this compiler warning now, I don't see any reason to not just do the same thing again. Which is to initialize the pointer to NULL. -- Peter Geoghegan
Re: [HACKERS] Custom compression methods
On Thu, Apr 08, 2021 at 02:58:04PM -0400, Robert Haas wrote: > On Thu, Apr 8, 2021 at 11:32 AM David Steele wrote: > > It looks like this CF entry should have been marked as committed so I > > did that. > > Thanks. > > Here's a patch for the doc update which was mentioned as an open item > upthread. Thanks too. It looks like this should not remove the word "data" ? The compression technique used for either in-line or out-of-line compressed -data is a fairly simple and very fast member -of the LZ family of compression techniques. See -src/common/pg_lzcompress.c for the details. +can be selected using the COMPRESSION option on a per-column +basis when creating a table. The default for columns with no explicit setting +is taken from the value of . I thought this patch would need to update parts about borrowing 2 spare bits, but maybe that's the wrong header.. -- Justin
Dubious coding in nbtinsert.c
Buildfarm member curculio, which doesn't usually produce uninitialized-variable warnings, is showing one here: nbtinsert.c: In function '_bt_doinsert': nbtinsert.c:411: warning: 'curitemid' may be used uninitialized in this function nbtinsert.c:411: note: 'curitemid' was declared here I can see its point: curitemid is set only if !inposting. While the first two uses of the value are clearly reached only if !inposting, it's FAR from clear that it's impossible to reach "ItemIdMarkDead(curitemid);" without a valid value. Could you clean that up? regards, tom lane
Re: multi-install PostgresNode fails with older postgres versions
> On Apr 7, 2021, at 8:43 AM, Andrew Dunstan wrote: > > > On 4/7/21 1:03 AM, Mark Dilger wrote: >> The v1 patch supported postgres versions back to 8.4, but v2 pushes that >> back to 8.1. >> >> The version of PostgresNode currently committed relies on IPC::Run in a way >> that is subtly wrong. The first time IPC::Run::run(X, ...) is called, it >> uses the PATH as it exists at that time, resolves the path for X, and caches >> it. Subsequent calls to IPC::Run::run(X, ...) use the cached path, without >> respecting changes to $ENV{PATH}. In practice, this means that: >> >> use PostgresNode; >> >> my $a = PostgresNode->get_new_node('a', install_path => '/my/install/8.4'); >> my $b = PostgresNode->get_new_node('b', install_path => '/my/install/9.0'); >> >> $a->safe_psql(...) # <=== Resolves and caches 'psql' as >> /my/install/8.4/bin/psql >> >> $b->safe_psql(...) # <=== Executes /my/install/8.4/bin/psql, not >> /my/install/9.0/bin/psql as one might expect >> >> PostgresNode::safe_psql() and PostgresNode::psql() both suffer from this, >> and similarly PostgresNode::pg_recvlogical_upto() because the path to >> pg_recvlogical gets cached. Calls to initdb and pg_ctl do not appear to >> suffer this problem, as they are ultimately handled by perl's system() call, >> not by IPC::Run::run. >> >> Since postgres commands work fairly similarly from one release to another, >> this can cause subtle and hard to diagnose bugs in regression tests. The >> fix in v2-0001 works for me, as demonstrated by v2-0002, but whether the fix >> in the attached v2 patch set gets used or not, I think something needs to be >> done to fix this. >> >> > > Awesome work. The IPC::Run behaviour is darned unfriendly, and AFAICS > completely undocumented. It can't even be easily modified by a client > because the cache is stashed in a lexical variable. Yes, I noticed that, too. Even if we could get a patch accepted into IPC::Run, we'd need to be compatible with older versions. 
So there doesn't seem to be any option but to work around the issue. > You fix looks good. Thanks for reviewing! > other notes: > > > . needs a perltidy run, some lines are too long (see > src/tools/pgindent/perltidyrc) > > > . Please use an explicit return here: > > > +# Return an array reference > +[ @result ]; Done. > . I'm not sure the computation in _pg_version_cmp is right. What if the > number of elements differ? As I read it you would return 0 for a > comparison of '1.2' and '1.2.3'. Is that what's intended? Yes, that is intended. '1.2' and '1.2.0' are not the same. '1.2' means "some unspecified micro release or development version of 1.2", whereas '1.2.0' does not. This is useful for things like $node->at_least_version("13") returning true for development versions of version 13, which are otherwise less than (not equal to) 13.0 > . The second patch has a bunch of stuff it doesn't need. The control > file should be unnecessary as should all the lines above 'ifdef > USE_PGXS' in the Makefile except 'TAP_TESTS = 1' Done. Yeah, I started the second patch as simply a means of testing the first and didn't clean it up after copying boilerplate from elsewhere. The second patch has turned into something possibly worth keeping in its own right, and having the build farm execute on a regular basis. > . the test script should have a way of passing a non-default version > file to CrossVersion::nodes(). Possible get it from an environment variable? Good idea. I changed it to use $ENV{PG_TEST_VERSIONS_FILE}. I'm open to other names for this variable. > . I'm somewhat inclined to say that CrossVersion should just return a > {name => path} map, and let the client script do the node creation. Then > crossVersion doesn't need to know anything much about the > infrastructure. But I could possibly be persuaded otherwise. Also, maybe > it belongs in src/test/perl. Hmm. That's a good thought. I've moved it to src/test/perl with the change you suggest. > . 
This line appears redundant, the variable is not referenced:
>
> +my $path = $ENV{PATH};

Removed.

> Also these lines at the bottom of CrossVersion.pm are redundant:
>
> +use strict;
> +use warnings;

Yeah, those are silly. Removed.

This patch has none of the Object Oriented changes Alvaro and Jehan-Guillaume requested, but that should not be construed as an argument against their request. It's just not handled in this particular patch.

v3-0001-Extending-PostgresNode-cross-version-functionalit.patch Description: Binary data
v3-0002-Adding-modules-test_cross_version.patch Description: Binary data

— Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
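The prefix-only comparison Mark describes — where '1.2' means "some unspecified micro release or development version of 1.2" and therefore compares equal to '1.2.3', while '1.2.0' does not — can be sketched as follows. This is an illustrative Python sketch of the rule, not the actual Perl `_pg_version_cmp` in the patch:

```python
def pg_version_cmp(a, b):
    """Compare two versions given as lists of integer parts.

    Only the common prefix is compared: a missing trailing part means
    "any micro/development release", so [13] compares equal to [13, 2],
    while [13, 0] compares less than [13, 2].  Returns -1, 0, or 1.
    """
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    return 0

print(pg_version_cmp([13], [13, 2]))     # prints 0: "13" covers any 13.x
print(pg_version_cmp([13, 0], [13, 2]))  # prints -1
```

This is what lets something like `at_least_version("13")` return true for development builds of version 13, which compare less than 13.0 but equal to 13.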
Re: pg_amcheck contrib application
> On Apr 7, 2021, at 1:16 PM, Robert Haas wrote: > > On Sun, Apr 4, 2021 at 8:02 PM Mark Dilger > wrote: >> v18-0001 - Finishes work started in commit 3b6c1259f9 that was overlooked >> owing to how I had separated the changes in v17-0002 vs. v17-0003 > > Committed. Thank you. >> v18-0002 - Postpones the toast checks for a page until after the main table >> page lock is released > > Committed, but I changed list_free() to list_free_deep() to avoid a > memory leak, and I revised the commit message to mention the important > point that we need to avoid following TOAST pointers from > potentially-prunable tuples. Thank you, and yes, I agree with that change. >> v18-0003 - Improves the corruption messages in ways already discussed >> earlier in this thread. Changes the tests to expect the new messages, but >> adds no new checks > > Kibitizing your message wording: > > "toast value %u chunk data is null" -> "toast value %u chunk %d has > null data". We can mention the chunk number this way. Changed. > "toast value %u corrupt extended chunk has invalid varlena header: %0x > (sequence number %d)" -> "toast value %u chunk %d has invalid varlena > header %0x". We can be more consistent about how we incorporate the > chunk number into the text, and we don't really need to include the > word corrupt, because all of these are corruption complaints, and I > think it looks better without the colon. Changed. > "toast value %u chunk sequence number %u does not match the expected > sequence number %u" -> "toast value %u contains chunk %d where chunk > %d was expected". Shorter. Uses %d for a sequence number instead of > %u, which I think is correct -- anyway we should have them all one way > or all the other. I think I'd rather ditch the "sequence number" > technology and just talk about "chunk %d" or whatever. I don't agree with this one. I do agree with changing the message, but not to the message you suggest. Imagine a toasted attribute with 18 chunks numbered [0..17]. 
Then we update the toast to have only 6 chunks numbered [0..5] except we corruptly keep chunks numbered [12..17] in the toast table. We'd rather see a report like this:

# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 chunk 6 has sequence number 12, but expected sequence number 6
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 chunk 7 has sequence number 13, but expected sequence number 7
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 chunk 8 has sequence number 14, but expected sequence number 8
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 chunk 9 has sequence number 15, but expected sequence number 9
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 chunk 10 has sequence number 16, but expected sequence number 10
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 chunk 11 has sequence number 17, but expected sequence number 11
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 was expected to end at chunk 6, but ended at chunk 12

than one like this:

# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 contains chunk 12 where chunk 6 was expected
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 contains chunk 13 where chunk 7 was expected
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 contains chunk 14 where chunk 8 was expected
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 contains chunk 15 where chunk 9 was expected
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 contains chunk 16 where chunk 10 was expected
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 contains chunk 17 where chunk 11 was expected
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
#     toast value 16444 was expected to end at chunk 6, but ended at chunk 12

because saying the toast value ended at "chunk 12" after saying that it contains "chunk 17" is contradictory. You need the distinction between the chunk number and the chunk sequence number, since in corrupt circumstances they may not be the same.

> "toast value %u chunk sequence number %u exceeds the end chunk
> sequence number %u" -> "toast value %u chunk %d follows last expected
> chunk %d"

Changed.

> "toast value %u chunk size %u differs from the expected size %u" ->
> "toast value %u chunk %d has size %u, but expected size %u"

Changed.

> Other complaints:
>
> Your commit message fails to mention the addition of
Re: [HACKERS] Custom compression methods
On Thu, Apr 8, 2021 at 11:32 AM David Steele wrote: > It looks like this CF entry should have been marked as committed so I > did that. Thanks. Here's a patch for the doc update which was mentioned as an open item upthread. -- Robert Haas EDB: http://www.enterprisedb.com update-storage-docs.patch Description: Binary data
Re: pgsql: autovacuum: handle analyze for partitioned tables
Alvaro Herrera writes: > On 2021-Apr-08, Tom Lane wrote: >> Yeah. I hit this on another machine that isn't using EXEC_BACKEND, >> and I concur it looks more like a race condition. I think the problem >> is that autovacuum is calling find_all_inheritors() on a relation it >> has no lock on, contrary to that function's API spec. > Hmm. Autovacuum tries hard to avoid grabbing locks on relations until > really needed (at vacuum/analyze time), which is why all these tests > only use data that can be found in the pg_class rows and pgstat entries. Yeah, I was worried about that. > So I tend to think that my initial instinct was the better direction: we > should not be doing any find_all_inheritors() here at all, but instead > rely on pg_class.reltuples to be set for the partitioned table. +1 regards, tom lane
Re: pgsql: autovacuum: handle analyze for partitioned tables
On 2021-Apr-08, Tom Lane wrote: > Yeah. I hit this on another machine that isn't using EXEC_BACKEND, > and I concur it looks more like a race condition. I think the problem > is that autovacuum is calling find_all_inheritors() on a relation it > has no lock on, contrary to that function's API spec. find_all_inheritors > assumes the OID it's given is valid and locked, and adds it to the > result list automatically. Then it looks for children, and won't find > any in the race case where somebody else just dropped the table. Hmm. Autovacuum tries hard to avoid grabbing locks on relations until really needed (at vacuum/analyze time), which is why all these tests only use data that can be found in the pg_class rows and pgstat entries. So I tend to think that my initial instinct was the better direction: we should not be doing any find_all_inheritors() here at all, but instead rely on pg_class.reltuples to be set for the partitioned table. I'll give that another look. Most places already assume that reltuples isn't set for a partitioned table, so they shouldn't care. I wonder, though, whether we should set relpages to some value other than 0 or -1. (I'm inclined not to, since autovacuum does not use it.) -- Álvaro Herrera Valdivia, Chile
Re: Have I found an interval arithmetic bug?
> On 08-Apr-2021, at 10:24, Bruce Momjian wrote:
>
> On Mon, Apr 5, 2021 at 02:01:58PM -0400, Bruce Momjian wrote:
>> On Mon, Apr 5, 2021 at 11:33:10AM -0500, Justin Pryzby wrote:
>> Well, bug or not, we are not going to change back branches for this, and
>> if you want a larger discussion, it will have to wait for PG 15.
>> https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT

« …field values can have fractional parts; for example '1.5 week' or '01:02:03.45'. Such input is converted to the appropriate number of months, days, and seconds for storage. When this would result in a fractional number of months or days, the fraction is added to the lower-order fields using the conversion factors 1 month = 30 days and 1 day = 24 hours. For example, '1.5 month' becomes 1 month and 15 days. Only seconds will ever be shown as fractional on output. »

>>
>> I see that. What is not clear here is how far we flow down. I was
>> looking at adding documentation or regression tests for that, but was
>> unsure. I adjusted the docs slightly in the attached patch.
>
> Here is an updated patch, which will be for PG 15. It updates the
> documentation to state:
>
> The fractional parts are used to compute appropriate values for the next
> lower-order internal fields (months, days, seconds).
>
> It removes the flow from fractional months/weeks to
> hours-minutes-seconds, and adds missing rounding for fractional
> computations.

Thank you Bruce. I look forward to documenting this new algorithm for YugabyteDB.
The algorithm implements the transformation from this:

  [ yy_in numeric, mo_in numeric, dd_in numeric, hh_in numeric, mi_in numeric, ss_in numeric ]

to this:

  [ mo_internal_representation int, dd_internal_representation int, ss_internal_representation numeric(1000,6) ]

I am convinced that a prose account of the algorithm, by itself, is not the best way to tell the reader the rules that the algorithm implements. Rather, pseudocode is needed. I mentioned before that, better still, is actual executable PL/pgSQL code. (I can expect readers to be fluent in PL/pgSQL.) Given this executable simulation, an informal prose sketch of what it does will definitely add value. May I ask you to fill in the body of this stub by translating the C that you have in hand?

create type internal_representation_t as(
  mo_internal_representation int,
  dd_internal_representation int,
  ss_internal_representation numeric(1000,6));

create function internal_representation(
  yy_in numeric default 0,
  mo_in numeric default 0,
  dd_in numeric default 0,
  hh_in numeric default 0,
  mi_in numeric default 0,
  ss_in numeric default 0)
  returns internal_representation_t
  language plpgsql
as $body$
declare
  mo_internal_representation int     not null := 0;
  dd_internal_representation int     not null := 0;
  ss_internal_representation numeric not null := 0.0;

  ok constant boolean :=
    (yy_in is not null) and
    (mo_in is not null) and
    (dd_in is not null) and
    (hh_in is not null) and
    (mi_in is not null) and
    (ss_in is not null);
begin
  assert ok, 'No actual argument, when provided, may be null';

  -- The algorithm.

  return (mo_internal_representation,
          dd_internal_representation,
          ss_internal_representation)::internal_representation_t;
end;
$body$;

By the way, I believe that a user might well decide always to supply all the fields in a "from text to interval" typecast, except for the seconds, as integral values. This, after all, is what the "make_interval()" function enforces.
But, because the typecast approach allows non-integral values, the reference documentation must explain the rules unambiguously so that the reader can predict the outcome of any ad hoc test that they might try. It's a huge pity that the three values of the internal representation cannot be observed directly using SQL because each behaves with different semantics when an interval value is added to a timestamptz value. However, as a second best (and knowing the algorithm above), a user can create interval values where only one of the three fields is populated and test their understanding of the semantic rules that way.
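The spill-down rule quoted from the documentation earlier in the thread — fractions of higher-order fields flow into the lower-order internal fields using 1 month = 30 days and 1 day = 24 hours — can be simulated roughly as follows. This is an illustrative Python sketch, not the server's actual C code, and it glosses over the rounding and negative-value details that Bruce's patch addresses:

```python
def interval_internal(yy=0, mo=0, dd=0, hh=0, mi=0, ss=0):
    """Map interval input fields to the (months, days, seconds) internal
    representation, spilling fractional months into days (1 month = 30 days)
    and fractional days into seconds (1 day = 24 hours)."""
    total_months = yy * 12 + mo
    months = int(total_months)                        # whole months kept
    total_days = dd + (total_months - months) * 30    # fraction -> days
    days = int(total_days)                            # whole days kept
    seconds = hh * 3600 + mi * 60 + ss + (total_days - days) * 86400
    return months, days, round(seconds, 6)

# '1.5 month' becomes 1 month 15 days, per the quoted documentation:
print(interval_internal(mo=1.5))   # prints (1, 15, 0.0)
# '1.5 week' = 10.5 days becomes 10 days 43200 seconds:
print(interval_internal(dd=10.5))  # prints (0, 10, 43200.0)
```

A simulation along these lines could serve as the body of the PL/pgSQL stub above, once the exact rounding behavior of the committed patch is pinned down.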
Re: VACUUM (DISABLE_PAGE_SKIPPING on)
On Thu, 8 Apr 2021 at 18:15, Alvaro Herrera wrote:
>
> On 2021-Apr-08, Simon Riggs wrote:
>
> > On Thu, 8 Apr 2021 at 16:58, David Steele wrote:
> > > It's not clear to me which patch is which, so perhaps move one CF entry
> > > to next CF and clarify which patch is current?
> >
> > Entry: Maximize page freezing
> > has this patch, perfectly fine, awaiting review from Alvaro since 29 Nov.
> > one_freeze_then_max_freeze.v7.patch
>
> Oh dear, was this waiting on me? It was not in my to-do, so I didn't
> pay close attention.

No problem, if I felt the idea deserved priority attention I would have pinged you. I think I'll open a new thread later to allow it to be discussed without confusion.

-- Simon Riggs http://www.EnterpriseDB.com/
weird interaction between asynchronous queries and pg_sleep
Consider the following snippet:

create table data as select generate_series(1,100) s;

do $d$
begin
  PERFORM * FROM dblink_connect('test','');
  PERFORM * from dblink_send_query('test', 'SELECT * FROM data');
  LOOP
    if dblink_is_busy('test') = 0 THEN
      PERFORM * FROM dblink_get_result('test') AS R(V int);
      PERFORM * FROM dblink_get_result('test') AS R(V int);
      RETURN;
    END IF;
    PERFORM pg_sleep(.001);
  END LOOP;
  PERFORM * FROM dblink_disconnect('test');
END;
$d$;

What's interesting here is that, when I vary the sleep parameter, I get:

0: .4 seconds (per top, this is busywait), same as running synchronous.
0.01: 1.4 seconds
0.001: 2.4 seconds
0.01: 10.6 seconds
0.1: does not terminate

This effect is only noticeable when the remote query is returning volumes of data. My question is, is there any way to sleep loop client side without giving up a 3x performance penalty? Why is it that when more local sleep queries are executed, performance improves?

merlin
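The trade-off in the loop above — a zero sleep busy-waits the CPU, while any fixed sleep adds latency proportional to how often the loop polls — is commonly softened with exponential backoff. The sketch below is generic illustrative Python, not dblink or PL/pgSQL, just to show the shape of the technique:

```python
import time

def poll_until(done, max_wait=0.05):
    """Poll a completion check with exponential backoff: start with very
    short sleeps (cheap when the awaited work finishes quickly), then back
    off toward max_wait so long-running work doesn't busy-wait the CPU.
    `done` is a callable returning True when the work has finished."""
    delay = 0.0001
    while not done():
        time.sleep(delay)
        delay = min(delay * 2, max_wait)
```

Applied to the PL/pgSQL loop, the same shape would mean widening the pg_sleep() argument on each pass up to a cap, instead of sleeping a fixed .001 seconds; whether that fully avoids the penalty observed here would need measuring.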
Re: test runner (was Re: SQL-standard function body)
On 2021-04-08 10:50:39 -0700, Andres Freund wrote: > It's hard to convey just how much nicer it is to see a progress report > during the test, see the failing tests at the end, without needing to > wade through reams of log output. The output of the individual tests is > in testlog.txt referenced above. https://anarazel.de/public/t/pg-meson-test-screencap-2021-04-08_10.58.26.mkv Greetings, Andres Freund
Re: pgsql: autovacuum: handle analyze for partitioned tables
Alvaro Herrera writes: > On 2021-Apr-08, Tom Lane wrote: >> Looks like this has issues under EXEC_BACKEND: >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae=2021-04-08%2005%3A50%3A08 > Hmm, I couldn't reproduce this under EXEC_BACKEND or otherwise, but I > think this is unrelated to that, but rather a race condition. Yeah. I hit this on another machine that isn't using EXEC_BACKEND, and I concur it looks more like a race condition. I think the problem is that autovacuum is calling find_all_inheritors() on a relation it has no lock on, contrary to that function's API spec. find_all_inheritors assumes the OID it's given is valid and locked, and adds it to the result list automatically. Then it looks for children, and won't find any in the race case where somebody else just dropped the table. So we come back to relation_needs_vacanalyze with a list of just the original rel OID, and since this loop believes that every syscache fetch it does will succeed, kaboom. I do not think it is sane to do find_all_inheritors() with no lock, so I'd counsel doing something about that rather than band-aiding the symptom. On the other hand, it's also not really okay not to have an if-test-and-elog after the SearchSysCache call. "A cache lookup cannot fail" is not an acceptable assumption in my book. BTW, another thing that looks like a race condition is the extract_autovac_opts() call that is done a little bit earlier, also without lock. I think this is actually safe, but it's ONLY safe because we resisted the calls by certain people to add a toast table to pg_class. Otherwise, fetching reloptions could have involved a toast pointer dereference, and it would then be racy whether the toasted data was still there. As-is, even if the pg_class row we're looking at has been deleted, we can safely disassemble its reloptions. I think this matter is deserving of a comment at least. regards, tom lane
test runner (was Re: SQL-standard function body)
Hi,

This started out as a reply to https://postgr.es/m/20210408170802.GA9392%40alvherre.pgsql but it's independent enough to just start a new thread...

On 2021-04-08 13:08:02 -0400, Alvaro Herrera wrote:
> Yes, coverage.pg.org runs "make check-world".
>
> Maybe it would make sense to change that script, so that it runs the
> buildfarm's run_build.pl script instead of "make check-world". That
> would make coverage.pg.org report what the buildfarm actually tests ...
> it would have made this problem a bit more obvious.

We desperately need to unify the different test run environments we have. I did spend some time trying to do that, and ended up with it being hard to do in a good way in the make / msvc environment. Not sure that I took the right path, but I ended up doing an experimental port of the build system to meson - which has a builtin test runner (working on all platforms...).

andres@awork3:/tmp$ ccache --clear
andres@awork3:/tmp$ ~/src/meson/meson.py setup ~/src/postgresql /tmp/pg-meson --prefix=/tmp/pg-meson-install
The Meson build system
Version: 0.57.999
Source dir: /home/andres/src/postgresql
Build dir: /tmp/pg-meson
Build type: native build
Project name: postgresql
Project version: 14devel
...
Header has symbol "fdatasync" : YES
Header has symbol "F_FULLSYNC" : NO
Checking for alignment of "short" : 2
Checking for alignment of "int" : 4
...
Configuring pg_config_ext.h using configuration
Configuring pg_config.h using configuration
Configuring pg_config_paths.h using configuration
Program sed found: YES (/usr/bin/sed)
Build targets in project: 116
Found ninja-1.10.1 at /usr/bin/ninja
...
andres@awork3:/tmp/pg-meson$ time ninja
[10/1235] Generating snowball_create with a custom command
Generating tsearch script...
[41/1235] Generating generated_catalog_headers with a custom command
[1235/1235] Linking target contrib/test_decoding/test_decoding.so

real    0m10.752s
user    3m47.020s
sys     0m50.281s
...
andres@awork3:/tmp/pg-meson$ time ninja [1/1] Generating test clean with a custom command real 0m0.085s user 0m0.068s sys 0m0.016s ... andres@awork3:/tmp/pg-meson$ time ~/src/meson/meson.py install --quiet ninja: Entering directory `.' real 0m0.541s user 0m0.412s sys 0m0.130s ... andres@awork3:/tmp/pg-meson$ ninja test [1/2] Running all tests. 1/74 postgresql:setup / temp_install OK 0.52s 2/74 postgresql:setup / cleanup_old OK 0.01s 3/74 postgresql:tap+pg_archivecleanup / pg_archivecleanup/t/010_pg_archivecleanup.pl OK 0.29s 42 subtests passed 4/74 postgresql:tap+pg_checksums / pg_checksums/t/001_basic.pl OK 0.27s 8 subtests passed 5/74 postgresql:tap+pg_config / pg_config/t/001_pg_config.pl OK 0.26s 20 subtests passed ... 68/74 postgresql:tap+pg_dump / pg_dump/t/002_pg_dump.pl OK 28.26s 6408 subtests passed ... 74/74 postgresql:isolation / pg_isolation_regress OK 114.91s Ok: 74 Expected Fail: 0 Fail: 0 Unexpected Pass: 0 Skipped: 0 Timeout: 0 Full log written to /tmp/pg-meson/meson-logs/testlog.txt And in cases of failures it'll show the failure when it happens (including the command to rerun just that test, without the harness in between), and then a summary at the end: 61/74 postgresql:tap+pg_verifybackup / pg_verifybackup/t/003_corruption.pl OK 10.65s 44 subtests passed 49/74 postgresql:tap+recovery / recovery/t/019_replslot_limit.pl ERROR 7.53s exit status 1 >>> MALLOC_PERTURB_=16 PATH=/tmp/pg-meson/tmp_install///usr/local/bin:/home/andres/bin/pg:/home/andres/bin/bin:/usr/sbin:/sbin:/home/andres/bin/pg:/home/andres/bin/bin:/usr/sbin:/sbin:/home/andres/bin/pg:/home/andres/bin/bin:/usr/sbin:/sbin:/home/andres/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:/snap/bin PG_REGRESS=/tmp/pg-meson/src/test/regress/pg_regress REGRESS_SHLIB=/tmp/pg-meson/src/test/regress/regress.so LD_LIBRARY_PATH=/tmp/pg-meson/tmp_install///usr/local/lib/x86_64-linux-gnu /home/andres/src/postgresql/src/tools/testwrap /tmp/pg-meson recovery t/019_replslot_limit.pl perl -I 
/home/andres/src/postgresql/src/test/perl -I /home/andres/src/postgresql/src/test/recovery /home/andres/src/postgresql/src/test/recovery/t/019_replslot_limit.pl
Re: Have I found an interval arithmetic bug?
On Mon, Apr 5, 2021 at 02:01:58PM -0400, Bruce Momjian wrote: > On Mon, Apr 5, 2021 at 11:33:10AM -0500, Justin Pryzby wrote: > Well, bug or not, we are not going to change back branches for this, and > if you want a larger discussion, it will have to wait for PG 15. > > > > https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT > > > « …field values can have fractional parts; for example '1.5 week' or > > > '01:02:03.45'. Such input is converted to the appropriate number of > > > months, days, and seconds for storage. When this would result in a > > > fractional number of months or days, the fraction is added to the > > > lower-order fields using the conversion factors 1 month = 30 days and 1 > > > day = 24 hours. For example, '1.5 month' becomes 1 month and 15 days. > > > Only seconds will ever be shown as fractional on output. » > > I see that. What is not clear here is how far we flow down. I was > looking at adding documentation or regression tests for that, but was > unsure. I adjusted the docs slightly in the attached patch. Here is an updated patch, which will be for PG 15. It updates the documentation to state: The fractional parts are used to compute appropriate values for the next lower-order internal fields (months, days, seconds). It removes the flow from fractional months/weeks to hours-minutes-seconds, and adds missing rounding for fractional computations. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion. 
>From 0b5bf02e5baaa29e74fac9ac3823d593b6c5e3f2 Mon Sep 17 00:00:00 2001 From: Bruce Momjian Date: Thu, 8 Apr 2021 13:07:16 -0400 Subject: [PATCH] interval squash commit --- doc/src/sgml/datatype.sgml| 16 +++- src/backend/utils/adt/datetime.c | 13 ++--- src/interfaces/ecpg/pgtypeslib/interval.c | 12 ++-- src/test/regress/expected/interval.out| 8 src/test/regress/sql/interval.sql | 2 +- 5 files changed, 24 insertions(+), 27 deletions(-) diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index 7c341c8e3f..9454107e75 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -2770,15 +2770,13 @@ P years-months- - In the verbose input format, and in some fields of the more compact - input formats, field values can have fractional parts; for example - '1.5 week' or '01:02:03.45'. Such input is - converted to the appropriate number of months, days, and seconds - for storage. When this would result in a fractional number of - months or days, the fraction is added to the lower-order fields - using the conversion factors 1 month = 30 days and 1 day = 24 hours. - For example, '1.5 month' becomes 1 month and 15 days. - Only seconds will ever be shown as fractional on output. + Field values can have fractional parts; for example, '1.5 + weeks' or '01:02:03.45'. The fractional + parts are used to compute appropriate values for the next lower-order + internal fields (months, days, seconds). For example, '1.5 + months' becomes 1 month 15 days. + The fractional conversion factors used are 1 month = 30 days and 1 day + = 24 hours. Only seconds will ever be shown as fractional on output. 
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c index 889077f55c..d5b3705145 100644 --- a/src/backend/utils/adt/datetime.c +++ b/src/backend/utils/adt/datetime.c @@ -526,7 +526,6 @@ AdjustFractDays(double frac, struct pg_tm *tm, fsec_t *fsec, int scale) extra_days = (int) frac; tm->tm_mday += extra_days; frac -= extra_days; - AdjustFractSeconds(frac, tm, fsec, SECS_PER_DAY); } /* Fetch a fractional-second value with suitable error checking */ @@ -3307,28 +3306,28 @@ DecodeInterval(char **field, int *ftype, int nf, int range, case DTK_YEAR: tm->tm_year += val; if (fval != 0) - tm->tm_mon += fval * MONTHS_PER_YEAR; + tm->tm_mon += rint(fval * MONTHS_PER_YEAR); tmask = DTK_M(YEAR); break; case DTK_DECADE: tm->tm_year += val * 10; if (fval != 0) - tm->tm_mon += fval * MONTHS_PER_YEAR * 10; + tm->tm_mon += rint(fval * MONTHS_PER_YEAR * 10); tmask = DTK_M(DECADE); break; case DTK_CENTURY: tm->tm_year += val * 100; if (fval != 0) - tm->tm_mon += fval * MONTHS_PER_YEAR * 100; + tm->tm_mon += rint(fval * MONTHS_PER_YEAR * 100); tmask = DTK_M(CENTURY); break; case DTK_MILLENNIUM: tm->tm_year += val * 1000; if (fval != 0) - tm->tm_mon += fval * MONTHS_PER_YEAR * 1000; + tm->tm_mon += rint(fval * MONTHS_PER_YEAR * 1000); tmask = DTK_M(MILLENNIUM); break; @@ -3565,7 +3564,7 @@
Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
On 2021-Apr-08, James Coleman wrote: > On Thu, Apr 8, 2021 at 1:04 PM Alvaro Herrera wrote: > > > > On 2021-Apr-08, James Coleman wrote: > > > > > I assume proper procedure for the CF entry is to move it into the > > > current CF and then mark it as committed, however I don't know how (or > > > don't have permissions?) to move it into the current CF. How does one > > > go about doing that? > > > > > > Here's the entry: https://commitfest.postgresql.org/29/2542/ > > > > Done, thanks. > Thanks. Is that something I should be able to do myself No, sorry. -- Álvaro Herrera Valdivia, Chile "La vida es para el que se aventura"
Re: VACUUM (DISABLE_PAGE_SKIPPING on)
On 2021-Apr-08, Simon Riggs wrote: > On Thu, 8 Apr 2021 at 16:58, David Steele wrote: > > It's not clear to me which patch is which, so perhaps move one CF entry > > to next CF and clarify which patch is current? > > Entry: Maximize page freezing > has this patch, perfectly fine, awaiting review from Alvaro since 29 Nov. > one_freeze_then_max_freeze.v7.patch Oh dear, was this waiting on me? It was not in my to-do, so I didn't pay close attention. -- Álvaro Herrera 39°49'30"S 73°17'W
Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
On Thu, Apr 8, 2021 at 1:04 PM Alvaro Herrera wrote: > > On 2021-Apr-08, James Coleman wrote: > > > I assume proper procedure for the CF entry is to move it into the > > current CF and then mark it as committed, however I don't know how (or > > don't have permissions?) to move it into the current CF. How does one > > go about doing that? > > > > Here's the entry: https://commitfest.postgresql.org/29/2542/ > > Done, thanks. > > -- > Álvaro Herrera 39°49'30"S 73°17'W Thanks. Is that something I should be able to do myself (should I be asking someone for getting privileges in the app to do so)? I'm not sure what the project policy is on that. James
Re: psql - add SHOW_ALL_RESULTS option
Bonjour Fabien, On Thu, Apr 08, 2021 at 07:04:01PM +0200, Fabien COELHO wrote: > > > > Thanks for the patch Fabien. I've hit this issue multiple times and this is > > indeed unwelcome. Should we add it as an open item? > > It is definitely an open item. I'm not sure where you want to add it… possibly the "Pg 14 Open Items" wiki page? Correct. > I tried but I do not have enough > privileges, if you can do it please proceed. I added an entry in the next CF > in the bugfix section. That's strange, I don't think you need special permission there. It's working for me so I added an item with a link to the patch!
Re: SQL-standard function body
On 2021-Apr-08, Julien Rouhaud wrote: > On Thu, Apr 08, 2021 at 02:58:02AM -0400, Tom Lane wrote: > > No, because if that were the explanation then we'd be getting no > > buildfarm coverage at all for pg_stat_statements. Which aside > > from being awful contradicts the results at coverage.postgresql.org. > > Is there any chance that coverage.postgresql.org isn't backed by the buildfarm > client but a plain make check-world or something like that? Yes, coverage.pg.org runs "make check-world". Maybe it would make sense to change that script, so that it runs the buildfarm's run_build.pl script instead of "make check-world". That would make coverage.pg.org report what the buildfarm actually tests ... it would have made this problem a bit more obvious. -- Álvaro Herrera 39°49'30"S 73°17'W
Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
On 2021-Apr-08, James Coleman wrote: > I assume proper procedure for the CF entry is to move it into the > current CF and then mark it as committed, however I don't know how (or > don't have permissions?) to move it into the current CF. How does one > go about doing that? > > Here's the entry: https://commitfest.postgresql.org/29/2542/ Done, thanks. -- Álvaro Herrera 39°49'30"S 73°17'W
Re: psql - add SHOW_ALL_RESULTS option
Bonjour Julien, Attached a patch which attempts to fix this by moving the cancellation cancelling request after processing results. Thank you for your fixing. I tested and the problem has been solved after applying your patch. Thanks for the patch Fabien. I've hit this issue multiple times and this is indeed unwelcome. Should we add it as an open item? It is definitely an open item. I'm not sure where you want to add it… possibly the "Pg 14 Open Items" wiki page? I tried but I do not have enough privileges, if you can do it please proceed. I added an entry in the next CF in the bugfix section. -- Fabien.
Re: SQL-standard function body
On 4/8/21 2:40 AM, Julien Rouhaud wrote: > On Wed, Apr 07, 2021 at 11:33:20PM -0700, Andres Freund wrote: >> Hi, >> >> On 2021-04-08 02:05:25 -0400, Tom Lane wrote: >>> So far the buildfarm seems to be turning green after b3ee4c503 ... >>> so I wonder what extra condition is needed to cause the failure >>> Andres is seeing. >> Nothing special, really. Surprised the BF doesn't see it: >> >> andres@awork3:~/build/postgres/dev-assert/vpath$ cat /tmp/test.conf >> force_parallel_mode=regress >> andres@awork3:~/build/postgres/dev-assert/vpath$ make -j48 -s && >> EXTRA_REGRESS_OPTS='--temp-config /tmp/test.conf' make -s -C >> contrib/pg_stat_statements/ check >> All of PostgreSQL successfully made. Ready to install. >> ... >> The differences that caused some tests to fail can be viewed in the >> file >> "/home/andres/build/postgres/dev-assert/vpath/contrib/pg_stat_statements/regression.diffs". >> A copy of the test summary that you see >> above is saved in the file >> "/home/andres/build/postgres/dev-assert/vpath/contrib/pg_stat_statements/regression.out". >> ... > Is think this is because the buildfarm client is running installcheck for the > contribs rather than check, and pg_stat_statements/Makefile has: > > # Disabled because these tests require > "shared_preload_libraries=pg_stat_statements", > # which typical installcheck users do not have (e.g. buildfarm clients). > NO_INSTALLCHECK = 1 > > > Yeah, Julien is right, we run "make check" for these in src/test/modules but I missed contrib. I have fixed this on crake so we get some immediate coverage and a fix will be pushed to github shortly. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Thu, Apr 8, 2021 at 12:51:06PM -0400, Álvaro Herrera wrote: > On 2021-Apr-08, Bruce Momjian wrote: > > > pg_stat_activity.queryid is new, but I can imagine cases where you would > > join pg_stat_activity to pg_stat_statements to get an estimate of how > > long the query will take --- having one using an underscore and another > > one not seems odd. > > OK. So far, you have one vote for queryid (your own) and two votes for > query_id (mine and Julien's). And even yourself were hesitating about > it earlier in the thread. OK, if people are fine with pg_stat_activity.query_id not matching pg_stat_statements.queryid, I am fine with that. I just don't want someone to say it was a big mistake later. ;-) -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
On Thu, Apr 8, 2021 at 8:01 AM David Rowley wrote: > > On Thu, 8 Apr 2021 at 22:54, David Rowley wrote: > > > > I think the changes in the patch are fairly isolated and the test > > coverage is now pretty good. I'm planning on looking at the patch > > again now and will consider pushing it for PG14. > > I pushed this with some minor cleanup from the v6 patch I posted earlier. > > David Thank you! I assume proper procedure for the CF entry is to move it into the current CF and then mark it as committed, however I don't know how (or don't have permissions?) to move it into the current CF. How does one go about doing that? Here's the entry: https://commitfest.postgresql.org/29/2542/ Thanks, James
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On 2021-Apr-08, Bruce Momjian wrote: > pg_stat_activity.queryid is new, but I can imagine cases where you would > join pg_stat_activity to pg_stat_statements to get an estimate of how > long the query will take --- having one using an underscore and another > one not seems odd. OK. So far, you have one vote for queryid (your own) and two votes for query_id (mine and Julien's). And even yourself were hesitating about it earlier in the thread. -- Álvaro Herrera Valdivia, Chile
Re: VACUUM (DISABLE_PAGE_SKIPPING on)
On Thu, 8 Apr 2021 at 17:44, David Steele wrote: > > On 4/8/21 12:29 PM, Simon Riggs wrote: > > On Thu, 8 Apr 2021 at 16:58, David Steele wrote: > > > It has been five months since this patch was updated, so marking > Returned with Feedback. > > Please resubmit to the next CF when you have a new patch. > >>> > >>> There are 2 separate patch-sets on this thread, with separate CF entries. > >>> > >>> One has requested changes that have not been made. I presume this one > >>> has been RwF'd. > >>> > >>> What is happening about the other patch? > >> > >> Hmmm, well https://commitfest.postgresql.org/32/2908 and > >> https://commitfest.postgresql.org/32/2909 are both linked to the same > >> thread with the same patch, so I thought it was a duplicate. > > > > Nobody had mentioned any confusion. Multiple patches on one thread is > > common. > > Yes, but these days they generally get updated in the same reply so the > cfbot can test them all. In your case only the latest patch was being > tested and it was not applying cleanly. > > >> It's not clear to me which patch is which, so perhaps move one CF entry > >> to next CF and clarify which patch is current? > > > > Entry: Maximize page freezing > > has this patch, perfectly fine, awaiting review from Alvaro since 29 Nov. > > one_freeze_then_max_freeze.v7.patch > > That CF entry was marked Waiting for Author on 3/11, so I thought it was > for this patch. In fact, both CF entries were Waiting for Author for > some time. That was done by someone that hadn't mentioned it to me, or the list. It is not Waiting on Author. > Moved to the next CF in Waiting for Author state. That is clearly an incorrect state. -- Simon Riggs http://www.EnterpriseDB.com/
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Fri, Apr 9, 2021 at 12:38:29AM +0800, Julien Rouhaud wrote: > On Thu, Apr 08, 2021 at 11:34:25AM -0400, Bruce Momjian wrote: > > > > OK, let's get some details. First, pg_stat_statements.queryid already > > exists (no underscore), and I don't think anyone wants to change that. > > > > pg_stat_activity.queryid is new, but I can imagine cases where you would > > join pg_stat_activity to pg_stat_statements to get an estimate of how > > long the query will take --- having one using an underscore and another > > one not seems odd. > > Indeed, and also being able to join with a USING clause rather than an ON > could > also save some keystrokes. But unfortunately, we already have (userid, dbid) > on pg_stat_statements side vs (usesysid, datid) on pg_stat_activity side, so > this unfortunately won't fix all the oddities. Wow, good point. Shame they don't match. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: VACUUM (DISABLE_PAGE_SKIPPING on)
On 4/8/21 12:29 PM, Simon Riggs wrote: On Thu, 8 Apr 2021 at 16:58, David Steele wrote: It has been five months since this patch was updated, so marking Returned with Feedback. Please resubmit to the next CF when you have a new patch. There are 2 separate patch-sets on this thread, with separate CF entries. One has requested changes that have not been made. I presume this one has been RwF'd. What is happening about the other patch? Hmmm, well https://commitfest.postgresql.org/32/2908 and https://commitfest.postgresql.org/32/2909 are both linked to the same thread with the same patch, so I thought it was a duplicate. Nobody had mentioned any confusion. Multiple patches on one thread is common. Yes, but these days they generally get updated in the same reply so the cfbot can test them all. In your case only the latest patch was being tested and it was not applying cleanly. It's not clear to me which patch is which, so perhaps move one CF entry to next CF and clarify which patch is current? Entry: Maximize page freezing has this patch, perfectly fine, awaiting review from Alvaro since 29 Nov. one_freeze_then_max_freeze.v7.patch That CF entry was marked Waiting for Author on 3/11, so I thought it was for this patch. In fact, both CF entries were Waiting for Author for some time. Moved to the next CF in Waiting for Author state. Regards, -- -David da...@pgmasters.net
Re: doc review for v14
Another round of doc review, not yet including all of yesterday's commits. 29c8d614c3 duplicate words diff --git a/src/include/lib/sort_template.h b/src/include/lib/sort_template.h index 771c789ced..24d6d0006c 100644 --- a/src/include/lib/sort_template.h +++ b/src/include/lib/sort_template.h @@ -241,7 +241,7 @@ ST_SCOPE void ST_SORT(ST_ELEMENT_TYPE *first, size_t n /* * Find the median of three values. Currently, performance seems to be best - * if the the comparator is inlined here, but the med3 function is not inlined + * if the comparator is inlined here, but the med3 function is not inlined * in the qsort function. */ static pg_noinline ST_ELEMENT_TYPE * e7c370c7c5 pg_amcheck: remove Double semi-colon diff --git a/src/bin/pg_amcheck/t/004_verify_heapam.pl b/src/bin/pg_amcheck/t/004_verify_heapam.pl index 36607596b1..2171d236a7 100644 --- a/src/bin/pg_amcheck/t/004_verify_heapam.pl +++ b/src/bin/pg_amcheck/t/004_verify_heapam.pl @@ -175,7 +175,7 @@ sub write_tuple seek($fh, $offset, 0) or BAIL_OUT("seek failed: $!"); defined(syswrite($fh, $buffer, HEAPTUPLE_PACK_LENGTH)) - or BAIL_OUT("syswrite failed: $!");; + or BAIL_OUT("syswrite failed: $!"); return; } b745e9e60e a statistics objects diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c index 463d44a68a..4674168ff8 100644 --- a/src/backend/statistics/extended_stats.c +++ b/src/backend/statistics/extended_stats.c @@ -254,7 +254,7 @@ BuildRelationExtStatistics(Relation onerel, double totalrows, * that would require additional columns. * * See statext_compute_stattarget for details about how we compute statistics - * target for a statistics objects (from the object target, attribute targets + * target for a statistics object (from the object target, attribute targets * and default statistics target). 
*/ int e7d5c5d9dc guc.h: remove mention of "doit" diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h index 1892c7927b..1126b34798 100644 --- a/src/include/utils/guc.h +++ b/src/include/utils/guc.h @@ -90,8 +90,7 @@ typedef enum * dividing line between "interactive" and "non-interactive" sources for * error reporting purposes. * - * PGC_S_TEST is used when testing values to be used later ("doit" will always - * be false, so this never gets stored as the actual source of any value). + * PGC_S_TEST is used when testing values to be used later. * For example, ALTER DATABASE/ROLE tests proposed per-database or per-user * defaults this way, and CREATE FUNCTION tests proposed function SET clauses * this way. This is an interactive case, but it needs its own source value ad5f9a2023 Caller diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c index 9961d27df4..09fcff6729 100644 --- a/src/backend/utils/adt/jsonfuncs.c +++ b/src/backend/utils/adt/jsonfuncs.c @@ -1651,7 +1651,7 @@ push_null_elements(JsonbParseState **ps, int num) * this path. E.g. the path [a][0][b] with the new value 1 will produce the * structure {a: [{b: 1}]}. * - * Called is responsible to make sure such path does not exist yet. + * Caller is responsible to make sure such path does not exist yet. */ static void push_path(JsonbParseState **st, int level, Datum *path_elems, @@ -4887,7 +4887,7 @@ IteratorConcat(JsonbIterator **it1, JsonbIterator **it2, * than just one last element, this flag will instruct to create the whole * chain of corresponding objects and insert the value. * - * JB_PATH_CONSISTENT_POSITION for an array indicates that the called wants to + * JB_PATH_CONSISTENT_POSITION for an array indicates that the caller wants to * keep values with fixed indices. 
Indices for existing elements could be * changed (shifted forward) in case if the array is prepended with a new value * and a negative index out of the range, so this behavior will be prevented 9acedbd4af as diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c index 20e7d57d41..40a54ad0bd 100644 --- a/src/backend/commands/copyfrom.c +++ b/src/backend/commands/copyfrom.c @@ -410,7 +410,7 @@ CopyMultiInsertBufferCleanup(CopyMultiInsertInfo *miinfo, * Once flushed we also trim the tracked buffers list down to size by removing * the buffers created earliest first. * - * Callers should pass 'curr_rri' is the ResultRelInfo that's currently being + * Callers should pass 'curr_rri' as the ResultRelInfo that's currently being * used. When cleaning up old buffers we'll never remove the one for * 'curr_rri'. */ 9f78de5042 exist diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c index 5bdaceefd5..182a133033 100644 --- a/src/backend/commands/analyze.c +++ b/src/backend/commands/analyze.c @@ -617,7 +617,7 @@ do_analyze_rel(Relation onerel, VacuumParams *params, * * We assume that VACUUM hasn't set pg_class.reltuples already, even * during a
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
On Thu, Apr 08, 2021 at 11:34:25AM -0400, Bruce Momjian wrote: > > OK, let's get some details. First, pg_stat_statements.queryid already > exists (no underscore), and I don't think anyone wants to change that. > > pg_stat_activity.queryid is new, but I can imagine cases where you would > join pg_stat_activity to pg_stat_statements to get an estimate of how > long the query will take --- having one using an underscore and another > one not seems odd. Indeed, and also being able to join with a USING clause rather than an ON could also save some keystrokes. But unfortunately, we already have (userid, dbid) on pg_stat_statements side vs (usesysid, datid) on pg_stat_activity side, so this unfortunately won't fix all the oddities.
Re: psql - add SHOW_ALL_RESULTS option
On Thu, Apr 08, 2021 at 01:32:12AM +, shiy.f...@fujitsu.com wrote: > > Attached a patch which attempts to fix this by moving the cancellation > > cancelling request after processing results. > > Thank you for your fixing. I tested and the problem has been solved after > applying your patch. Thanks for the patch Fabien. I've hit this issue multiple times and this is indeed unwelcome. Should we add it as an open item?
Re: VACUUM (DISABLE_PAGE_SKIPPING on)
On Thu, 8 Apr 2021 at 16:58, David Steele wrote: > >> It has been five months since this patch was updated, so marking > >> Returned with Feedback. > >> > >> Please resubmit to the next CF when you have a new patch. > > > > There are 2 separate patch-sets on this thread, with separate CF entries. > > > > One has requested changes that have not been made. I presume this one > > has been RwF'd. > > > > What is happening about the other patch? > > Hmmm, well https://commitfest.postgresql.org/32/2908 and > https://commitfest.postgresql.org/32/2909 are both linked to the same > thread with the same patch, so I thought it was a duplicate. Nobody had mentioned any confusion. Multiple patches on one thread is common. > It's not clear to me which patch is which, so perhaps move one CF entry > to next CF and clarify which patch is current? Entry: Maximize page freezing has this patch, perfectly fine, awaiting review from Alvaro since 29 Nov. one_freeze_then_max_freeze.v7.patch Other patch is awaiting changes. -- Simon Riggs http://www.EnterpriseDB.com/
Re: SQL-standard function body
Julien Rouhaud writes: > On Thu, Apr 08, 2021 at 02:58:02AM -0400, Tom Lane wrote: >> No, because if that were the explanation then we'd be getting no >> buildfarm coverage at all for pg_stat_statements. Which aside >> from being awful contradicts the results at coverage.postgresql.org. > Is there any chance that coverage.postgresql.org isn't backed by the buildfarm > client but a plain make check-world or something like that? Hmm, I think you're right. Poking around in the log files from one of my own buildfarm animals, there's no evidence that pg_stat_statements is being tested at all. Needless to say, that's just horrid :-( I see that contrib/test_decoding also sets NO_INSTALLCHECK = 1, and the reason it gets tested is that the buildfarm script has a special module for that. I guess we need to clone that module, or maybe better, find a way to generalize it. There are also some src/test/modules modules that set NO_INSTALLCHECK, but apparently those do have coverage (modules-check is the step that runs their SQL tests, and then the TAP tests if any get broken out as separate buildfarm steps). regards, tom lane
Re: Autovacuum on partitioned table (autoanalyze)
On 2021-Apr-08, Tomas Vondra wrote: > On 4/8/21 5:27 PM, Alvaro Herrera wrote: > > > Same as for any other relation: ANALYZE would set it, after it's done > > scanning the table. We would need to make sure that nothing resets it to > > empty, though, and that it doesn't cause issues elsewhere. (The patch I > > sent contains the minimal change to make it work, but of course that's > > missing having other pieces of code maintain it.) > > So ANALYZE would inspect the child relations, sum the reltuples and set > it for the parent? IMO that'd be problematic because it'd mean we're > comparing the current number of changes with reltuples value which may > be arbitrarily stale (if we haven't analyzed the parent for a while). What? Not at all. reltuples would be set by ANALYZE on one run, and then the value is available for the future autovacuum run. That's how it works for regular tables too, so I'm not sure what problem you have with that. The (possibly stale) reltuples value is multiplied by the scale factor, and added to the analyze_threshold value, and that's compared with the current changes_since_analyze to determine whether to analyze or not. > That's essentially the issue I described when explaining why I think the > code needs to propagate the changes, reread the stats and then evaluate > which relations need vacuuming. It's similar to the issue of comparing > old changes_since_analyze vs. current reltuples, which is why the code > is rereading the stats before checking the thresholds. This time it's > the opposite direction - the reltuples might be stale. Well, I don't think the issue is the same. reltuples is always stale, even for regular tables, because that's just how it works. changes_since_analyze is not stale for regular tables, and that's why it makes sense to propagate it from partitions to ancestors prior to checking the analyze condition. 
> FWIW I think the current refresh logic is not quite correct, because > autovac_refresh_stats does some throttling (STATS_READ_DELAY). It > probably needs a "force" parameter to ensure it actually reads the > current stats in this one case. Hmm ... good catch, but actually that throttling only applies to the launcher. do_autovacuum runs in the worker, so there's no throttling. -- Álvaro Herrera 39°49'30"S 73°17'W
Re: VACUUM (DISABLE_PAGE_SKIPPING on)
On 4/8/21 11:41 AM, Simon Riggs wrote: On Thu, 8 Apr 2021 at 16:23, David Steele wrote: On 3/17/21 4:50 PM, Simon Riggs wrote: On Fri, 12 Mar 2021 at 22:16, Tomas Vondra wrote: On 1/28/21 2:33 PM, Simon Riggs wrote: On Thu, 28 Jan 2021 at 12:53, Masahiko Sawada wrote: This entry has been "Waiting on Author" status and the patch has not been updated since Nov 30. Are you still planning to work on this? Yes, new patch version tomorrow. Thanks for the nudge and the review. So, is it tomorrow already? ;-) Been a long day. ;-) It has been five months since this patch was updated, so marking Returned with Feedback. Please resubmit to the next CF when you have a new patch. There are 2 separate patch-sets on this thread, with separate CF entries. One has requested changes that have not been made. I presume this one has been RwF'd. What is happening about the other patch? Hmmm, well https://commitfest.postgresql.org/32/2908 and https://commitfest.postgresql.org/32/2909 are both linked to the same thread with the same patch, so I thought it was a duplicate. It's not clear to me which patch is which, so perhaps move one CF entry to next CF and clarify which patch is current? Regards, -- -David da...@pgmasters.net
Re: Autovacuum on partitioned table (autoanalyze)
On 4/8/21 5:27 PM, Alvaro Herrera wrote: > On 2021-Apr-08, Tomas Vondra wrote: > >> On 4/8/21 5:22 AM, Alvaro Herrera wrote: > >>> However, I just noticed there is a huge problem, which is that the new >>> code in relation_needs_vacanalyze() is doing find_all_inheritors(), and >>> we don't necessarily have a snapshot that lets us do that. While adding >>> a snapshot acquisition at that spot is a very easy fix, I hesitate to >>> fix it that way, because the whole idea there seems quite wasteful: we >>> have to look up, open and lock every single partition, on every single >>> autovacuum iteration through the database. That seems bad. I'm >>> inclined to think that a better idea may be to store reltuples for the >>> partitioned table in pg_class.reltuples, instead of having to add up the >>> reltuples of each partition. I haven't checked if this is likely to >>> break anything. >> >> How would that value get updated, for the parent? > > Same as for any other relation: ANALYZE would set it, after it's done > scanning the table. We would need to make sure that nothing resets it to > empty, though, and that it doesn't cause issues elsewhere. (The patch I > sent contains the minimal change to make it work, but of course that's > missing having other pieces of code maintain it.) > So ANALYZE would inspect the child relations, sum the reltuples and set it for the parent? IMO that'd be problematic because it'd mean we're comparing the current number of changes with a reltuples value which may be arbitrarily stale (if we haven't analyzed the parent for a while). That's essentially the issue I described when explaining why I think the code needs to propagate the changes, reread the stats and then evaluate which relations need vacuuming. It's similar to the issue of comparing old changes_since_analyze vs. current reltuples, which is why the code is rereading the stats before checking the thresholds. This time it's the opposite direction - the reltuples might be stale. 
FWIW I think the current refresh logic is not quite correct, because autovac_refresh_stats does some throttling (STATS_READ_DELAY). It probably needs a "force" parameter to ensure it actually reads the current stats in this one case. >>> (Also, a minor buglet: if we do ANALYZE (col1), then ANALYZE (col2) a >>> partition, then we repeatedly propagate the counts to the parent table, >>> so we would cause the parent to be analyzed more times than it should. >>> Sounds like we should not send the ancestor list when a column list is >>> given to manual analyze. I haven't verified this, however.) >> >> Are you sure? I haven't tried, but shouldn't this be prevented by only >> sending the delta between the current and last reported value? > > I did try, and yes it behaves as you say. > OK, good. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: VACUUM (DISABLE_PAGE_SKIPPING on)
On Thu, 8 Apr 2021 at 16:23, David Steele wrote: > > On 3/17/21 4:50 PM, Simon Riggs wrote: > > On Fri, 12 Mar 2021 at 22:16, Tomas Vondra > > wrote: > >> > >> On 1/28/21 2:33 PM, Simon Riggs wrote: > >>> On Thu, 28 Jan 2021 at 12:53, Masahiko Sawada > >>> wrote: > >>> > This entry has been "Waiting on Author" status and the patch has not > been updated since Nov 30. Are you still planning to work on this? > >>> > >>> Yes, new patch version tomorrow. Thanks for the nudge and the review. > >>> > >> > >> So, is it tomorrow already? ;-) > > > > Been a long day. ;-) > > It has been five months since this patch was updated, so marking > Returned with Feedback. > > Please resubmit to the next CF when you have a new patch. There are 2 separate patch-sets on this thread, with separate CF entries. One has requested changes that have not been made. I presume this one has been RwF'd. What is happening about the other patch? -- Simon Riggs http://www.EnterpriseDB.com/