Re: "PANIC: could not open critical system index 2662" - twice
On Tue, May 9, 2023 at 3:15 AM Michael Paquier wrote:
> On Mon, May 08, 2023 at 07:15:20PM +0530, Dilip Kumar wrote:
> > I am able to reproduce this using the steps given above, I am also
> > trying to analyze this further.  I will send the update once I get
> > some clue.
>
> Have you been able to reproduce this on HEAD or at the top of
> REL_15_STABLE, or is that 15.2-ish without fa5dd46?

I am able to reproduce this on REL_15_STABLE as well with your test case.
The only difference is that without fa5dd46 the issue reproduces within a
couple of seconds and very consistently, whereas on REL_15_STABLE it takes
2-3 minutes to reproduce and the reproduction is not very consistent.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: "PANIC: could not open critical system index 2662" - twice
Hi,

On 2023-05-08 17:46:37 -0700, Andres Freund wrote:
> My current gut feeling is that we should use datconnlimit == -2 to prevent
> connections after reaching DropDatabaseBuffers() in dropdb(), and use a new
> column in 16, for both createdb() and dropdb().

Attached is a rough prototype of that idea (only using datconnlimit == -2
for now).

This would need a fair bit more polish. The tests are crappy and output
stuff to stderr and don't validate enough. The error messages are bad. No
docs changes. More comments about why datconnlimit == -2 is used. Etc. But
I think it should be sufficient to discuss whether this is a viable path.

I guess we need to move this to -hackers. Perhaps I'll post subsequent
versions below
https://www.postgresql.org/message-id/20230314174521.74jl6ffqsee5mtug%40awork3.anarazel.de
?

Greetings,

Andres Freund

>From 5784ba0b21eaf05c2989c57ce1f73c13edf0111e Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Mon, 8 May 2023 18:28:33 -0700
Subject: [PATCH v1] Handle interrupted DROP DATABASE

Author:
Reviewed-by:
Discussion: https://postgr.es/m/20230509004637.cgvmfwrbht7xm...@awork3.anarazel.de
Backpatch:
---
 src/include/catalog/pg_database.h           |  2 +-
 src/backend/commands/dbcommands.c           | 50 +++++++++++++-------
 src/backend/utils/init/postinit.c           | 11 +++++
 src/test/recovery/meson.build               |  1 +
 src/test/recovery/t/037_invalid_database.pl | 48 ++++++++++++++++++
 5 files changed, 95 insertions(+), 17 deletions(-)
 create mode 100644 src/test/recovery/t/037_invalid_database.pl

diff --git a/src/include/catalog/pg_database.h b/src/include/catalog/pg_database.h
index d004f4dc8aa..5cd972da324 100644
--- a/src/include/catalog/pg_database.h
+++ b/src/include/catalog/pg_database.h
@@ -49,7 +49,7 @@ CATALOG(pg_database,1262,DatabaseRelationId) BKI_SHARED_RELATION BKI_ROWTYPE_OID
 	/* new connections allowed? */
 	bool		datallowconn;

-	/* max connections allowed (-1=no limit) */
+	/* max connections allowed (-1=no limit, -2=invalid database) */
 	int32		datconnlimit;

 	/* all Xids < this are frozen in this DB */
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 2e242eeff24..a0360d9ad80 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1569,6 +1569,7 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	bool		db_istemplate;
 	Relation	pgdbrel;
 	HeapTuple	tup;
+	Form_pg_database datform;
 	int			notherbackends;
 	int			npreparedxacts;
 	int			nslots,
@@ -1684,17 +1685,6 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 						dbname),
 				 errdetail_busy_db(notherbackends, npreparedxacts)));

-	/*
-	 * Remove the database's tuple from pg_database.
-	 */
-	tup = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(db_id));
-	if (!HeapTupleIsValid(tup))
-		elog(ERROR, "cache lookup failed for database %u", db_id);
-
-	CatalogTupleDelete(pgdbrel, &tup->t_self);
-
-	ReleaseSysCache(tup);
-
 	/*
 	 * Delete any comments or security labels associated with the database.
 	 */
@@ -1711,6 +1701,31 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 */
 	dropDatabaseDependencies(db_id);

+	/*
+	 * Tell the cumulative stats system to forget it immediately, too.
+	 */
+	pgstat_drop_database(db_id);
+
+	tup = SearchSysCacheCopy1(DATABASEOID, ObjectIdGetDatum(db_id));
+	if (!HeapTupleIsValid(tup))
+		elog(ERROR, "cache lookup failed for database %u", db_id);
+	datform = (Form_pg_database) GETSTRUCT(tup);
+
+	/*
+	 * Subsequent actions are not transactional (consider
+	 * DropDatabaseBuffers() discarding modified buffers). But we might crash
+	 * or get interrupted below. To prevent accessing a database with invalid
+	 * contents, mark the database as invalid using an in-place update.
+	 */
+	datform->datconnlimit = -2;
+	heap_inplace_update(pgdbrel, tup);
+
+	/*
+	 * Also delete the tuple - transactionally. If this transaction commits,
+	 * the row will be gone, but if we fail, dropdb() can be invoked again.
+	 */
+	CatalogTupleDelete(pgdbrel, &tup->t_self);
+
 	/*
 	 * Drop db-specific replication slots.
 	 */
@@ -1723,11 +1738,6 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	 */
 	DropDatabaseBuffers(db_id);

-	/*
-	 * Tell the cumulative stats system to forget it immediately, too.
-	 */
-	pgstat_drop_database(db_id);
-
 	/*
 	 * Tell checkpointer to forget any pending fsync and unlink requests for
 	 * files in the database; else the fsyncs will fail at next checkpoint, or
@@ -2339,6 +2349,14 @@ AlterDatabase(ParseState *pstate, AlterDatabaseStmt *stmt, bool isTopLevel)
 	datform = (Form_pg_database) GETSTRUCT(tuple);
 	dboid = datform->oid;

+	if (datform->datconnlimit == -2)
+	{
+		ereport(FATAL,
+				errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+				errmsg("database \"%s\" is invalid", stmt->dbname),
+				errdetail("Use DROP DATABASE to drop invalid databases"));
+	}
+
 	if (!object_ownercheck(DatabaseRelationId, dboid, GetUserId()))
Re: "PANIC: could not open critical system index 2662" - twice
Hi,

On 2023-05-08 14:04:00 -0700, Andres Freund wrote:
> But perhaps a similar approach could be the solution? My gut says that the
> rough direction might allow us to keep dropdb() a single transaction.

I started to hack on the basic approach of committing after the catalog
changes. But then I started wondering if we're not tackling this all wrong.

We don't actually want to drop the catalog contents early - that prevents
the database from being dropped again, while potentially leaving behind
contents in the case of a crash / interrupt.

Similarly, the fact that we only commit the transaction at the end of
createdb() leads to interrupts / crashes orphaning the contents of that
[partial] database.

We also have similar issues with movedb(), I think. This is a
non-transactional operation.

I think we should copy the approach of the CONCURRENTLY operations.
Namely, add a new column to pg_database, indicating whether the database
contents are valid. An invalid database can be dropped, but not connected
to.

Then we could have createdb() commit before starting to create the target
database directory (with invalid = true, of course). After all the
filesystem level stuff is complete, set invalid = false.

For dropping a database we'd use heap_inplace_update() to set invalid =
true just before the DropDatabaseBuffers(), preventing any connections
after that point.

Making movedb() safe is probably a bit harder - I think it'd temporarily
require two pg_database entries?

Of course we can't add a new column in the back branches. IIRC we had a
similar issue with CIC at some point, and just ended up misusing some
other column for the back branches? We could e.g. use datconnlimit == -2
for that purpose (but would need to make sure that ALTER DATABASE can't
unset it).

My current gut feeling is that we should use datconnlimit == -2 to prevent
connections after reaching DropDatabaseBuffers() in dropdb(), and use a
new column in 16, for both createdb() and dropdb().

In some ways handling createdb() properly is a new feature, but it's also
arguably a bug that we leak the space - and I think the code will be
better if we work on both together.

Greetings,

Andres Freund
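With the datconnlimit == -2 convention sketched above, a database left invalid by an interrupted DROP DATABASE could be spotted directly from the catalog. A rough illustration - this assumes the prototype's semantics, which are not part of any released version:

```sql
-- Hypothetical: list databases marked invalid by an interrupted
-- DROP DATABASE, assuming the proposed datconnlimit == -2 convention.
SELECT datname, datconnlimit
FROM pg_database
WHERE datconnlimit = -2;

-- Per the proposal, the only supported action on such a database
-- would be to drop it again:
DROP DATABASE IF EXISTS testdb;
```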
Re: "PANIC: could not open critical system index 2662" - twice
On Mon, May 08, 2023 at 06:04:23PM -0400, Tom Lane wrote:
> Andres seems to think it's a problem with aborting a DROP DATABASE.
> Adding more data might serve to make the window wider, perhaps.

And the odds get indeed much better once I use these two toys:

CREATE OR REPLACE FUNCTION create_tables(num_tables int)
RETURNS VOID AS
$func$
BEGIN
  FOR i IN 1..num_tables LOOP
    EXECUTE format('CREATE TABLE IF NOT EXISTS %I (id int)', 't_' || i);
  END LOOP;
END
$func$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION drop_tables(num_tables int)
RETURNS VOID AS
$func$
BEGIN
  FOR i IN 1..num_tables LOOP
    EXECUTE format('DROP TABLE IF EXISTS %I', 't_' || i);
  END LOOP;
END
$func$ LANGUAGE plpgsql;

And then use this loop with the others I have mentioned upthread (just
create origindb and the functions in it):

while true; do
  psql -c 'select create_tables(1000)' origindb > /dev/null 2>&1 ;
  psql testdb -c "select 1" > /dev/null 2>&1 ;
  psql -c 'select drop_tables(1000)' origindb > /dev/null 2>&1 ;
  psql testdb -c "select 1" > /dev/null 2>&1 ;
done

On top of that, I have also been able to reproduce the issue on HEAD, and
luckily some pg_class files remain around, full of zeros:

$ hexdump ./base/199634/1259
000

The contents of 2662, though, looked OK.

Echoing Alvaro.. could we, err, revisit the choice of making WAL_LOG the
default strategy even for this set of minor releases? FWIW, I've mentioned
that this choice was too aggressive in the thread of 8a86618..

--
Michael
Re: "PANIC: could not open critical system index 2662" - twice
On Tue, May 9, 2023 at 10:04 AM Tom Lane wrote:
> Michael Paquier writes:
> > One thing I was wondering about to improve the odds of the hits is to
> > be more aggressive with the number of relations created at once, so as
> > we are much more aggressive with the number of pages extended in
> > pg_class from the origin database.
>
> Andres seems to think it's a problem with aborting a DROP DATABASE.
> Adding more data might serve to make the window wider, perhaps.

Here's an easy way:

@@ -1689,6 +1689,14 @@ dropdb(const char *dbname, bool missing_ok, bool force)
 	/* Close all smgr fds in all backends. */
 	WaitForProcSignalBarrier(EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SMGRRELEASE));

+	/* XXX pretend one of the above steps got interrupted by a statement
+	 * timeout or ^C */
+	if (random() % 2 == 0)
+	{
+		QueryCancelPending = true;
+		InterruptPending = true;
+		CHECK_FOR_INTERRUPTS();
+	}

postgres=# create database db2;
CREATE DATABASE
postgres=# drop database db2;
ERROR:  canceling statement due to user request

$ psql db2
psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed:
PANIC:  could not open critical system index 2662

$ od -t x1 base/17/2662
000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
010

$ od -t x1 base/17/2837
000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
004

$ od -t x1 base/17/2840
000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
010
Re: "PANIC: could not open critical system index 2662" - twice
Michael Paquier writes:
> One thing I was wondering about to improve the odds of the hits is to
> be more aggressive with the number of relations created at once, so as
> we are much more aggressive with the number of pages extended in
> pg_class from the origin database.

Andres seems to think it's a problem with aborting a DROP DATABASE.
Adding more data might serve to make the window wider, perhaps.

			regards, tom lane
Re: "PANIC: could not open critical system index 2662" - twice
On Mon, May 08, 2023 at 07:15:20PM +0530, Dilip Kumar wrote:
> I am able to reproduce this using the steps given above, I am also
> trying to analyze this further.  I will send the update once I get
> some clue.

Have you been able to reproduce this on HEAD or at the top of
REL_15_STABLE, or is that 15.2-ish without fa5dd46?

One thing I was wondering about to improve the odds of the hits is to
be more aggressive with the number of relations created at once, so as
we are much more aggressive with the number of pages extended in
pg_class from the origin database.

--
Michael
Re: "PANIC: could not open critical system index 2662" - twice
Hi,

On 2023-05-08 20:27:14 +0000, Evgeny Morozov wrote:
> On 8/05/2023 9:47 pm, Andres Freund wrote:
> > Did you have any occasions where CREATE or DROP DATABASE was interrupted?
> > Either due the connection being terminated or a crash?
>
> I've uploaded an edited version of the PG log for the time as
> https://objective.realityexists.net/temp/log-extract-2023-05-02.txt
> (test_behavior_638186279733138190 and test_behavior_638186280406544656
> are the two DBs that got corrupted).
>
> I cannot see any crash in the test logs or the PG logs, but whether it
> was interrupted is less clear. I don't know whether the tests ran
> successfully up to the point where they tried to drop the DBs (I've
> since added logging to show that next time), but DROP DATABASE did not
> return after 30 seconds and the client library (Npgsql) then tried to
> cancel the requests. We then tried to drop the DB again, with the same
> results in both cases. After the second attempts timed out we closed
> the connections anyway - so maybe that was the interruption?

Yep, that is the interruption. I suspect that something was not processing
interrupts, which then lead the WaitForProcSignalBarrier() in dropdb() to
block. Are you using any extensions? Do you have any chance to figure out
what statements were running concurrently with the DROP DATABASE?

Seems we figured out what the problem is... Just need to figure out how
to fix it.

I think the minimal fix will entail moving the commit of the transaction
into dropdb(). We need to commit before executing DropDatabaseBuffers(),
as the database is inconsistent after that point. Luckily DROP DATABASE
already can't be executed in a transaction, so that's easily possible.

That means we'd lose track of the files for the database in case of a
crash, but that's surely better than a corrupt database. And it's already
the case for CREATE DATABASE.

Additionally we probably need to prevent processing cancellations between
DropDatabaseBuffers() and remove_dbtablespaces(). But we can't just stop
accepting interrupts, because that'd break WaitForProcSignalBarrier().
Ah, what fun. Unfortunately QueryCancelHoldoffCount is not sufficient for
this purpose - we don't just need to prevent cancellations, we also can't
accept terminations either...

For a bit I thought we might be able to just reorder operations, moving
the WaitForProcSignalBarrier() earlier. Unfortunately I don't think that
works, because until DropDatabaseBuffers() has executed, backends might
write such buffers back, so we need to do WaitForProcSignalBarrier() after
that.

I've toyed with the idea of splitting DropDatabaseBuffers() into two. In
a first pass, mark all buffers as IO_IN_PROGRESS (plus perhaps some other
flag). That'd prevent new IO from being started for those buffers. Then
WaitForProcSignalBarrier(), to make sure nobody has an open FD for those
files. Then actually drop all those buffers and unlink the files. That
now can happen with interrupts held, without any chance of being blocked,
afaict. In case of being cancelled during WaitForProcSignalBarrier(),
AbortBufferIO() would remove IO_IN_PROGRESS, and everything would be fine.

I don't like the idea of WaitForProcSignalBarrier() while having buffers
marked as in-progress. I don't *immediately* see a deadlock danger, but
I'd be unsurprised if there were some.

But perhaps a similar approach could be the solution? My gut says that
the rough direction might allow us to keep dropdb() a single transaction.

Greetings,

Andres Freund
Re: "PANIC: could not open critical system index 2662" - twice
On 8/05/2023 9:47 pm, Andres Freund wrote:
> Did you have any occasions where CREATE or DROP DATABASE was interrupted?
> Either due the connection being terminated or a crash?

I've uploaded an edited version of the PG log for the time as
https://objective.realityexists.net/temp/log-extract-2023-05-02.txt
(test_behavior_638186279733138190 and test_behavior_638186280406544656
are the two DBs that got corrupted).

I cannot see any crash in the test logs or the PG logs, but whether it
was interrupted is less clear. I don't know whether the tests ran
successfully up to the point where they tried to drop the DBs (I've since
added logging to show that next time), but DROP DATABASE did not return
after 30 seconds and the client library (Npgsql) then tried to cancel the
requests. We then tried to drop the DB again, with the same results in
both cases. After the second attempts timed out we closed the connections
anyway - so maybe that was the interruption?

> As described in
> https://postgr.es/m/20230314174521.74jl6ffqsee5mtug%40awork3.anarazel.de
> we don't handle that correctly for DROP DATABASE.
>
> I think that might actually fit the symptoms - the DropDatabaseBuffers()
> will throw away the dirty buffer contents from the WAL strategy CREATE
> DATABASE, but if you then get cancelled at a point before all the files
> are removed, the on-disk files with all-zeroes would remain.

Oooh, that does seem to fit! Thank you for digging that up.
Re: ICU, locale and collation question
On Mon, 2023-05-08 at 16:35 +0200, Oscar Carlberg wrote:
> We have a bunch of existing Postgres 10 clusters running on CentOS 7,
> which have been initialized (initdb) with these collation options;
>
> -E 'UTF-8'
> --lc-collate=sv_SE.UTF-8
> --lc-ctype=sv_SE.UTF-8
> --lc-monetary=sv_SE.UTF-8
> --lc-numeric=sv_SE.UTF-8
> --lc-time=sv_SE.UTF-8
> --lc-messages=en_US.UTF-8
>
> And createdb were provided with these locale options when setting up
> databases:
> --lc-collate=sv_SE.UTF-8
> --lc-ctype=sv_SE.UTF-8
>
> We're upgrading the servers using logical replication, and would like to
> take the opportunity to switch to ICU rather than relying on glibc, to
> avoid future problems with index corruption if using physical
> replication between servers with different versions of glibc.
>
> We're trying to figure out the most correct way to configure postgres to
> achieve this. Currently we have:
>
> -E 'UTF-8'
> --locale-provider=icu
> --icu-locale=sv-SE-x-icu
>
> And createdb are provided with locale options:
> --lc-collate=C
> --lc-ctype=C
>
> Is this a safe configuration to avoid index corruption, and other
> problems, while still being compatible with the previous locale
> settings?

Yes, that is safe. But it is not compatible with the old setup when it
comes to lc_time, lc_messages and the others. You should use sv_SE.UTF-8
for these locale categories. Note that that won't cause problems with
upgrading the C library.

Yours,
Laurenz Albe
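For verifying such a setup after the migration, the provider and ICU locale are visible in the system catalogs. A sketch - the column names shown (daticulocale, datlocprovider) are as of PostgreSQL 15 and may differ in other versions:

```sql
-- Which locale provider and locale each database uses (PostgreSQL 15):
SELECT datname, datcollate, datctype, daticulocale, datlocprovider
FROM pg_database;

-- ICU collations and their recorded versions; comparing collversion with
-- pg_collation_actual_version(oid) reveals mismatches after an ICU upgrade:
SELECT collname, collversion, pg_collation_actual_version(oid)
FROM pg_collation
WHERE collprovider = 'i' AND collname LIKE 'sv%';
```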
Re: "PANIC: could not open critical system index 2662" - twice
Hi,

On 2023-05-07 16:10:28 +0000, Evgeny Morozov wrote:
> Yes, kind of. We have a test suite that creates one test DB and runs a
> bunch of tests on it. Two of these tests, however, create another DB
> each (also by cloning the same template DB) in order to test copying
> data between DBs. It's only these "extra" DBs that were corrupted, at
> least on this occasion.

Did you have any occasions where CREATE or DROP DATABASE was interrupted?
Either due the connection being terminated or a crash?

As described in
https://postgr.es/m/20230314174521.74jl6ffqsee5mtug%40awork3.anarazel.de
we don't handle that correctly for DROP DATABASE.

I think that might actually fit the symptoms - the DropDatabaseBuffers()
will throw away the dirty buffer contents from the WAL strategy CREATE
DATABASE, but if you then get cancelled at a point before all the files
are removed, the on-disk files with all-zeroes would remain.

Greetings,

Andres Freund
Re: "PANIC: could not open critical system index 2662" - twice
Alvaro Herrera writes:
> Maybe it would be sensible to make STRATEGY=FILE_COPY the default
> again, for branch 15, before today's release.

If we had more than one such report, I'd be in favor of that. But I think
it's a bit premature to conclude that the copy strategy is to blame.

			regards, tom lane
ICU, locale and collation question
Hello,

We have a bunch of existing Postgres 10 clusters running on CentOS 7,
which have been initialized (initdb) with these collation options;

-E 'UTF-8'
--lc-collate=sv_SE.UTF-8
--lc-ctype=sv_SE.UTF-8
--lc-monetary=sv_SE.UTF-8
--lc-numeric=sv_SE.UTF-8
--lc-time=sv_SE.UTF-8
--lc-messages=en_US.UTF-8

And createdb were provided with these locale options when setting up
databases:

--lc-collate=sv_SE.UTF-8
--lc-ctype=sv_SE.UTF-8

\l in psql gives:

   Name   |   Owner   | Encoding |   Collate   |    Ctype    |
----------+-----------+----------+-------------+-------------+
 test-db  | test-user | UTF8     | sv_SE.UTF-8 | sv_SE.UTF-8 |

We're upgrading the servers using logical replication, and would like to
take the opportunity to switch to ICU rather than relying on glibc, to
avoid future problems with index corruption if using physical replication
between servers with different versions of glibc.

We're trying to figure out the most correct way to configure postgres to
achieve this. Currently we have:

-E 'UTF-8'
--locale-provider=icu
--icu-locale=sv-SE-x-icu

And createdb are provided with locale options:

--lc-collate=C
--lc-ctype=C

\l in psql now gives:

   Name   |   Owner   | Encoding | Collate | Ctype | ICU Locale  | Locale Provider |
----------+-----------+----------+---------+-------+-------------+-----------------+
 test-db  | test-user | UTF8     | C       | C     | sv-SE-x-icu | icu             |

Is this a safe configuration to avoid index corruption, and other
problems, while still being compatible with the previous locale settings?
We have done some testing and it appears ORDER BY does sort rows according
to Swedish localization in the ICU configured test-db.

We're uncertain since this blog post ->
https://peter.eisentraut.org/blog/2022/09/26/icu-features-in-postgresql-15
mentions that there is still some postgres code relying on libc locale
facilities. Should we set lc-collate and lc-ctype to sv_SE.UTF-8 when
creating databases in addition to the ICU options provided to initdb due
to this? Will we still be safe from glibc related corruption as long as
--locale-provider=icu --icu-locale=sv-SE-x-icu is set?
Best Regards,
Oscar
Re: "PANIC: could not open critical system index 2662" - twice
On Mon, May 8, 2023 at 7:55 AM Michael Paquier wrote:
> On Sun, May 07, 2023 at 10:30:52PM +1200, Thomas Munro wrote:
> > Bug-in-PostgreSQL explanations could include that we forgot it was
> > dirty, or some backend wrote it out to the wrong file; but if we were
> > forgetting something like permanent or dirty, would there be a more
> > systematic failure?  Oh, it could require special rare timing if it is
> > similar to 8a8661828's confusion about permanence level or otherwise
> > somehow not setting BM_PERMANENT, but in the target blocks, so I think
> > that'd require a checkpoint AND a crash.  It doesn't reproduce for me,
> > but perhaps more unlucky ingredients are needed.
> >
> > Bug-in-OS/FS explanations could include that a whole lot of writes
> > were mysteriously lost in some time window, so all those files still
> > contain the zeroes we write first in smgrextend().  I guess this
> > previously rare (previously limited to hash indexes?) use of sparse
> > file hole-punching could be a factor in an it's-all-ZFS's-fault
> > explanation:
>
> Yes, you would need a bit of all that.
>
> I can reproduce the same backtrace here.  That's just my usual laptop
> with ext4, so this would be a Postgres bug.  First, here are the four
> things running in parallel so as I can get a failure in loading a
> critical index when connecting:
>
> 1) Create and drop a database with WAL_LOG as strategy and the
> regression database as template:
> while true; do
>   createdb --template=regression --strategy=wal_log testdb;
>   dropdb testdb;
> done
>
> 2) Feeding more data to pg_class in the middle, while testing the
> connection to the database created:
> while true; do
>   psql -c 'create table popo as select 1 as a;' regression > /dev/null 2>&1 ;
>   psql testdb -c "select 1" > /dev/null 2>&1 ;
>   psql -c 'drop table popo' regression > /dev/null 2>&1 ;
>   psql testdb -c "select 1" > /dev/null 2>&1 ;
> done;
>
> 3) Force some checkpoints:
> while true; do psql -c 'checkpoint' > /dev/null 2>&1; sleep 4; done
>
> 4) Force a few crashes and recoveries:
> while true ; do pg_ctl stop -m immediate ; pg_ctl start ; sleep 4 ; done

I am able to reproduce this using the steps given above, I am also trying
to analyze this further.  I will send the update once I get some clue.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: huge discrepancy between EXPLAIN cost and actual time (but the table has just been ANALYZED)
Kent Tong writes:
> I have a complex query involving over 15 joins and a CTE query and it
> takes over 17s to complete.

If you want constructive comments about that, you're going to need to show
us the whole thing (and the underlying tables' schemas). Little bits in
isolation are not very intelligible.

https://wiki.postgresql.org/wiki/Slow_Query_Questions

One thought is that you might need to raise from_collapse_limit and/or
join_collapse_limit to at least 15 to get the planner to do a complete
search of the join-order space.

			regards, tom lane
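To illustrate the suggestion above: the collapse limits can be raised for just the session running the problem query. The default for both settings is 8; the value 16 below is only illustrative:

```sql
-- Raise the planner's join-order search limits so all 15+ relations are
-- considered as one search problem (session-local settings):
SET from_collapse_limit = 16;
SET join_collapse_limit = 16;

-- Then re-run the original query under EXPLAIN ANALYZE to compare plans.
```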
Re: huge discrepancy between EXPLAIN cost and actual time (but the table has just been ANALYZED)
On 5/8/23 07:29, Kent Tong wrote:
> I have a complex query involving over 15 joins and a CTE query and it
> takes over 17s to complete. The output of EXPLAIN ANALYZE includes
> (somewhere deep inside):
>
> Index Scan using document_pkey on document document0_
>   (cost=0.29..8.31 rows=1 width=3027)
>   (actual time=16243.959..16243.961 rows=1 loops=1)
>
> This shows an index scan with a very small cost but a very large actual
> time. The strange thing is, all the tables have just been analyzed with
> the ANALYZE command (it is not a foreign table). Furthermore, if I run
> a simple query using that index, both the cost and the actual time are
> small.
>
> Another snippet is:
>
> ->  CTE Scan on all_related_document p
>       (cost=1815513.32..3030511.77 rows=241785 width=16)
>       (actual time=203.969..203.976 rows=0 loops=1)
>
> I think the cost-actual time discrepancy is fine as it is a recursive
> CTE so postgresql can't estimate the cost well. It is materialized and
> a full table scan is performed. However, the actual time is not that
> bad. Also, the estimated rows and the actual rows are also vastly
> different, but I guess this is fine, isn't it?
>
> Any idea how I should check further?

ANALYZE just samples the table. If data within the relevant indexed
columns aren't evenly distributed, then the statistics might not show the
true data distribution.

--
Born in Arizona, moved to Babylonia.
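Following up on the sampling point: the per-column sample size can be increased when the default statistics miss a skewed distribution. A hedged sketch - the table and column names below are taken from the plan snippet and may not match the real schema:

```sql
-- Collect a larger sample for the column driving the misestimate
-- (default_statistics_target is 100; 1000 here is illustrative):
ALTER TABLE document ALTER COLUMN id SET STATISTICS 1000;
ANALYZE document;

-- Inspect what the planner now believes about the column:
SELECT n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'document' AND attname = 'id';
```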
Re: Additive backup and restore?
On 5/8/23 05:24, Age Apache wrote:
> I am designing a database for a web application. In the near future I
> will require past data for Audit, Security and Analysis purpose. But
> storing all that data will increase the database size. What are some
> well known approaches to archiving data for later use without
> increasing the database size?
>
> One approach I am thinking of is compressed additive backup

Additive?  Why not one compressed archive file per month per table?

> and restore i.e. copy the rows of the tables that will be required
> later and store them in a compressed form, and then delete those rows
> from the table. And when the data is required then restore them from
> the backup files in an additive way.
>
> Is there an easy way to achieve this, say via pg_dump?

If the tables are partitioned on the relevant date field, then archiving
old data will be "trivially" easy. If not, then it'll still be "easy":
- COPY TO a file
- compress it
- store it somewhere.

Reverse to "de-archive" the data.

--
Born in Arizona, moved to Babylonia.
huge discrepancy between EXPLAIN cost and actual time (but the table has just been ANALYZED)
Hi,

I have a complex query involving over 15 joins and a CTE query and it
takes over 17s to complete. The output of EXPLAIN ANALYZE includes
(somewhere deep inside):

Index Scan using document_pkey on document document0_
  (cost=0.29..8.31 rows=1 width=3027)
  (actual time=16243.959..16243.961 rows=1 loops=1)

This shows an index scan with a very small cost but a very large actual
time. The strange thing is, all the tables have just been analyzed with
the ANALYZE command (it is not a foreign table). Furthermore, if I run a
simple query using that index, both the cost and the actual time are
small.

Another snippet is:

->  CTE Scan on all_related_document p
      (cost=1815513.32..3030511.77 rows=241785 width=16)
      (actual time=203.969..203.976 rows=0 loops=1)

I think the cost-actual time discrepancy is fine as it is a recursive CTE
so postgresql can't estimate the cost well. It is materialized and a full
table scan is performed. However, the actual time is not that bad. Also,
the estimated rows and the actual rows are also vastly different, but I
guess this is fine, isn't it?

Any idea how I should check further?

Many thanks in advance

--
Kent Tong
IT author and consultant, child education coach
Re: Additive backup and restore?
Consider table partitioning. You can detach, save and delete partitions,
and you can restore and attach partitions.

On 8 May 2023 12:24:06 CEST, Age Apache wrote:
> I am designing a database for a web application. In the near future I
> will require past data for Audit, Security and Analysis purpose. But
> storing all that data will increase the database size. What are some
> well known approaches to archiving data for later use without
> increasing the database size?
>
> One approach I am thinking of is compressed additive backup and restore
> i.e. copy the rows of the tables that will be required later and store
> them in a compressed form, and then delete those rows from the table.
> And when the data is required then restore them from the backup files
> in an additive way.
>
> Is there an easy way to achieve this, say via pg_dump?
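The detach/archive/reattach cycle described above could look roughly like this - all table names, columns and date ranges are illustrative, not taken from the original post:

```sql
-- Range-partitioned parent keyed on the archival date column:
CREATE TABLE events (
    id bigint NOT NULL,
    created_at date NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2023_04 PARTITION OF events
    FOR VALUES FROM ('2023-04-01') TO ('2023-05-01');

-- Archive: detach the old month, dump it compressed, then drop it.
ALTER TABLE events DETACH PARTITION events_2023_04;
-- shell: pg_dump -t events_2023_04 -Fc -f events_2023_04.dump mydb
DROP TABLE events_2023_04;

-- De-archive later: pg_restore the table, then reattach it.
-- shell: pg_restore -d mydb events_2023_04.dump
ALTER TABLE events ATTACH PARTITION events_2023_04
    FOR VALUES FROM ('2023-04-01') TO ('2023-05-01');
```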
Additive backup and restore?
I am designing a database for a web application. In the near future I will
require past data for Audit, Security and Analysis purpose. But storing
all that data will increase the database size. What are some well known
approaches to archiving data for later use without increasing the database
size?

One approach I am thinking of is compressed additive backup and restore
i.e. copy the rows of the tables that will be required later and store
them in a compressed form, and then delete those rows from the table. And
when the data is required then restore them from the backup files in an
additive way.

Is there an easy way to achieve this, say via pg_dump?
Re: "PANIC: could not open critical system index 2662" - twice
On 2023-May-07, Thomas Munro wrote:
> Did you previously run this same workload on versions < 15 and never
> see any problem?  15 gained a new feature CREATE DATABASE ...
> STRATEGY=WAL_LOG, which is also the default.  I wonder if there is a
> bug somewhere near that, though I have no specific idea.  If you
> explicitly added STRATEGY=FILE_COPY to your CREATE DATABASE commands,
> you'll get the traditional behaviour.

Maybe it would be sensible to make STRATEGY=FILE_COPY the default again,
for branch 15, before today's release.

--
Álvaro Herrera         Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Doing what he did amounts to sticking his fingers under the hood of the
implementation; if he gets his fingers burnt, it's his problem."  (Tom Lane)
Re: "PANIC: could not open critical system index 2662" - twice
On 8/05/2023 4:24 am, Michael Paquier wrote:
> here are the four things running in parallel so as I can get a failure
> in loading a critical index when connecting

Wow, that is some amazing detective work! We do indeed create tables
during our tests, specifically partitions of tables copied from the
template DB. Checkpoints seem to be happening every few minutes (we don't
force them, but there is a big DB with more writes on the same instance -
it's probably due to that). PG is not crashing in our case, though - not
this time.

Do you have any idea why the "drop database" command would have timed out
(not completed after 30 seconds) for the corrupted DBs?

On 8/05/2023 4:17 am, Thomas Munro wrote:
> Maybe you could do some one way and some the other, so that we try to
> learn more?

Do you still want me to try this given what Michael has found? Or anything
else to help narrow this down?
Re: Hung Query with No Locking Issues
Okay - that worked. How did you know that would work? That's incredible.

On Sun, May 7, 2023 at 4:25 PM Tom Lane wrote:
> "Michael P. McDonnell" writes:
> > I have 2 stored procedures that need to run back to back. It could
> > convert to a single one - but it's easier from a maintenance
> > perspective to keep them separated.
> >
> > The first procedure effectively is
> > INSERT INTO table_b (...) SELECT ... FROM table_a;
> > COMMIT;
> >
> > Total execution time - about 180s. Nothing in the pg_locks table and
> > nothing in the pg_stat_activity table suggests anything is hung over.
> >
> > The second procedure mutates table_b data into table_b_collapsed
> > INSERT INTO table_c (...) SELECT ... FROM table_b JOIN table_b AS b1
> > JOIN table_b AS b2 JOIN table_b AS b3, etc...;
> > COMMIT;
> >
> > The first time I run my second stored procedure - it hangs for up to
> > 8 hours. If I immediately cancel and re-run the second stored
> > procedure it runs in 2 seconds.
>
> Perhaps an "ANALYZE table_b" in between would help.
>
> 			regards, tom lane
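The fix Tom suggested can be sketched as a driver around the two procedures. The procedure and table names here are placeholders, since the originals weren't posted:

```sql
-- Hypothetical driver: refresh statistics on the freshly bulk-loaded
-- table before the join-heavy second procedure plans its query.
CALL load_table_b();       -- first procedure: INSERT INTO table_b ... SELECT
ANALYZE table_b;           -- update row counts/histograms for the planner
CALL collapse_table_b();   -- second procedure: multi-way self-join of table_b
```

Without the intervening ANALYZE, the planner sees a table it believes is empty (or tiny) and picks a plan that performs terribly once the real row counts materialize - which matches the "8 hours first run, 2 seconds after cancel and retry" symptom, since autovacuum's analyze had kicked in by the second attempt.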
Re: "PANIC: could not open critical system index 2662" - twice
On Mon, May 08, 2023 at 02:46:37PM +1200, Thomas Munro wrote:
> That sounds like good news, but I'm still confused: do you see all 0s
> in the target database (popo)'s catalogs, as reported (and if so can
> you explain how they got there?), or is it regression that is
> corrupted in more subtle ways also involving 1s?

Nope, I have not been able to confirm that yet without 8a86618. The test
runs at a high frequency, so that's kind of hard to catch. I have not been
able to get things in a state where I could look at a FPW for pg_class or
its index, either, in a way similar to Evgeny.

--
Michael