Re: Gradual migration from integer to bigint?
On Sat, 30 Sept 2023, 23:37 Tom Lane wrote:

> I think what you're asking for is a scheme whereby some rows in a
> table have datatype X in a particular column while other rows in
> the very same physical table have datatype Y in the same column.

An alternative for NOT NULL columns would be to use a new attnum for the
bigint version of the id, but add a column to pg_attribute that links the
new id col to the dropped old id col, to avoid the table rewrite.

A global read-code change would be needed: on finding a NULL in a NOT NULL
column, check for a link to a dropped old col and use that value instead if
found. The check could be almost free in the normal case if there's already
a check for an unexpected NULL or a too-short tuple.

Then a metadata-only operation can create the new id col, drop, rename and
link the old id col, and fix up fkeys etc. for the attnum change.

Indexes are an issue. Maybe require the in-advance creation of indexes like
btree(id::bigint) mirroring every index involving id? Those could then be
swapped in as part of the same metadata operation.

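To illustrate the index part only, here is a sketch (my example, not from
the thread) of the kind of mirrored expression index that could be built in
advance on an existing table; the table and index names are hypothetical:

    -- Hypothetical "orders" table with an integer id column.
    -- Build expression indexes on the bigint-cast value concurrently, in
    -- advance, ready to be swapped in when the column changes type.
    CREATE INDEX CONCURRENTLY orders_id_bigint_idx
        ON orders USING btree ((id::bigint));

    -- Likewise for a composite index that involves id:
    CREATE INDEX CONCURRENTLY orders_customer_id_bigint_idx
        ON orders USING btree (customer_id, (id::bigint));
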
Re: AW: [Extern] Re: consistent postgresql snapshot
On Thu, 12 May 2022 at 14:48, Tom Lane wrote:

> "Zwettler Markus (OIZ)" writes:
> > I don't want to use the normal backup algorithm where pg_start_backup
> > + pg_stop_backup will fix any fractured block and I am required to
> > have all archived logfiles, therefore. I want to produce an atomic
> > consistent disk snapshot.
>
> [ shrug... ]  You can't have that.

[snip]

> The only way you could get a consistent on-disk image is to shut
> the server down (being sure to do a clean not "immediate" shutdown)
> and then take the snapshot.

I think you could work around that by taking a dirty snapshot, making a
writable filesystem from it, waiting until you've archived enough WAL to
bring it to a consistent state, and then firing up a temporary postmaster
on that filesystem to go through recovery and shut down cleanly.

Re: Handling glibc v2.28 breaking changes
On Mon, 25 Apr 2022 at 12:45, Laurenz Albe wrote:

> You could consider upgrading in several steps:
>
> - pg_upgrade to v14 on the current operating system
> - use replication, then switchover to move to a current operating system
>   on a different machine
> - REINDEX CONCURRENTLY all indexes on string expressions
>
> You could get data corruption and bad query results between the second
> and the third steps, so keep that interval short.

We did something like this, with the addition of a step where we used a
new-OS replica to run amcheck's bt_index_check() over all of the btree
indexes, to find those actually corrupted by the libc upgrade in practice
with our data. It was a small fraction of them, and we were able to fit an
offline reindex of those btrees and all texty non-btree indexes into an
acceptable downtime window, with REINDEX CONCURRENTLY of everything else
as a lower priority after the upgrade.

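For anyone wanting to do the same, a rough sketch of the kind of check we
ran on the new-OS replica (my illustration, not the exact script we used;
it assumes the amcheck extension is available):

    CREATE EXTENSION IF NOT EXISTS amcheck;

    -- Run bt_index_check() over every permanent btree index and report
    -- the ones that fail; a corrupted index raises an error, which is
    -- caught and turned into a warning so the loop keeps going.
    DO $$
    DECLARE
        idx regclass;
    BEGIN
        FOR idx IN
            SELECT c.oid::regclass
            FROM pg_class c
            JOIN pg_am a ON a.oid = c.relam
            WHERE c.relkind = 'i'
              AND c.relpersistence = 'p'
              AND a.amname = 'btree'
        LOOP
            BEGIN
                PERFORM bt_index_check(idx);
            EXCEPTION WHEN OTHERS THEN
                RAISE WARNING 'index % failed bt_index_check: %', idx, SQLERRM;
            END;
        END LOOP;
    END;
    $$;
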
Re: Is this a reasonable use for advisory locks?
On Thu, 14 Apr 2022 at 10:47, Steve Baldwin wrote:

> Ok, so you want to allow _other_ updates to a customer while this process
> is happening? In that case, advisory locks will probably work. The only
> consideration is that the 'id' is a bigint. If your customer id maps to
> that, great. If not (for example we use UUID's), you will need some way
> to convert that id to a bigint.

Alternatively, create a new table that records the start timestamp of the
most recent run of your code block for each customer, and update that as
the first action in your transaction. Then row locks on that table will
protect you from concurrent runs.

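As an illustration (my sketch, with hypothetical table and column names),
the serialising table and the locking step might look something like this:

    -- One row per customer; the row itself acts as the lock.
    CREATE TABLE customer_job_runs (
        customer_id  uuid PRIMARY KEY,
        last_started timestamptz NOT NULL
    );

    BEGIN;

    -- First action in the transaction: take a row lock for this customer.
    -- A concurrent transaction for the same customer blocks here until we
    -- commit or roll back.
    INSERT INTO customer_job_runs (customer_id, last_started)
    VALUES ('00000000-0000-0000-0000-000000000001', now())
    ON CONFLICT (customer_id)
        DO UPDATE SET last_started = EXCLUDED.last_started;

    -- ... the rest of the code block for this customer ...

    COMMIT;
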
Re: Advice on using materialized views
On Mon, 6 Dec 2021 at 18:48, Phil Endecott wrote:

> - I have a raw data table, indexed on a timestamp column, to which
>   new data is regularly being added.
>
> - I have a set of views that produce hourly/daily/monthly summaries
>   of this data. Querying these views is slow, so I'd like to make
>   them materialized views.
>
> - But I'd like the most recent data to be included in the results
>   of my queries. So I think I need a combined view that is the
>   union of the materialized view and a non-materialised view for
>   the recent values.

Assuming your table is insert-only: how about, instead of using a
materialized view at all, you define a table of hourly summaries which
your script updates, and define a view which merges that with an
on-the-fly summary of main-table rows newer than the most recent
summarised hour?

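A rough sketch of the shape I mean, with invented table and column names
(readings, reading_time, value), assuming a periodic script keeps the
summary table up to date:

    -- Summaries maintained by a periodic script, one row per completed hour.
    CREATE TABLE hourly_summary (
        hour        timestamptz PRIMARY KEY,
        n_readings  bigint NOT NULL,
        total_value numeric NOT NULL
    );

    -- Combined view: stored summaries plus an on-the-fly summary of raw
    -- rows newer than the latest summarised hour.
    CREATE VIEW hourly_summary_current AS
        SELECT hour, n_readings, total_value
        FROM hourly_summary
        UNION ALL
        SELECT date_trunc('hour', reading_time) AS hour,
               count(*) AS n_readings,
               sum(value) AS total_value
        FROM readings
        WHERE reading_time >= (
            SELECT coalesce(max(hour) + interval '1 hour', '-infinity')
            FROM hourly_summary
        )
        GROUP BY 1;
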
Re: concurrent re-partitioning of declarative partitioned tables
On Wed, 2 Dec 2020 at 15:59, Michael Lewis wrote:

> You can not have overlapping partitions that are both attached.

Not directly, no. That's why I'm considering the _partition_channel hack.

> Why do you want to merge partitions that you are "done with" instead of
> just leaving them partitioned by day?

I have some random-access index lookups on columns not in the partition
key, where values are unique over the entire table, so at most one
partition is going to return a row. A lookup that touches 4 or 5 pages in
each of 100 partition indexes is more expensive than one that touches 6 or
7 pages in each of 10 larger partition indexes.

> Why are you partitioning at all? Are you confident that you need
> partitions for performance & that the trade-offs are worth the cost, or
> are you needing to detach/drop old data quickly to adhere to a retention
> policy?

I do want cheap drops of old data, but also many queries have indexable
conditions on non-key columns and only want records from the most recent N
days, so partition pruning is useful there with small partitions for
recent records.

Re: concurrent re-partitioning of declarative partitioned tables
On Wed, 2 Dec 2020 at 16:07, David G. Johnston wrote:

> On Mon, Nov 30, 2020 at 8:36 AM Nick Cleaton wrote:
>
>> I want to set up a large table on postgresql 12.4, using declarative
>> partitioning to partition by record creation date. I'd like to have
>> recent records in small partitions but old records in a few larger
>> partitions, so I want merges. The merges should be concurrent, in the
>> sense that they lock out readers or writers only for very short
>> intervals if at all.
>
> Once a date has passed is the table for that date effectively read-only?

No, old records get updated from time to time, although updates are much
less common than for recent records.

concurrent re-partitioning of declarative partitioned tables
I want to set up a large table on postgresql 12.4, using declarative
partitioning to partition by record creation date. I'd like to have recent
records in small partitions but old records in a few larger partitions, so
I want merges. The merges should be concurrent, in the sense that they
lock out readers or writers only for very short intervals if at all.

I'm looking at adding an extra boolean column and partitioning on that at
the top level, with two parallel date-based partition trees underneath, so
that I can effectively have overlapping date partitions:

    create table mytable (
        record_date timestamp with time zone not null,
        _partition_channel boolean,
        ...
    ) partition by list (_partition_channel);

    create table mytable_chan_null partition of mytable
        for values in (null) partition by range (record_date);

    create table mytable_chan_true partition of mytable
        for values in (true) partition by range (record_date);

    create table mytable_day_20200101 partition of mytable_chan_null
        for values from ('2020-01-01') to ('2020-01-02');
    ...
    create table mytable_day_20200107 partition of mytable_chan_null
        for values from ('2020-01-07') to ('2020-01-08');

Then to merge several day-partitions into a week-partition:

    create table mytable_week_20200101 partition of mytable_chan_true
        for values from ('2020-01-01') to ('2020-01-08');

... and migrate rows in batches by updating _partition_channel to true (a
sketch of one way to do that is at the end of this message), then finally
drop the empty day partitions.

Since record_date is an insertion timestamp, I don't mind that after this
merge updating the record_date of a merged row could fail due to a missing
partition. Likewise there's no need for new rows to be inserted with
record_date values in previously merged ranges.

Questions:

Are there any hidden pitfalls with this approach?

Have I missed a simpler way?

Is there a project out there that will manage this for me?

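For illustration only, one possible shape for the batch migration step,
assuming mytable also has a unique id column (elided by the "..." in the
definition above):

    -- Flip the partition key for a slice of rows in one old day
    -- partition; updating through the root relocates them into the week
    -- partition (row movement across partitions, supported since PG 11).
    -- Repeat until zero rows are updated, then drop the empty day
    -- partition.
    UPDATE mytable
    SET _partition_channel = true
    WHERE record_date >= '2020-01-01' AND record_date < '2020-01-02'
      AND _partition_channel IS NULL
      AND id IN (SELECT id FROM mytable_day_20200101 LIMIT 10000);
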
Re: A limit clause can cause a poor index choice
On Tue, 19 May 2020 at 22:15, Michael Lewis wrote:

> Increase default_statistics_target, at least on that column, and see if
> you get a much much better plan. I don't know where I got this query
> from online, but here ya go. I'd be curious how frac_MCV in this changes
> when default_statistics_target is more like 250 or 500 and the table is
> analyzed again to reflect that change.

It chooses the fast plan for a limit of 10 if the stats target is
approaching the number of distinct customer_id values, which is 6000 for
this test table:

 stats | frac_mcv   | n_distinct | n_mcv | n_hist | correlation | l10 | l100 | l1000
-------+------------+------------+-------+--------+-------------+-----+------+-------
    -1 | 0.01566    |       5728 |    34 |    101 |  0.98172975 | f   | f    | t
   150 | 0.01505    |       5821 |    38 |    151 |   0.9817175 | f   | f    | t
   250 | 0.04347998 |       5867 |   134 |    251 |  0.98155195 | f   | t    | t
   500 | 0.12606017 |       5932 |   483 |    501 |  0.98155344 | f   | t    | t
   750 | 0.18231618 |       5949 |   750 |    751 |  0.98166454 | f   | t    | t
  1000 | 0.2329197  |       5971 |  1000 |   1001 |   0.9816691 | f   | t    | t
  1500 | 0.3312785  |       5982 |  1500 |   1501 |    0.981609 | f   | t    | t
  3000 | 0.6179379  |       5989 |  3000 |   2989 |    0.981612 | f   | t    | t
  4000 | 0.8033856  |       5994 |  4000 |   1994 |   0.9816348 | f   | t    | t
  4500 | 0.8881603  |       5994 |  4500 |   1494 |  0.98160636 | f   | t    | t
  4800 | 0.9281193  |       5993 |  4800 |   1193 |   0.9816273 | f   | t    | t
  4900 | 0.9396781  |       5994 |  4900 |   1094 |   0.9816546 | f   | t    | t
  5000 | 0.9500147  |       5993 |  5000 |    993 |   0.9816481 | t   | t    | t
  6000 | 0.999714   |       5996 |  5923 |     73 |  0.98162216 | t   | t    | t
     1 | 0.5905     |       5998 |  5970 |     28 |  0.98164326 | t   | t    | t

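For anyone reproducing this, a sketch of one way to set the per-column
target between runs (my example; 1000 is just an illustrative value) and
to inspect the resulting stats, using the test_orders table from the
original post:

    -- Raise the statistics target for customer_id only, then re-analyze
    -- so that pg_stats reflects the new MCV list and histogram sizes.
    ALTER TABLE test_orders ALTER COLUMN customer_id SET STATISTICS 1000;
    ANALYZE test_orders;

    -- frac_mcv is the total fraction of rows covered by the MCV list.
    SELECT n_distinct,
           cardinality(most_common_freqs) AS n_mcv,
           (SELECT sum(f) FROM unnest(most_common_freqs) f) AS frac_mcv,
           correlation
    FROM pg_stats
    WHERE tablename = 'test_orders' AND attname = 'customer_id';
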
Re: A limit clause can cause a poor index choice
On Tue, 19 May 2020 at 21:56, Mohamed Wael Khobalatte wrote:

> I believe a second ordering, by id desc, will get your query to use the
> right index, and shouldn't be functionally different from what you would
> expect.

Thanks, that works nicely on our production table, even with much larger
sets of customer_id values.

> What happens when you raise the limit? Say to a 1000?

A limit of 1000 makes it choose the fast plan. A limit of 100 causes it to
choose the fast plan if I raise the stats target on that column to 250 or
above, otherwise not.

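For clarity, the suggested rewrite amounts to something like this (my
rendering of the suggestion, against the test_orders table from the
original post):

    -- Adding a second sort key means the o_date index alone can no longer
    -- supply the requested ordering, so the planner stops preferring the
    -- backward scan of that index and uses (customer_id, o_date) instead.
    SELECT *
    FROM test_orders
    WHERE customer_id = ANY (ARRAY[9993,9997,9912,9954,9100,9101,9102,9234,9500,9512])
    ORDER BY o_date DESC, id DESC
    LIMIT 10;
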
A limit clause can cause a poor index choice
The attached script builds a 10G test table which demonstrates a problem
that we have in production with postgresql 12.3-1.pgdg18.04+1 on ubuntu
linux.

Indexes:
    test_orders_o_date_idx              btree(o_date)
    test_orders_customer_id_o_date_idx  btree(customer_id, o_date)

We query for the most recent orders for sets of customers, and sometimes
none of those customers have any orders and the results are empty:

explain analyze select * from test_orders
  where customer_id=any(ARRAY[9993,9997,9912,9954,9100,9101,9102,9234,9500,9512])
  order by o_date desc;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Sort  (cost=24848.96..24870.67 rows=8686 width=1839) (actual time=1.101..1.102 rows=0 loops=1)
   Sort Key: o_date DESC
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using test_orders_customer_id_o_date_idx on test_orders  (cost=0.43..17361.20 rows=8686 width=1839) (actual time=1.047..1.047 rows=0 loops=1)
         Index Cond: (customer_id = ANY ('{9993,9997,9912,9954,9100,9101,9102,9234,9500,9512}'::integer[]))
 Planning Time: 3.821 ms
 Execution Time: 1.174 ms
(7 rows)

So far so good. But if we add a limit clause to the query then the plan
goes very wrong:

explain analyze select * from test_orders
  where customer_id=any(ARRAY[9993,9997,9912,9954,9100,9101,9102,9234,9500,9512])
  order by o_date desc limit 10;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=0.43..1660.98 rows=10 width=1839) (actual time=4990.424..4990.424 rows=0 loops=1)
   ->  Index Scan Backward using test_orders_o_date_idx on test_orders  (cost=0.43..1442355.43 rows=8686 width=1839) (actual time=4990.423..4990.423 rows=0 loops=1)
         Filter: (customer_id = ANY ('{9993,9997,9912,9954,9100,9101,9102,9234,9500,9512}'::integer[]))
         Rows Removed by Filter: 500
 Planning Time: 0.063 ms
 Execution Time: 4990.435 ms

Is there something we can adjust to get it to prefer
test_orders_customer_id_o_date_idx even when there's a limit clause?

The script used to build the test table:

#!/usr/bin/python3

import random
import datetime

secs_in_day = 24*60*60
longstr = """iufdpoaiusoto3u5034534i5j345k345lku09s80s9dfjwer.,newrwwerwerwerlwerjlwejrlkewjr""" * 10

print("""
drop table if exists test_orders;
drop sequence if exists test_orders_id_seq;

CREATE SEQUENCE test_orders_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;

CREATE TABLE test_orders (
    id integer DEFAULT nextval('test_orders_id_seq'::regclass) NOT NULL,
    o_date timestamp with time zone NOT NULL,
    customer_id integer,
    str1 text,
    num1 integer,
    long1 text,
    long2 text,
    long3 text,
    long4 text
);

COPY test_orders(o_date, customer_id, str1, num1, long1, long2, long3, long4) FROM stdin;""")

for day in range(5000):
    orders = [(secs_in_day * day + random.randrange(secs_in_day), customer)
              for customer in range(day, day+1000)]
    for o_date, customer_id in sorted(orders):
        print(datetime.datetime.fromtimestamp(1234234234 + o_date).isoformat(),
              customer_id, "blah", random.randrange(100),
              longstr, longstr, longstr, longstr, sep="\t")

print("""\\.

create index test_orders_o_date_idx on test_orders using btree(o_date);
create index test_orders_customer_id_o_date_idx on test_orders using btree(customer_id, o_date);
analyze test_orders;
""")

Re: how to securely delete the storage freed when a table is dropped?
On 13 April 2018 at 18:48, Jonathan Morgan wrote:

> For a system with information stored in a PostgreSQL 9.5 database, in
> which data stored in a table that is deleted must be securely deleted
> (like shred does to files), and where the system is persistent even
> though any particular table likely won't be (so can't just shred the
> disks at "completion"), I'm trying to figure out my options for securely
> deleting the underlying data files when a table is dropped.
>
> As background, I'm not a DBA, but I am an experienced implementor in
> many languages, contexts, and databases. I've looked online and haven't
> been able to find a way to ask PostgreSQL to do the equivalent of
> shredding its underlying files before releasing them to the OS when a
> table is DROPped. Is there a built-in way to ask PostgreSQL to do this?
> (I might just not have searched for the right thing - my apologies if I
> missed something)
>
> A partial answer we're looking at is shredding the underlying data files
> for a given relation and its indexes manually before dropping the
> tables, but this isn't so elegant, and I'm not sure it is getting all
> the information from the tables that we need to delete.
>
> We also are looking at strategies for shredding free space on our data
> disk - either running a utility to do that, or periodically replicating
> the data volume, swapping in the results of the copy, then shredding the
> entire volume that was the source so its "free" space is securely
> overwritten in the process.
>
> Are we missing something? Are there other options we haven't found? If
> we have to clean up manually, are there other places we need to go to
> shred data than the relation files for a given table, and all its
> related indexes, in the database's folder? Any help or advice will be
> greatly appreciated.

Can you encrypt the data in the application, above the DB level? That
would be cleaner if you can.

If not, you'll have to worry about both the DB's data files themselves and
the WAL files in pg_xlog/, which hold copies of recently written data.
Even if you securely scrub the deleted parts of the filesystems after
dropping the table, there could still be copies of secret table data in
WAL files that haven't yet been overwritten.

One way to scrub deleted files would be to use ZFS and have an extra disk.
When it's time to scrub, "zpool attach" the extra disk to your zpool,
which will cause ZFS to copy over only the files that haven't been
deleted, in the background. When that's finished you can detach the
original disk from the zpool and then do a low-level overwrite of that
entire disk.

For extra security points, use encrypted block devices underneath ZFS, and
instead of scrubbing the disk just destroy the encryption key that you
were using for it.

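As an aside, if you do go the manual-shredding route, a sketch of how to
find the on-disk files for a table, its indexes, and its TOAST data from
SQL ('secret_table' is a placeholder name):

    -- List the main relation file paths (relative to the data directory)
    -- for a table, its indexes, and its TOAST table and index, if any.
    -- Each relation may also have extra segment files named <path>.1,
    -- <path>.2, ... plus free-space-map and visibility-map forks.
    SELECT c.relname, pg_relation_filepath(c.oid) AS filepath
    FROM pg_class c
    WHERE c.oid = 'secret_table'::regclass
       OR c.oid IN (SELECT indexrelid FROM pg_index
                    WHERE indrelid = 'secret_table'::regclass)
       OR c.oid = (SELECT reltoastrelid FROM pg_class
                   WHERE oid = 'secret_table'::regclass)
       OR c.oid IN (SELECT indexrelid FROM pg_index
                    WHERE indrelid = (SELECT reltoastrelid FROM pg_class
                                      WHERE oid = 'secret_table'::regclass));
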