Re: surprisingly slow creation of gist index used in exclude constraint
Hi,

A year and a half later, now that version 14 is the latest available, I wonder if anything has changed with respect to gist index creation? Probably also worth asking: are there now different index types this application should be using?

cheers,

Chris

On 14/05/2020 21:11, Chris Withers wrote:
> Hi,
>
> I'm upgrading a database from 9.4 to 11.5 by dumping from the old cluster and loading into the new cluster. The database is tiny: around 2.3G, but importing this table is proving problematic:
>
>     Column  |       Type        | Modifiers
>  -----------+-------------------+----------------------------------------------------------
>   period    | tsrange           | not null
>   col1      | character varying | not null
>   col2      | character varying | not null
>   col3      | integer           | not null
>   col4      | character varying | not null default ''::character varying
>   id        | integer           | not null default nextval('mkt_profile_id_seq'::regclass)
>   deleted   | boolean           | not null default false
>   managed   | boolean           | not null default false
>   col5      | character varying |
>  Indexes:
>      "mkt_profile_pkey" PRIMARY KEY, btree (id)
>      "mkt_profile_period_col1_col4_col2_chan_excl" EXCLUDE USING gist (period WITH &&, col1 WITH =, col4 WITH =, col2 WITH =, col3 WITH =)
>  Check constraints:
>      "mkt_profile_period_check" CHECK (period <> 'empty'::tsrange)
>  Foreign-key constraints:
>      "mkt_profile_col1_fkey" FOREIGN KEY (col1) REFERENCES host(name)
>
> It has 4.1 million rows in it and while importing the data only takes a couple of minutes, when I did a test load into the new cluster, building the mkt_profile_period_col1_col4_col2_chan_excl index for the exclude constraint took 15 hours. I feel like asking what I'm doing wrong here? The new server is pretty decent hardware...
>
> Concrete questions:
>
> - what, if anything, am I getting badly wrong here?
> - what can I do to speed up creation of this index?
> - failing that, what can I do to import and then create the index in the background?
>
> As you can imagine, a 15hr outage for an upgrade has not met with large amounts of happiness from the people whose application it is ;-)
>
> Chris
Re: surprisingly slow creation of gist index used in exclude constraint
On 14/05/2020 21:31, Tom Lane wrote:
> Chris Withers writes:
>> It has 4.1 million rows in it and while importing the data only takes a couple of minutes, when I did a test load into the new cluster, building the mkt_profile_period_col1_col4_col2_chan_excl index for the exclude constraint took 15 hours.
> Don't recall for sure, but I think GIST index build is sensitive to the maintenance_work_mem setting; did you have that cranked up?

postgres=# show maintenance_work_mem;
 maintenance_work_mem
----------------------
 2GB
(1 row)

Would it be worth turning that up more? The server has ~130GB memory.

>> - failing that, what can I do to import and then create the index in the background?
> CREATE INDEX CONCURRENTLY, perhaps.

How would I bring this into play with respect to the dump and load cycle? Is there an option to pg_dump or something else I should use, or is this a case of awk'ing the CREATE INDEX in the dump?

Chris
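To make the "dump and load cycle" part concrete: rather than awk'ing the plain-text dump, `pg_restore --section` can hold back the post-data step (indexes and constraints) so it can be replayed with a session-level `maintenance_work_mem` bump. This is a sketch only; the dump file and database names are hypothetical, and the stand-in file below takes the place of what `pg_restore` would actually write.

```shell
# Steps 1-2 need a live cluster and a custom-format dump (names hypothetical):
#   pg_restore --section=pre-data --section=data -d newdb dump.custom
#   pg_restore --section=post-data -f post.sql dump.custom
# Stand-in for the post-data file pg_restore would have written:
printf 'CREATE INDEX mkt_profile_period_col1_col4_col2_chan_excl ...;\n' > post.sql

# Step 3: prepend a session-level memory bump, then replay with psql later:
printf "SET maintenance_work_mem = '%s';\n" 8GB | cat - post.sql > post_tuned.sql
head -n 1 post_tuned.sql
```

One caveat: as far as I can tell, an exclusion constraint cannot be attached to a pre-built index (`ALTER TABLE ... ADD CONSTRAINT ... USING INDEX` covers only UNIQUE and PRIMARY KEY), so CREATE INDEX CONCURRENTLY doesn't directly substitute for this particular constraint; the session-level memory bump is the lever that applies here.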
Re: surprisingly slow creation of gist index used in exclude constraint
On 14/05/2020 21:16, k...@rice.edu wrote:
> Hi Chris,
> This sounds like a candidate for pg_logical replicating from the old to new system.

Can you point me to a good guide as to how to easily set this up for one database, and would it work between pg 9.4 and pg 11.5?

cheers,

Chris
surprisingly slow creation of gist index used in exclude constraint
Hi,

I'm upgrading a database from 9.4 to 11.5 by dumping from the old cluster and loading into the new cluster. The database is tiny: around 2.3G, but importing this table is proving problematic:

    Column  |       Type        | Modifiers
 -----------+-------------------+----------------------------------------------------------
  period    | tsrange           | not null
  col1      | character varying | not null
  col2      | character varying | not null
  col3      | integer           | not null
  col4      | character varying | not null default ''::character varying
  id        | integer           | not null default nextval('mkt_profile_id_seq'::regclass)
  deleted   | boolean           | not null default false
  managed   | boolean           | not null default false
  col5      | character varying |
 Indexes:
     "mkt_profile_pkey" PRIMARY KEY, btree (id)
     "mkt_profile_period_col1_col4_col2_chan_excl" EXCLUDE USING gist (period WITH &&, col1 WITH =, col4 WITH =, col2 WITH =, col3 WITH =)
 Check constraints:
     "mkt_profile_period_check" CHECK (period <> 'empty'::tsrange)
 Foreign-key constraints:
     "mkt_profile_col1_fkey" FOREIGN KEY (col1) REFERENCES host(name)

It has 4.1 million rows in it and while importing the data only takes a couple of minutes, when I did a test load into the new cluster, building the mkt_profile_period_col1_col4_col2_chan_excl index for the exclude constraint took 15 hours. I feel like asking what I'm doing wrong here? The new server is pretty decent hardware...

Concrete questions:

- what, if anything, am I getting badly wrong here?
- what can I do to speed up creation of this index?
- failing that, what can I do to import and then create the index in the background?

As you can imagine, a 15hr outage for an upgrade has not met with large amounts of happiness from the people whose application it is ;-)

Chris
Re: Fwd: sensible configuration of max_connections
On 07/02/2020 12:49, Chris Ellis wrote:
>> What's "too much" for max_connections? What happens when you set it too high? What factors affect that number?
> When sizing max_connections you need to trade off how many connections your application will use at peak vs how much RAM and CPU you have. Each connection is capable of allocating work_mem and has a stack etc. As such you don't want max_connections to be able to run your system out of RAM.

Sure, but that's where I'm trying to find out what's sensible. The box has 196GB memory, most of that in hugepages, an 18-core Intel Skylake with HT on giving 36 cores, and tonnes of SSD for storage. How would I turn that spec into a sensible number for max_connections? As that number grows, what contention points in postgres will start creaking (shared memory where the IPC happens?)

In case I forgot to say, this is PostgreSQL 11...

Chris

PS: definitely thinking of pg_bouncer, but still trying to figure out what to sensibly set max_connections to.
sensible configuration of max_connections
Hi All,

What's a sensible way to pick the number to use for max_connections?

I'm looking after a reasonably sized multi-tenant cluster, where the master handles all the load and there's a slave in case of hardware failure in the master. The machine is used to host what I suspect are mainly django applications, so lots of short web requests; I'm not sure how much connection pooling, if any, django's ORM does.

I arbitrarily picked 1000 for max_connections and haven't had any problems, but we're onboarding another app on Monday that handles a few million requests per day, and I'm thinking about potential problems related to the number of available connections.

What's "too much" for max_connections? What happens when you set it too high? What factors affect that number?

cheers,

Chris
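One way to sanity-check a candidate max_connections against the hardware is a worst-case memory budget. This is only a back-of-envelope sketch with illustrative numbers, not a recommendation: each backend can allocate `work_mem` per sort/hash node (sometimes several per query) plus per-backend overhead, so the true worst case is higher still.

```shell
# Illustrative numbers only; plug in your own settings.
work_mem_mb=4            # hypothetical work_mem
overhead_mb=10           # rough per-backend overhead (stack, catalog caches)
max_connections=1000
ram_mb=$((196 * 1024))   # the 196GB box mentioned in the thread

worst_case_mb=$(( max_connections * (work_mem_mb + overhead_mb) ))
echo "worst case ~${worst_case_mb} MB of ${ram_mb} MB RAM"
```

If the worst case comes anywhere near total RAM minus shared_buffers, that argues for a lower max_connections plus a pooler such as pgbouncer in front.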
Re: Where does postgres take its timezone information from?
On 05/11/2019 22:54, Adrian Klaver wrote:
> On 11/5/19 2:46 PM, Chris Withers wrote:
>> Hi All, Brazil recently abolished daylight savings time, resulting in updates to system timezone information packages. Does postgres use these? If so, does it need a reload or restart to see the updated zone info? If not, how does postgres store/obtain its timezone zone information and how would this be updated?
> https://www.postgresql.org/about/news/1960/
> PostgreSQL 11.5, 10.10, 9.6.15, 9.5.19, 9.4.24, and 12 Beta 3 Released!
> "This update also contains tzdata release 2019b for DST law changes in Brazil, plus historical corrections for Hong Kong, Italy, and Palestine. This update also adds support for zic's new -b slim option to reduce the size of the installed zone files, though it is not currently being used by PostgreSQL."

Hmm. Is there any option to use the system timezone packages? If not, why not?

Chris
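There is such an option, but it is a build-time one: a build configured with `--with-system-tzdata` reads the OS zoneinfo files (many distro packages are built this way), so an OS tzdata package update takes effect without a new postgres release; otherwise postgres uses the tzdata copy shipped with each minor release. A small sketch for checking which kind of build you have; the example configure line is hypothetical.

```shell
# Decide whether a build uses system or bundled tzdata, given the
# output of `pg_config --configure` on stdin.
check_tzdata_source() {
    if tr ' ' '\n' | grep -q -- '--with-system-tzdata'; then
        echo "system tzdata"
    else
        echo "bundled tzdata"
    fi
}

# On a real server:  pg_config --configure | check_tzdata_source
# Example with a Debian-style configure line (hypothetical):
echo "'--prefix=/usr' '--with-system-tzdata=/usr/share/zoneinfo'" | check_tzdata_source
```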
Where does postgres take its timezone information from?
Hi All,

Brazil recently abolished daylight savings time, resulting in updates to system timezone information packages. Does postgres use these? If so, does it need a reload or restart to see the updated zone info? If not, how does postgres store/obtain its timezone information and how would this be updated?

cheers,

Chris
Re: granting right to create and delete just one database
On 05/06/2019 09:52, Laurenz Albe wrote:
> Chris Withers wrote:
>> Is there any way to grant rights to a user such that they can drop and re-create only a single database?
> No; what I'd do if I needed that is to create a SECURITY DEFINER function that is owned by a user with the CREATEDB privilege. This function can be called by a normal user that has the EXECUTE privilege on the function. Don't forget to "SET search_path" on such a function (as mentioned in the documentation). It might also be a good idea to REVOKE EXECUTE on the function from PUBLIC.

Thanks, that's a great idea! Is this pattern documented anywhere as a complete finished thing?

cheers,

Chris
granting right to create and delete just one database
Hi All,

Is there any way to grant rights to a user such that they can drop and re-create only a single database?

cheers,

Chris
why would postgres be throttling a streaming replication slot's sending?
On 11/12/2018 14:48, Achilleas Mantzios wrote:
> On 11/12/18 4:00 μ.μ., Chris Withers wrote:
>> I'm looking after a multi-tenant PG 9.4 cluster, and we've started getting alerts for the number of WALs on the server. It'd be great to understand what's generating all that WAL and what's likely to be causing any problems.
> Regarding your wals in pg_wal, a good threshold could be anything more than e.g. a 10% increase from wal_keep_segments with a trend to go up. If this number goes up chances are something bad is happening.

wal_keep_segments is 0 here, so I went hunting in pg_replication_slots and found that it's the barman slot that's behind on the WAL.

strace'ing on the barman receiving side shows it waiting on the socket, so no problem on that side.

On the sending side, it's a little confusing: the server is basically idle, no cpu or disk activity, and yet strace'ing the sending process attached to the other end of the socket shows time being spent waiting on a poll which, while it includes the socket being sent to, also includes a bunch of pipes. I've attached a chunk of the output below in the hope that someone on this list could offer an explanation as to what might cause the WAL to be trickling over this port rather than catching up as fast as it can?
cheers,

Chris

strace output for the streaming_barman slot servicing the barman replication slot:

--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=3577, si_uid=26} ---
write(4, "\0", 1) = 1
rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call)
read(3, "\0", 16) = 1
read(6, 0x7fffdcd7, 1) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(10, 0x7fffdcaf, 1, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
read(5, "\3\1\0\0\7\326\355\2\343\0\0\0@\n\0\0\0\17\1\230\313\4\0\0)\347\372l\0\0\0\0"..., 4272) = 4272
sendto(10, "d\0\0\20\315w\0\0\4\313\230\1\17P\0\0\4\313\230\1 \0\0\2\37\321\343\345\306\20\3\1"..., 4302, 0, NULL, 0) = 4302
read(3, 0x7fffdc90, 16) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=10, events=POLLIN}, {fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 3, 26076) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=3577, si_uid=26} ---
write(4, "\0", 1) = 1
rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call)
read(3, "\0", 16) = 1
read(6, 0x7fffdcd7, 1) = -1 EAGAIN (Resource temporarily unavailable)
recvfrom(10, 0x7fffdcaf, 1, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
read(5, "~\320\5\0\3\0\0\0\0 \1\230\313\4\0\0\3\0\0\0\0\0\0\0\323\316\314\0\0\0\0\0"..., 8192) = 8192
sendto(10, "d\0\0 \35w\0\0\4\313\230\1 \0\0\0\4\313\230\1@\0\0\2\37\321\343\361\376\226~\320"..., 8222, 0, NULL, 0) = 8222
read(3, 0x7fffdc90, 16) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=10, events=POLLIN}, {fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 3, 25275) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=3577, si_uid=26} ---

Interesting numbers:
- process 3577 is the wal writer process
- fd 10 is the socket, 3 and 6 are pipes
- time is visibly spent sitting on that poll
- the network is only seeing 2Mbit/sec, which is nothing

cheers,

Chris
finding out what's generating WALs
Hi All,

With a 9.4 cluster, what's the best way to find out what's generating the most WAL?

I'm looking after a multi-tenant PG 9.4 cluster, and we've started getting alerts for the number of WALs on the server. It'd be great to understand what's generating all that WAL and what's likely to be causing any problems.

More generally, what number of WALs is "too much"? check_postgres.pl when used in nagios format only appears to be able to alert on absolute thresholds; does this always make sense? What's a good threshold to alert on?

cheers,

Chris
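One way to quantify how fast WAL is being generated on 9.4 is to sample `pg_current_xlog_location()` twice and diff the positions (the server can also do the diff itself via `pg_xlog_location_diff`; on 10+ the functions are `pg_current_wal_lsn` and `pg_wal_lsn_diff`). A sketch, with connection details hypothetical and stand-in LSN values showing the arithmetic:

```shell
# Convert an LSN like '2A/7F000000' to an absolute byte position.
lsn_to_bytes() {
    local hi lo
    IFS=/ read -r hi lo <<<"$1"
    echo $(( (16#$hi << 32) + 16#$lo ))
}

# On a live server (9.4 spelling), something like:
#   a=$(psql -Atc 'SELECT pg_current_xlog_location()'); sleep 60
#   b=$(psql -Atc 'SELECT pg_current_xlog_location()')
# Stand-in values here so the arithmetic is visible:
a='2A/7F000000'; b='2A/80000000'
echo "WAL bytes in interval: $(( $(lsn_to_bytes "$b") - $(lsn_to_bytes "$a") ))"
```

That gives bytes of WAL per sampling interval, which is a more actionable alerting basis than an absolute count of segment files.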
Re: debugging intermittent slow updates under higher load
On 06/12/2018 11:00, Alexey Bashtanov wrote:
>> I'm loath to start hacking something up when I'd hope others have done a better job already...
> If you log all queries that take more than a second to complete, is your update the only one logged, or does something (the would-be blocker) get logged together with it?

Nope, the only ones logged are these updates.

Chris
Re: debugging intermittent slow updates under higher load
On 05/12/2018 15:40, Alexey Bashtanov wrote:
> One of the reasons could be the row already locked by another backend, doing the same kind of an update or something different. Are these updates performed in longer transactions?

Nope, the transaction will just be updating one row at a time.

> Can they hit the same row from two clients at the same time?

I've looked for evidence of this, but can't find any. Certainly nothing running for 2-10s; queries against this table are normally a few hundred ms.

> Is there any other write or select-for-update/share load on the table?

Not that I'm aware of. How would I go about getting metrics on problems like these?

> Have you tried periodical logging of the non-granted locks? Try querying pg_stat_activity and pg_locks (possibly joined and maybe repeatedly self-joined, google for it) to get the backends that wait one for another while competing to lock the same row or object.

Is there any existing tooling that does this? I'm loath to start hacking something up when I'd hope others have done a better job already...

Chris
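Absent ready-made tooling, the periodic lock logging Alexey describes can be sketched as below: a pg_locks self-join pairing each waiting backend with the backend holding the conflicting lock, written to a file to be run from cron or a loop via psql. This is a simplified version (a fuller one would also match classid/objid/objsubid), with catalog columns as of 9.2+ and a hypothetical database name.

```shell
# Write out a query pairing waiting backends with the lock holders.
cat > lock_waits.sql <<'SQL'
SELECT waiting.pid        AS waiting_pid,
       waiting_act.query  AS waiting_query,
       holder.pid         AS holding_pid,
       holder_act.query   AS holding_query
FROM pg_locks waiting
JOIN pg_stat_activity waiting_act ON waiting_act.pid = waiting.pid
JOIN pg_locks holder
  ON holder.locktype = waiting.locktype
 AND holder.database      IS NOT DISTINCT FROM waiting.database
 AND holder.relation      IS NOT DISTINCT FROM waiting.relation
 AND holder.page          IS NOT DISTINCT FROM waiting.page
 AND holder.tuple         IS NOT DISTINCT FROM waiting.tuple
 AND holder.transactionid IS NOT DISTINCT FROM waiting.transactionid
 AND holder.pid <> waiting.pid
JOIN pg_stat_activity holder_act ON holder_act.pid = holder.pid
WHERE NOT waiting.granted
  AND holder.granted;
SQL
# e.g., every 10s:  psql -d yourdb -f lock_waits.sql   # db name hypothetical
```

Logging its output on an interval would show whether the slow updates coincide with lock waits at all.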
Re: surprising query optimisation
On 05/12/2018 14:38, Stephen Frost wrote:
> Greetings,
> * Chris Withers (ch...@withers.org) wrote:
>> On 30/11/2018 15:33, Stephen Frost wrote:
>>> For this, specifically, it's because you end up with exactly what you have: a large index with tons of duplicate values. Indexes are particularly good when you have high-cardinality fields. Now, if you have a skewed index, where there's one popular value and a few much less popular ones, then that's where you really want a partial index (as I suggest earlier) so that queries against the non-popular value(s) is able to use the index and the index is much smaller.
>> Interesting! In my head, for some reason, I'd always assumed a btree index would break down a char field based on the characters within it. Does that never happen?
> Not sure what you mean by 'break down a char field'.

Rather than breaking into three buckets ('NEW', 'ACK', 'RSV'), a more complicated hierarchy ('N', 'NE', 'A', 'AC', etc).

>> If I changed this to be an enum field, would != still perform poorly or can the query optimiser spot that it's an enum and just look for the other options?
> I don't believe we've got any kind of optimization like that today for enums.

Good to know, I see query optimisers as magic, and postgres often seems to achieve magic results ;-)

Chris
Re: surprising query optimisation
On 30/11/2018 22:10, Gavin Flower wrote:
> I once optimised a very complex set of queries that made extensive use of indexes. However, with the knowledge I have today, I would have most likely had fewer and smaller indexes. As I now realize, some of my indexes were probably counter-productive, especially as I'd given no thought to how much RAM would be required to host the data plus indexes! Fortunately, while the objective was to run all those queries within 24 hours, they actually only took a couple of hours.

So, interestingly, this box has 250GB memory in it, and even though I've set effective_cache_size to 200GB, I only see 9G of memory being used. How can I persuade postgres to keep more in memory?

cheers,

Chris
Re: surprising query optimisation
On 30/11/2018 15:33, Stephen Frost wrote:
> Greetings,
> * Chris Withers (ch...@withers.org) wrote:
>> On 28/11/2018 22:49, Stephen Frost wrote:
>>> * Chris Withers (ch...@withers.org) wrote:
>>>> We have an app that deals with a lot of queries, and we've been slowly seeing performance issues emerge. We take a lot of free form queries from users and stumbled upon a very surprising optimisation. So, we have a 'state' column which is a 3 character string column with an index on it. Despite being a string, this column is only used to store one of three values: 'NEW', 'ACK', or 'RSV'.
>>> Sounds like a horrible field to have an index on.
>> That's counter-intuitive for me. What leads you to say this and what would you do/recommend instead?
> For this, specifically, it's because you end up with exactly what you have: a large index with tons of duplicate values. Indexes are particularly good when you have high-cardinality fields. Now, if you have a skewed index, where there's one popular value and a few much less popular ones, then that's where you really want a partial index (as I suggest earlier) so that queries against the non-popular value(s) is able to use the index and the index is much smaller.

Interesting! In my head, for some reason, I'd always assumed a btree index would break down a char field based on the characters within it. Does that never happen?

If I changed this to be an enum field, would != still perform poorly or can the query optimiser spot that it's an enum and just look for the other options?

cheers,

Chris
Re: surprising query optimisation
On 30/11/2018 12:55, Stephen Frost wrote:
> I'd suggest you check out the wiki article written about this kind of question:
>
> https://wiki.postgresql.org/wiki/Slow_Query_Questions
>
> Have you tried a partial index on state!='RSV'?

The solution I originally posted, that we do easily enough at our query generation layer, is working perfectly, but this is good to know for next time. My post here is mainly to try and understand what's going on so I can improve my general feel for how to use postgres at its best.

So, why was the query ending up being a big scan rather than some quick lookups based on the index?

cheers,

Chris
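For the record, the partial index suggested upthread can be sketched as below; the table and column names are from the thread, the index and database names are made up. Built with CONCURRENTLY so writes aren't blocked during the build (note CREATE INDEX CONCURRENTLY cannot run inside a transaction block, which is fine with psql's default autocommit).

```shell
# Write the DDL to a file, then apply it with psql.
cat > partial_index.sql <<'SQL'
CREATE INDEX CONCURRENTLY alerts_alert_state_not_rsv
    ON alerts_alert (state)
    WHERE state != 'RSV';
SQL
# e.g.:  psql -d yourdb -f partial_index.sql   # db name hypothetical
```

Given the row counts posted later in the thread (~3000 non-'RSV' rows out of 1.5 million), such an index stays tiny, and the planner can use it for any query whose WHERE clause implies `state != 'RSV'`.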
Re: surprising query optimisation
On 28/11/2018 22:49, Stephen Frost wrote:
> * Chris Withers (ch...@withers.org) wrote:
>> We have an app that deals with a lot of queries, and we've been slowly seeing performance issues emerge. We take a lot of free form queries from users and stumbled upon a very surprising optimisation. So, we have a 'state' column which is a 3 character string column with an index on it. Despite being a string, this column is only used to store one of three values: 'NEW', 'ACK', or 'RSV'.
> Sounds like a horrible field to have an index on.

That's counter-intuitive for me. What leads you to say this and what would you do/recommend instead?

> Really though, if you want something more than wild speculation, posting the 'explain analyze' of each query along with the actual table definitions and sizes and such would be the best way to get it.

table definition:

# \d alerts_alert
                Table "public.alerts_alert"
      Column      |           Type           | Modifiers
 -----------------+--------------------------+-----------
  tags            | jsonb                    | not null
  id              | character varying(86)    | not null
  earliest_seen   | timestamp with time zone | not null
  latest_seen     | timestamp with time zone | not null
  created         | timestamp with time zone | not null
  modified        | timestamp with time zone | not null
  type            | character varying(300)   | not null
  state           | character varying(3)     | not null
  until           | timestamp with time zone |
  latest_note     | text                     | not null
  created_by_id   | integer                  | not null
  modified_by_id  | integer                  | not null
  owner_id        | integer                  |
  owning_group_id | integer                  | not null
  latest_new      | timestamp with time zone | not null
 Indexes:
     "alerts_alert_pkey" PRIMARY KEY, btree (id)
     "alert_tags_index" gin (tags)
     "alerts_alert_1efacf1d" btree (latest_seen)
     "alerts_alert_3103a7d8" btree (until)
     "alerts_alert_599dcce2" btree (type)
     "alerts_alert_5e7b1936" btree (owner_id)
     "alerts_alert_9ae73c65" btree (modified)
     "alerts_alert_9ed39e2e" btree (state)
     "alerts_alert_b3da0983" btree (modified_by_id)
     "alerts_alert_c5151f5a" btree (earliest_seen)
     "alerts_alert_e2fa5388" btree (created)
     "alerts_alert_e93cb7eb" btree (created_by_id)
     "alerts_alert_efea2d76" btree (owning_group_id)
     "alerts_alert_id_13155e16_like" btree (id varchar_pattern_ops)
     "alerts_alert_latest_new_e8d1fbde_uniq" btree (latest_new)
     "alerts_alert_state_90ab480b_like" btree (state varchar_pattern_ops)
     "alerts_alert_type_3021f46f_like" btree (type varchar_pattern_ops)
 Foreign-key constraints:
     "alerts_alert_created_by_id_520608c0_fk_alerts_user_id" FOREIGN KEY (created_by_id) REFERENCES alerts_user(id) DEFERRABLE INITIALLY DEFERRED
     "alerts_alert_modified_by_id_6db4b04b_fk_alerts_user_id" FOREIGN KEY (modified_by_id) REFERENCES alerts_user(id) DEFERRABLE INITIALLY DEFERRED
     "alerts_alert_owner_id_0c00548a_fk_alerts_user_id" FOREIGN KEY (owner_id) REFERENCES alerts_user(id) DEFERRABLE INITIALLY DEFERRED
     "alerts_alert_owning_group_id_a4869b66_fk_auth_group_id" FOREIGN KEY (owning_group_id) REFERENCES auth_group(id) DEFERRABLE INITIALLY DEFERRED
 Referenced by:
     TABLE "alerts_alertevent" CONSTRAINT "alerts_alertevent_alert_id_edd734b8_fk_alerts_alert_id" FOREIGN KEY (alert_id) REFERENCES alerts_alert(id) DEFERRABLE INITIALLY DEFERRED

Row counts by state:

# select state, count(*) from alerts_alert group by 1 order by 1;
 state |  count
-------+---------
 ACK   |    1053
 NEW   |    1958
 RSV   | 1528623
(3 rows)

here's an example of the "bad" query plan:

https://explain.depesz.com/s/cDkp

here's an example with all the "state!='RSV'" clauses rewritten as I described:

https://explain.depesz.com/s/B9Xi

> I'd suggest you check out the wiki article written about this kind of question:
> https://wiki.postgresql.org/wiki/Slow_Query_Questions

Thanks!

Chris
surprising query optimisation
Hi All,

We have an app that deals with a lot of queries, and we've been slowly seeing performance issues emerge. We take a lot of free form queries from users and stumbled upon a very surprising optimisation.

So, we have a 'state' column which is a 3 character string column with an index on it. Despite being a string, this column is only used to store one of three values: 'NEW', 'ACK', or 'RSV'.

One of our most common query clauses is "state!='RSV'" and we've found that by substituting this clause with "state='ACK' or state='NEW'" wherever it was used, we've dropped the postgres server's load average from 20 down to 4 and the CPU usage from 60% in user space down to <5%.

This seems counter-intuitive to me, so thought I'd ask here. Why would this be likely to make such a difference? We're currently on 9.4; is this something that's likely to be different (better? worse?) if we got all the way up to 10 or 11?

cheers,

Chris