Pulling initial physical replication pg_basebackup from a downstream server
I'm changing out a data center and I need to setup a new replicated server. The bandwidth speeds between the new data center and the master are slower than the speeds between the new data center and the current replica. Can I get the pg_base_backup from the current replica and then tell the new server to connect to the master? Would I need to do anything special to keep the master from discarding the transaction logs needed to come up to speed? Thanks!
Re: Help troubleshooting SubtransControlLock problems
On Wed, Mar 7, 2018 at 3:50 PM, Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote: > > Scott Frazer wrote: > > > It's only happening on the read replicas, though. I've just set my master > > to handle all the traffic, but that's not really sustainable > > I failed to notice at start of thread that these were replicas. I > suppose then you would have to be thinking about the range of XIDs in > the primary server. Maybe you left some old transaction open, or an > uncommitted prepared transaction there? > > If not, I suppose you're up for some "perf" profiling ... I think you got it. I found a 2-day old transaction on the master server that never closed. I had to pg_terminate_backend on it and the problem seems to have cleared up.
Re: Help troubleshooting SubtransControlLock problems
On Wed, Mar 7, 2018 at 10:39 AM, Alvaro Herrerawrote: > > Laurenz Albe wrote: > I think you could get in this situation if the range of open > transactions exceeds what fits in the buffers for subtrans.c pages, and > the subtransaction cache overflows (64 entries apiece; > PGPROC_MAX_CACHED_SUBXIDS in src/include/storage/proc.h). Each page is > 2048 transactions (4 bytes per xact, 8192 bytes per page; > SUBTRANS_XACTS_PER_PAGE in src/backend/access/transam/subtrans.c), and > we keep 32 pages (NUM_SUBTRANS_BUFFERS src/include/access/subtrans.h). > So if your oldest transaction is over 64k XIDs old, and you have > transactions with more than 64 subtransactions, you get in trouble. > > A simple solution is to increase NUM_SUBTRANS_BUFFERS and recompile. Currently this seems to be happening when the server just starts taking connections. It doesn't even need time to scale up anymore (previously it took about 8 hours for the problem to re-appear) It's only happening on the read replicas, though. I've just set my master to handle all the traffic, but that's not really sustainable
Re: Help troubleshooting SubtransControlLock problems
Server version is 9.6.5 Is there a decent guide to getting a stack trace on Centos7 when using the official Postgres repo? trying to follow the Fedora guide put the debug info for 9.2.23 on the box instead of the 9.6.5 version. On Wed, Mar 7, 2018 at 9:52 AM, Laurenz Albe <laurenz.a...@cybertec.at> wrote: > Scott Frazer wrote: > > Hi, we have a Postgres 9.6 setup using replication that has recently > started seeing a lot of processes stuck in > > "SubtransControlLock" as a wait_event on the read-replicas. Like this, > only usually about 300-800 of them: > > > > > > 179706 | LWLockNamed | SubtransControlLock > > 186602 | LWLockNamed | SubtransControlLock > > 186606 | LWLockNamed | SubtransControlLock > > 180947 | LWLockNamed | SubtransControlLock > > 186621 | LWLockNamed | SubtransControlLock > > > > The server then begins to crawl, with some queries just never finishing > until I finally shut the server down. > > > > Searching for that particular combo of wait_event_type and wait_event > only seems to turn up the page > > about statistics collection, but no helpful information on > troubleshooting this lock. > > > > Restarting the replica server clears the locks and allows us to start > working again, but it's happened > > twice now in 12 hours and I'm worried it will happen again. > > > > Does anyone have any advice on where to start looking? > > I don't think there is any connection to statistics collection. > > This lock is used when subtransactions (SAVEPOINTs in SQL or EXCEPTION > blocks > in PL/pgSQL) are created, read or removed. > > This sounds like a PostgreSQL bug. > > What is the exact PostgreSQL version you are running? > > It would be cool if you could get a stack trace from the backend that is > holding the lock. > > Yours, > Laurenz Albe >
Re: Help troubleshooting SubtransControlLock problems
These don't seem like normal locks. Nothing shows up in a "SELECT relation::regclass, * FROM pg_locks WHERE NOT GRANTED;" These processes are all active but the wait_event and wait_event_type fields indicate they are waiting on (I believe) shared memory locks. pid | usesysid | usename| application_name | state_change | wait_e vent_type | wait_event | state +--+--+-+---+--- --+-+ 165024 |16392 | content_user | application | nymapp01.nym | 6192 | 2018-03-07 09:19:09.770155-06 | LWLock Named | SubtransControlLock | active On Tue, Mar 6, 2018 at 11:43 PM, Rene Romero Benavides < rene.romer...@gmail.com> wrote: > For such issues, I find this view very useful (the first one): > https://wiki.postgresql.org/wiki/Lock_Monitoring > > Examine blocking_pid's , and tell us what kind of operation is blocking > the other processes . Also, are there many long running transactions in > your server? > > > 2018-03-06 21:24 GMT-06:00 Scott Frazer <sfra...@couponcabin.com>: > >> Hi, we have a Postgres 9.6 setup using replication that has recently >> started seeing a lot of processes stuck in "SubtransControlLock" as a >> wait_event on the read-replicas. Like this, only usually about 300-800 of >> them: >> >> >> 179706 | LWLockNamed | SubtransControlLock >> >> 186602 | LWLockNamed | SubtransControlLock >> >> 186606 | LWLockNamed | SubtransControlLock >> >> 180947 | LWLockNamed | SubtransControlLock >> >> 186621 | LWLockNamed | SubtransControlLock >> >> The server then begins to crawl, with some queries just never finishing >> until I finally shut the server down. >> >> Searching for that particular combo of wait_event_type and wait_event >> only seems to turn up the page about statistics collection, but no helpful >> information on troubleshooting this lock. >> >> Restarting the replica server clears the locks and allows us to start >> working again, but it's happened twice now in 12 hours and I'm worried it >> will happen again. >> >> Does anyone have any advice on where to start looking? >> >> Thanks, >> Scott >> >> > > > -- > El genio es 1% inspiraciĆ³n y 99% transpiraciĆ³n. > Thomas Alva Edison > http://pglearn.blogspot.mx/ > >
Help troubleshooting SubtransControlLock problems
Hi, we have a Postgres 9.6 setup using replication that has recently started seeing a lot of processes stuck in "SubtransControlLock" as a wait_event on the read-replicas. Like this, only usually about 300-800 of them: 179706 | LWLockNamed | SubtransControlLock 186602 | LWLockNamed | SubtransControlLock 186606 | LWLockNamed | SubtransControlLock 180947 | LWLockNamed | SubtransControlLock 186621 | LWLockNamed | SubtransControlLock The server then begins to crawl, with some queries just never finishing until I finally shut the server down. Searching for that particular combo of wait_event_type and wait_event only seems to turn up the page about statistics collection, but no helpful information on troubleshooting this lock. Restarting the replica server clears the locks and allows us to start working again, but it's happened twice now in 12 hours and I'm worried it will happen again. Does anyone have any advice on where to start looking? Thanks, Scott